* Re: [PATCH] doc: announce cryptodev change to support EDDSA
2024-07-25 15:01 0% ` Kusztal, ArkadiuszX
@ 2024-07-31 12:57 3% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 12:57 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, Richardson, Bruce, ciara.power,
jerinj, fanzhang.oss, Ji, Kai, jack.bond-preston, Marchand,
David, hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona,
Doherty, Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
jianjay.zhou, Daly, Lee
Cc: Anoob Joseph, zhangfei.gao, Kusztal, ArkadiuszX
25/07/2024 17:01, Kusztal, ArkadiuszX:
> > Announce the additions in cryptodev ABI to support EDDSA algorithm.
> >
> > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
>
> Acked-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Applied, thanks.
It means we are not able to add an algo without breaking ABI.
Is it something we can improve?
^ permalink raw reply [relevance 3%]
* Re: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-30 14:39 0% ` Gowrishankar Muthukrishnan
@ 2024-07-31 12:51 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 12:51 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan
Cc: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
jack.bond-preston, Marchand, David, hemant.agrawal,
De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, Daly, Lee
30/07/2024 16:39, Gowrishankar Muthukrishnan:
> Hi,
> > We need to fix the padding information in DPDK to match the VirtIO specification, in order to support RSA in virtio devices. The virtio-crypto specification and the DPDK specification differ in how padding is handled.
> > With the current DPDK and virtio specifications, it is impossible to support RSA in virtio-crypto. If you think the DPDK spec should not be modified, we will try to amend the virtIO spec to match DPDK; but since we do not know whether the virtIO community would accept that, can we merge the deprecation notice?
There is a long list of Cc but I see no support outside of Marvell.
> >>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> >>> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> >>> + breaking ABI. The new location is recommended to comply with
> >>> + virtio-crypto specification. Applications and drivers using
> >>> + this struct will be updated.
> >>> +
>
>
> >> The problem I see here is that there is one private key but multiple padding combinations.
> >> Therefore, for every padding variation we would need to copy the same private key anew, duplicating it in memory.
> >> This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
> >
> > Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
> > When we share the same private key for sign (and its public key, in the case of encryption) between
> > multiple crypto ops that vary by padding scheme, a successful attack against one scheme
> > could potentially expose the private key used in the session and hence compromise
> > the other crypto operations.
> >
> > I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
> > So rather than duplicating keys in memory, the private and public keys are better protected,
> > and in a catastrophe only that one session need be destroyed.
>
>
> >>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> >>> + in either exponent or quintuple format is changed from union to
> >>> +struct
> >>> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> >>> + This change will not break existing applications.
> > >
> > > This one I agree. RFC 8017 obsoletes RFC 3447.
^ permalink raw reply [relevance 0%]
* Re: [PATCH] doc: announce dmadev new capability addition
2024-07-29 15:20 3% ` Jerin Jacob
2024-07-29 17:17 0% ` Morten Brørup
@ 2024-07-31 10:24 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 10:24 UTC (permalink / raw)
To: Vamsi Attunuru, Morten Brørup, dev
Cc: fengchengwen, kevin.laatz, bruce.richardson, jerinj, anoobj, Jerin Jacob
29/07/2024 17:20, Jerin Jacob:
> On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com> wrote:
> >
> > Announce addition of new capability flag and fields in
>
> The new capability flag won't break ABI. We can mention only the fields
> updated in the rte_dma_info and rte_dma_conf structures.
>
> Another option is a new set of APIs for priority enablement. The downside
> is more code. All, opinions?
>
> > rte_dma_info and rte_dma_conf structures.
I'm fine with just updating these structs.
> > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Any other opinions?
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-25 15:53 0% ` Gowrishankar Muthukrishnan
@ 2024-07-30 14:39 0% ` Gowrishankar Muthukrishnan
2024-07-31 12:51 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-30 14:39 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, Kusztal, ArkadiuszX, dev,
Anoob Joseph, Richardson, Bruce, ciara.power, Jerin Jacob,
fanzhang.oss, Ji, Kai, jack.bond-preston, Marchand, David,
hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona, Doherty,
Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
ajit.khaparde, raveendra.padasalagi, vikas.gupta, zhangfei.gao,
g.singh, jianjay.zhou, Daly, Lee
[-- Attachment #1: Type: text/plain, Size: 2294 bytes --]
Hi,
We need to fix the padding information in DPDK to match the VirtIO specification, in order to support RSA in virtio devices. The virtio-crypto specification and the DPDK specification differ in how padding is handled.
With the current DPDK and virtio specifications, it is impossible to support RSA in virtio-crypto. If you think the DPDK spec should not be modified, we will try to amend the virtIO spec to match DPDK; but since we do not know whether the virtIO community would accept that, can we merge the deprecation notice?
Thanks,
Gowrishankar
>>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
>>> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
>>> + breaking ABI. The new location is recommended to comply with
>>> + virtio-crypto specification. Applications and drivers using
>>> + this struct will be updated.
>>> +
>> The problem I see here is that there is one private key but multiple padding combinations.
>> Therefore, for every padding variation we would need to copy the same private key anew, duplicating it in memory.
>> This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
> Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
> When we share the same private key for sign (and its public key, in the case of encryption) between
> multiple crypto ops that vary by padding scheme, a successful attack against one scheme
> could potentially expose the private key used in the session and hence compromise
> the other crypto operations.
> I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
> So rather than duplicating keys in memory, the private and public keys are better protected,
> and in a catastrophe only that one session need be destroyed.
>>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
>>> + in either exponent or quintuple format is changed from union to
>>> +struct
>>> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
>>> + This change will not break existing applications.
>>This one I agree. RFC 8017 obsoletes RFC 3447.
> Thanks,
> Gowrishankar
[-- Attachment #2: Type: text/html, Size: 8293 bytes --]
^ permalink raw reply [relevance 0%]
* Re: [PATCH v8 5/5] dts: add API doc generation
2024-07-12 8:57 3% ` [PATCH v8 5/5] dts: add API doc generation Juraj Linkeš
@ 2024-07-30 13:51 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-30 13:51 UTC (permalink / raw)
To: Juraj Linkeš
Cc: Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte, dev, Luca Vizzarro
12/07/2024 10:57, Juraj Linkeš:
> The tool used to generate DTS API docs is Sphinx, which is already in
> use in DPDK. The same configuration is used to preserve style with one
> DTS-specific configuration (so that the DPDK docs are unchanged) that
> modifies how the sidebar displays the content.
What is changed in the sidebar?
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -244,3 +244,6 @@ The public API headers are grouped by topics:
> [experimental APIs](@ref rte_compat.h),
> [ABI versioning](@ref rte_function_versioning.h),
> [version](@ref rte_version.h)
> +
> +- **tests**:
> + [**DTS**](@dts_api_main_page)
OK looks good
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -124,6 +124,8 @@ SEARCHENGINE = YES
> SORT_MEMBER_DOCS = NO
> SOURCE_BROWSER = YES
>
> +ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
Why is it needed?
That's the only way to reference it in doxy-api-index.md?
Would be nice to explain in the commit log.
> --- a/doc/api/meson.build
> +++ b/doc/api/meson.build
> +# A local reference must be relative to the main index.html page
> +# The path below can't be taken from the DTS meson file as that would
> +# require recursive subdir traversal (doc, dts, then doc again)
This comment is really obscure.
> +cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
Oh I think I get it:
- DTS_API_MAIN_PAGE is the Meson variable
- dts_api_main_page is the Doxygen variable
> +# Napoleon enables the Google format of Python docstrings, used in DTS
> +# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
Close sentences with a dot, it is easier to read.
> +extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
> +
> +# DTS Python docstring options
> +autodoc_default_options = {
> + 'members': True,
> + 'member-order': 'bysource',
> + 'show-inheritance': True,
> +}
> +autodoc_class_signature = 'separated'
> +autodoc_typehints = 'both'
> +autodoc_typehints_format = 'short'
> +autodoc_typehints_description_target = 'documented'
> +napoleon_numpy_docstring = False
> +napoleon_attr_annotations = True
> +napoleon_preprocess_types = True
> +add_module_names = False
> +toc_object_entries = True
> +toc_object_entries_show_parents = 'hide'
> +intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
> +
> +dts_root = environ.get('DTS_ROOT')
Why does it need to be passed as an environment variable?
Isn't it a fixed absolute path?
> +if dts_root:
> + path.append(dts_root)
> + # DTS Sidebar config
> + html_theme_options = {
> + 'collapse_navigation': False,
> + 'navigation_depth': -1,
> + }
[...]
> +To build DTS API docs, install the dependencies with Poetry, then enter its shell:
I don't plan to use Poetry on my machine.
Can we simply describe the dependencies even if the versions are not specified?
> +
> +.. code-block:: console
> +
> + poetry install --no-root --with docs
> + poetry shell
> +
> +The documentation is built using the standard DPDK build system.
> +After executing the meson command and entering Poetry's shell, build the documentation with:
> +
> +.. code-block:: console
> +
> + ninja -C build dts-doc
Don't we rely on the Meson option "enable_docs"?
> +
> +The output is generated in ``build/doc/api/dts/html``.
> +
> +.. Note::
In general the RST expressions are lowercase.
> +
> + Make sure to fix any Sphinx warnings when adding or updating docstrings,
> + and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
It looks like something to write in the contributing guide.
> +++ b/dts/doc/meson.build
> @@ -0,0 +1,27 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 PANTHEON.tech s.r.o.
> +
> +sphinx = find_program('sphinx-build', required: false)
> +sphinx_apidoc = find_program('sphinx-apidoc', required: false)
> +
> +if not sphinx.found() or not sphinx_apidoc.found()
You should include the option "enable_docs" here.
> + subdir_done()
> +endif
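A sketch of the guard Thomas asks for (the exact wiring of ``enable_docs`` in DPDK's build may differ):

```meson
# Skip DTS doc generation entirely unless docs were requested.
if not get_option('enable_docs')
    subdir_done()
endif

sphinx = find_program('sphinx-build', required: false)
sphinx_apidoc = find_program('sphinx-apidoc', required: false)

if not sphinx.found() or not sphinx_apidoc.found()
    subdir_done()
endif
```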
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce dmadev new capability addition
2024-07-29 15:20 3% ` Jerin Jacob
@ 2024-07-29 17:17 0% ` Morten Brørup
2024-07-31 10:24 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-07-29 17:17 UTC (permalink / raw)
To: Jerin Jacob, Vamsi Attunuru
Cc: fengchengwen, dev, kevin.laatz, bruce.richardson, jerinj, anoobj
> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> Sent: Monday, 29 July 2024 17.20
>
> On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com>
> wrote:
> >
> > Announce addition of new capability flag and fields in
> > rte_dma_info and rte_dma_conf structures.
>
> The new capability flag won't break ABI. We can mention only the fields
> updated in the rte_dma_info and rte_dma_conf structures.
>
> Another option is a new set of APIs for priority enablement. The downside
> is more code. All, opinions?
I think that this feature should be simple enough to expand the rte_dma_info and rte_dma_conf structures with a few new fields, rather than adding a new set of APIs for it.
It seems to become 1-level weighted priority scheduling of a few QoS classes, not hierarchical or anything complex enough to justify a new set of APIs. Just a simple array of per-class properties.
The max possible number of QoS classes (i.e. the array size) should be build-time configurable. Considering Marvell hardware, 4 seems a good default.
^ permalink raw reply [relevance 0%]
* Re: [PATCH] doc: announce dmadev new capability addition
@ 2024-07-29 15:20 3% ` Jerin Jacob
2024-07-29 17:17 0% ` Morten Brørup
2024-07-31 10:24 0% ` Thomas Monjalon
0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2024-07-29 15:20 UTC (permalink / raw)
To: Vamsi Attunuru, Morten Brørup
Cc: fengchengwen, dev, kevin.laatz, bruce.richardson, jerinj, anoobj
On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com> wrote:
>
> Announce addition of new capability flag and fields in
The new capability flag won't break ABI. We can mention only the fields
updated in the rte_dma_info and rte_dma_conf structures.
Another option is a new set of APIs for priority enablement. The downside
is more code. All, opinions?
> rte_dma_info and rte_dma_conf structures.
>
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> ---
> RFC:
> https://patchwork.dpdk.org/project/dpdk/patch/20240729115558.263574-1-vattunuru@marvell.com/
>
> doc/guides/rel_notes/deprecation.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..05d28473c0 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,8 @@ Deprecation Notices
> will be deprecated and subsequently removed in DPDK 24.11 release.
> Before this, the new port library API (functions rte_swx_port_*)
> will gradually transition from experimental to stable status.
> +
> +* dmadev: A new flag ``RTE_DMA_CAPA_QOS`` will be introduced to advertise
> + dma device's QoS capability. Also new fields will be added in ``rte_dma_info``
> + and ``rte_dma_conf`` structures to get device supported priority levels
> + and to configure the required priority level.
> --
> 2.25.1
>
^ permalink raw reply [relevance 3%]
* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
2024-07-25 15:53 0% ` Gowrishankar Muthukrishnan
@ 2024-07-25 16:00 0% ` Gowrishankar Muthukrishnan
1 sibling, 0 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25 16:00 UTC (permalink / raw)
To: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
jack.bond-preston, Marchand, David, hemant.agrawal,
De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, Daly, Lee
[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]
Hi ArkadiuszX,
> +
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> + breaking ABI. The new location is recommended to comply with
> + virtio-crypto specification. Applications and drivers using
> + this struct will be updated.
> +
The problem I see here is that there is one private key but multiple padding combinations.
Therefore, for every padding variation we would need to copy the same private key anew, duplicating it in memory.
This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
When we share the same private key for sign (and its public key, in the case of encryption) between
multiple crypto ops that vary by padding scheme, a successful attack against one scheme
could potentially expose the private key used in the session and hence compromise
the other crypto operations.
I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
So rather than duplicating keys in memory, the private and public keys are better protected,
and in a catastrophe only that one session need be destroyed.
Please share your thoughts.
> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> + in either exponent or quintuple format is changed from union to
> +struct
> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> + This change will not break existing applications.
This one I agree. RFC 8017 obsoletes RFC 3447.
Thanks,
Gowrishankar
> --
> 2.21.0
[-- Attachment #2: Type: text/html, Size: 7769 bytes --]
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
@ 2024-07-25 15:53 0% ` Gowrishankar Muthukrishnan
2024-07-30 14:39 0% ` Gowrishankar Muthukrishnan
2024-07-25 16:00 0% ` Gowrishankar Muthukrishnan
1 sibling, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25 15:53 UTC (permalink / raw)
To: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
jack.bond-preston, Marchand, David, hemant.agrawal,
De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, Daly, Lee
[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> + breaking ABI. The new location is recommended to comply with
> + virtio-crypto specification. Applications and drivers using
> + this struct will be updated.
> +
The problem I see here is that there is one private key but multiple padding combinations.
Therefore, for every padding variation we would need to copy the same private key anew, duplicating it in memory.
This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
When we share the same private key for sign (and its public key, in the case of encryption) between
multiple crypto ops that vary by padding scheme, a successful attack against one scheme
could potentially expose the private key used in the session and hence compromise
the other crypto operations.
I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
So rather than duplicating keys in memory, the private and public keys are better protected,
and in a catastrophe only that one session need be destroyed.
Thanks,
Gowrishankar
> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> + in either exponent or quintuple format is changed from union to
> +struct
> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> + This change will not break existing applications.
This one I agree. RFC 8017 obsoletes RFC 3447.
Thanks,
Gowrishankar
> --
> 2.21.0
[-- Attachment #2: Type: text/html, Size: 7504 bytes --]
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce cryptodev change to support EDDSA
2024-07-22 14:53 8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
2024-07-24 5:07 0% ` Anoob Joseph
2024-07-24 6:46 0% ` [EXTERNAL] " Akhil Goyal
@ 2024-07-25 15:01 0% ` Kusztal, ArkadiuszX
2024-07-31 12:57 3% ` Thomas Monjalon
2 siblings, 1 reply; 200+ results
From: Kusztal, ArkadiuszX @ 2024-07-25 15:01 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, Richardson, Bruce, ciara.power,
jerinj, fanzhang.oss, Ji, Kai, jack.bond-preston, Marchand,
David, hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona,
Doherty, Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
jianjay.zhou, Daly, Lee
Cc: Anoob Joseph, zhangfei.gao
> Announce the additions in cryptodev ABI to support EDDSA algorithm.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-22 14:55 5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
2024-07-24 6:49 0% ` [EXTERNAL] " Akhil Goyal
@ 2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
2024-07-25 15:53 0% ` Gowrishankar Muthukrishnan
2024-07-25 16:00 0% ` Gowrishankar Muthukrishnan
1 sibling, 2 replies; 200+ results
From: Kusztal, ArkadiuszX @ 2024-07-25 9:48 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, Anoob Joseph, Richardson, Bruce,
ciara.power, jerinj, fanzhang.oss, Ji, Kai, jack.bond-preston,
Marchand, David, hemant.agrawal, De Lara Guarch, Pablo, Trahe,
Fiona, Doherty, Declan, matan, ruifeng.wang, Gujjar,
Abhinandan S, maxime.coquelin, chenbox, sunilprakashrao.uttarwar,
andrew.boyer, ajit.khaparde, raveendra.padasalagi, vikas.gupta,
zhangfei.gao, g.singh, jianjay.zhou, Daly, Lee
Hi Gowrishankar,
> -----Original Message-----
> From: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> Sent: Monday, July 22, 2024 4:56 PM
> To: dev@dpdk.org; Anoob Joseph <anoobj@marvell.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; ciara.power@intel.com; jerinj@marvell.com;
> fanzhang.oss@gmail.com; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>;
> Ji, Kai <kai.ji@intel.com>; jack.bond-preston@foss.arm.com; Marchand, David
> <david.marchand@redhat.com>; hemant.agrawal@nxp.com; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Doherty, Declan <declan.doherty@intel.com>;
> matan@nvidia.com; ruifeng.wang@arm.com; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; maxime.coquelin@redhat.com;
> chenbox@nvidia.com; sunilprakashrao.uttarwar@amd.com;
> andrew.boyer@amd.com; ajit.khaparde@broadcom.com;
> raveendra.padasalagi@broadcom.com; vikas.gupta@broadcom.com;
> zhangfei.gao@linaro.org; g.singh@nxp.com; jianjay.zhou@huawei.com; Daly,
> Lee <lee.daly@intel.com>
> Cc: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> Subject: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
>
> Announce cryptodev changes to offload RSA asymmetric operation in VirtIO
> PMD.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
> https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-
> gmuthukrishn@marvell.com/
> https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-
> gmuthukrishn@marvell.com/
> ---
> doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..26fec84aba 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,14 @@ Deprecation Notices
> will be deprecated and subsequently removed in DPDK 24.11 release.
> Before this, the new port library API (functions rte_swx_port_*)
> will gradually transition from experimental to stable status.
> +
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> + breaking ABI. The new location is recommended to comply with
> + virtio-crypto specification. Applications and drivers using
> + this struct will be updated.
> +
The problem I see here is that there is one private key but multiple padding combinations.
Therefore, for every padding variation we would need to copy the same private key anew, duplicating it in memory.
This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> + in either exponent or quintuple format is changed from union to
> +struct
> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> + This change will not break existing applications.
This one I agree. RFC 8017 obsoletes RFC 3447.
> --
> 2.21.0
^ permalink raw reply [relevance 0%]
* RE: [EXTERNAL] Re: [PATCH] doc: announce vhost changes to support asymmetric operation
2024-07-23 18:30 4% ` Jerin Jacob
@ 2024-07-25 9:29 4% ` Gowrishankar Muthukrishnan
0 siblings, 0 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25 9:29 UTC (permalink / raw)
To: Jerin Jacob
Cc: dev, Anoob Joseph, bruce.richardson, ciara.power, Jerin Jacob,
fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, ruifeng.wang,
abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, lee.daly
[-- Attachment #1: Type: text/plain, Size: 255 bytes --]
Sure Jerin. I’ll drop this proposal as ABI versioning could help. Thanks.
Looks like in this case it is about adding new arguments to a function. Could you
check whether ABI versioning helps here? It seems like it can be easily managed
with ABI versioning.
[-- Attachment #2: Type: text/html, Size: 2978 bytes --]
^ permalink raw reply [relevance 4%]
* [PATCH v5 5/6] ci: test compiler memcpy
@ 2024-07-24 7:53 5% ` Mattias Rönnblom
0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-07-24 7:53 UTC (permalink / raw)
To: dev
Cc: Mattias Rönnblom, Morten Brørup, Stephen Hemminger,
David Marchand, Pavan Nikhilesh, Bruce Richardson,
Mattias Rönnblom
Add compilation tests for the use_cc_memcpy build option.
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
.ci/linux-build.sh | 5 +++++
.github/workflows/build.yml | 7 +++++++
devtools/test-meson-builds.sh | 4 +++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 15ed51e4c1..a873f83d09 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -98,6 +98,11 @@ if [ "$STDATOMIC" = "true" ]; then
else
OPTS="$OPTS -Dcheck_includes=true"
fi
+if [ "$CCMEMCPY" = "true" ]; then
+ OPTS="$OPTS -Duse_cc_memcpy=true"
+else
+ OPTS="$OPTS -Duse_cc_memcpy=false"
+fi
if [ "$MINI" = "true" ]; then
OPTS="$OPTS -Denable_drivers=net/null"
OPTS="$OPTS -Ddisable_libs=*"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index dbf25626d4..cd45d6c6c1 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -31,6 +31,7 @@ jobs:
RISCV64: ${{ matrix.config.cross == 'riscv64' }}
RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
STDATOMIC: ${{ contains(matrix.config.checks, 'stdatomic') }}
+ CCMEMCPY: ${{ contains(matrix.config.checks, 'ccmemcpy') }}
strategy:
fail-fast: false
@@ -45,6 +46,12 @@ jobs:
- os: ubuntu-22.04
compiler: clang
checks: stdatomic
+ - os: ubuntu-22.04
+ compiler: gcc
+ checks: ccmemcpy
+ - os: ubuntu-22.04
+ compiler: clang
+ checks: ccmemcpy
- os: ubuntu-22.04
compiler: gcc
checks: abi+debug+doc+examples+tests
diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index d71bb1ded0..e72146be3b 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -228,12 +228,14 @@ for c in gcc clang ; do
if [ $s = shared ] ; then
abicheck=ABI
stdatomic=-Denable_stdatomic=true
+ ccmemcpy=-Duse_cc_memcpy=true
else
abicheck=skipABI # save time and disk space
stdatomic=-Denable_stdatomic=false
+ ccmemcpy=-Duse_cc_memcpy=false
fi
export CC="$CCACHE $c"
- build build-$c-$s $c $abicheck $stdatomic --default-library=$s
+ build build-$c-$s $c $abicheck $stdatomic $ccmemcpy --default-library=$s
unset CC
done
done
--
2.34.1
^ permalink raw reply [relevance 5%]
* RE: [EXTERNAL] [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
2024-07-22 14:55 5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
@ 2024-07-24 6:49 0% ` Akhil Goyal
2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
1 sibling, 0 replies; 200+ results
From: Akhil Goyal @ 2024-07-24 6:49 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, Anoob Joseph, bruce.richardson,
ciara.power, Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal,
kai.ji, jack.bond-preston, david.marchand, hemant.agrawal,
pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, lee.daly
Cc: Gowrishankar Muthukrishnan
> Announce cryptodev changes to offload RSA asymmetric operation in
> VirtIO PMD.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
> https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-gmuthukrishn@marvell.com/
> https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-gmuthukrishn@marvell.com/
> ---
Acked-by: Akhil Goyal <gakhil@marvell.com>
> doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..26fec84aba 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,14 @@ Deprecation Notices
> will be deprecated and subsequently removed in DPDK 24.11 release.
> Before this, the new port library API (functions rte_swx_port_*)
> will gradually transition from experimental to stable status.
> +
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> + rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> + breaking ABI. The new location is recommended to comply with
> + virtio-crypto specification. Applications and drivers using
> + this struct will be updated.
> +
> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> + in either exponent or quintuple format is changed from union to struct
> + data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> + This change will not break existing applications.
> --
> 2.21.0
^ permalink raw reply [relevance 0%]
* RE: [EXTERNAL] [PATCH] doc: announce cryptodev change to support EDDSA
2024-07-22 14:53 8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
2024-07-24 5:07 0% ` Anoob Joseph
@ 2024-07-24 6:46 0% ` Akhil Goyal
2024-07-25 15:01 0% ` Kusztal, ArkadiuszX
2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-07-24 6:46 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, bruce.richardson, ciara.power,
Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal, kai.ji,
jack.bond-preston, david.marchand, hemant.agrawal,
pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, g.singh, jianjay.zhou,
lee.daly
Cc: Anoob Joseph, zhangfei.gao, Gowrishankar Muthukrishnan
> Announce the additions in cryptodev ABI to support EDDSA algorithm.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
Acked-by: Akhil Goyal <gakhil@marvell.com>
> RFC:
> https://patches.dpdk.org/project/dpdk/patch/0ae6a1afadac64050d80b0fd7712c4a6a8599e2c.1701273963.git.gmuthukrishn@marvell.com/
> ---
> doc/guides/rel_notes/deprecation.rst | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..fcbec965b1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,7 @@ Deprecation Notices
> will be deprecated and subsequently removed in DPDK 24.11 release.
> Before this, the new port library API (functions rte_swx_port_*)
> will gradually transition from experimental to stable status.
> +
> +* cryptodev: The enum ``rte_crypto_asym_xform_type`` and struct
> ``rte_crypto_asym_op``
> + will be extended to include new values to support EDDSA. This will break
> + ABI compatibility with existing applications that use these data types.
> --
> 2.21.0
^ permalink raw reply [relevance 0%]
* RE: [PATCH] doc: announce cryptodev change to support EDDSA
2024-07-22 14:53 8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
@ 2024-07-24 5:07 0% ` Anoob Joseph
2024-07-24 6:46 0% ` [EXTERNAL] " Akhil Goyal
2024-07-25 15:01 0% ` Kusztal, ArkadiuszX
2 siblings, 0 replies; 200+ results
From: Anoob Joseph @ 2024-07-24 5:07 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan, dev, bruce.richardson, ciara.power,
Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal, kai.ji,
jack.bond-preston, david.marchand, hemant.agrawal,
pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, g.singh, jianjay.zhou,
lee.daly
Cc: zhangfei.gao, Gowrishankar Muthukrishnan
> Subject: [PATCH] doc: announce cryptodev change to support EDDSA
>
> Announce the additions in cryptodev ABI to support EDDSA algorithm.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
^ permalink raw reply [relevance 0%]
* Re: [PATCH] doc: announce vhost changes to support asymmetric operation
2024-07-22 14:56 8% [PATCH] doc: announce vhost changes to support asymmetric operation Gowrishankar Muthukrishnan
@ 2024-07-23 18:30 4% ` Jerin Jacob
2024-07-25 9:29 4% ` [EXTERNAL] " Gowrishankar Muthukrishnan
0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2024-07-23 18:30 UTC (permalink / raw)
To: Gowrishankar Muthukrishnan
Cc: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, ruifeng.wang,
abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, lee.daly
On Mon, Jul 22, 2024 at 8:33 PM Gowrishankar Muthukrishnan
<gmuthukrishn@marvell.com> wrote:
>
> Announce vhost ABI changes to modify a few functions to support
> asymmetric crypto operation.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
> https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-4-gmuthukrishn@marvell.com/
Looks like in this case we are adding new arguments to a function. Could you
check whether ABI versioning helps here? It seems like it can be easily
managed with ABI versioning.
https://doc.dpdk.org/guides/contributing/abi_versioning.html
> ---
> doc/guides/rel_notes/deprecation.rst | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..2f5c2c5a34 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,10 @@ Deprecation Notices
> will be deprecated and subsequently removed in DPDK 24.11 release.
> Before this, the new port library API (functions rte_swx_port_*)
> will gradually transition from experimental to stable status.
> +
> +* vhost: The function ``rte_vhost_crypto_create`` will accept a new parameter
> + to specify rte_mempool for asymmetric crypto session. The function
> + ``rte_vhost_crypto_finalize_requests`` will accept two new parameters,
> + where the first one is to specify vhost device id and other one is to specify
> + the virtio queue index. These two modifications are required to support
> + asymmetric crypto operation in vhost crypto and will break ABI.
> --
> 2.21.0
>
^ permalink raw reply [relevance 4%]
* RE: release candidate 24.07-rc2
@ 2024-07-23 2:14 4% ` Xu, HailinX
0 siblings, 0 replies; 200+ results
From: Xu, HailinX @ 2024-07-23 2:14 UTC (permalink / raw)
To: Marchand, David, dev
Cc: Kovacevic, Marko, Mcnamara, John, Richardson, Bruce,
Ferruh Yigit, Puttaswamy, Rajesh T
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Saturday, July 13, 2024 2:25 AM
> To: announce@dpdk.org
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Subject: release candidate 24.07-rc2
>
> A new DPDK release candidate is ready for testing:
> https://git.dpdk.org/dpdk/tag/?id=v24.07-rc2
>
> There are 461 new patches in this snapshot.
>
> Release notes:
> https://doc.dpdk.org/guides/rel_notes/release_24_07.html
>
> Highlights of 24.07-rc2:
> - SVE support in the hash library
> - FEC support in net/i40e and net/ice
> - log cleanups in drivers
> - various driver fixes and updates
>
> Please test and report issues on bugs.dpdk.org.
>
> DPDK 24.07-rc3 is expected in approximately one week.
>
> Thank you everyone
>
> --
> David Marchand
Update the test status for the Intel part. dpdk 24.07-rc2 testing is all done; found four new issues.
New issues:
1. Bug 1497 - [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hangs when doing ABI testing -> not fixed yet
2. ipsec-secgw tests fail -> Intel devs are investigating
3. ice_rx_timestamp/single_queue_with_timestamp: obtained unexpected timestamp -> Intel devs are investigating
4. cryptodev_cpu_aesni_mb_autotest is failing -> Intel devs are investigating
# Basic Intel(R) NIC testing
* Build or compile:
*Build: cover the build test combination with latest GCC/Clang version and the popular OS revision such as Ubuntu22.04.4, Ubuntu24.04, Fedora40, RHEL9.3, RHEL9.4, FreeBSD14.0, SUSE15.5, OpenAnolis8.8, CBL-Mariner2.0 etc.
- All test passed.
*Compile: cover the CFLAGS (O0/O1/O2/O3) with popular OSes such as Ubuntu 24.04 and RHEL 9.4.
- All test passed with latest dpdk.
* PF/VF(i40e, ixgbe): test scenarios including PF/VF-RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
- All test cases are done. No new issue is found.
* PF/VF(ice): test scenarios including Switch features/Package Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
- Execution rate is done. Found issue 3 above.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test, RFC2544 Zero packet loss performance test, etc.
- Execution rate is done. No new issue is found.
* Power and IPsec:
* Power: test scenarios including bi-direction/Telemetry/Empty Poll Lib/Priority Base Frequency, etc.
- Execution rate is done. No new issue is found.
* IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test - QAT&SW/FIB library, etc.
- Execution rate is done. Found issue 2 above.
# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMAWARE ESXI 8.0U1, etc.
- Execution rate is done. No new issue is found.
* Cryptodev:
*Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
- Execution rate is done. Found issue 4 above.
*Performance test: test scenarios including Throughput Performance /Cryptodev Latency, etc.
- Execution rate is done. No performance drop.
Regards,
Xu, Hailin
^ permalink raw reply [relevance 4%]
* [PATCH] doc: announce vhost changes to support asymmetric operation
@ 2024-07-22 14:56 8% Gowrishankar Muthukrishnan
2024-07-23 18:30 4% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:56 UTC (permalink / raw)
To: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, ruifeng.wang,
abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, lee.daly
Cc: Gowrishankar Muthukrishnan
Announce vhost ABI changes to modify a few functions to support
asymmetric crypto operation.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-4-gmuthukrishn@marvell.com/
---
doc/guides/rel_notes/deprecation.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..2f5c2c5a34 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,10 @@ Deprecation Notices
will be deprecated and subsequently removed in DPDK 24.11 release.
Before this, the new port library API (functions rte_swx_port_*)
will gradually transition from experimental to stable status.
+
+* vhost: The function ``rte_vhost_crypto_create`` will accept a new parameter
+ to specify rte_mempool for asymmetric crypto session. The function
+ ``rte_vhost_crypto_finalize_requests`` will accept two new parameters,
+ where the first one is to specify vhost device id and other one is to specify
+ the virtio queue index. These two modifications are required to support
+ asymmetric crypto operation in vhost crypto and will break ABI.
--
2.21.0
^ permalink raw reply [relevance 8%]
* [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
@ 2024-07-22 14:55 5% Gowrishankar Muthukrishnan
2024-07-24 6:49 0% ` [EXTERNAL] " Akhil Goyal
2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
0 siblings, 2 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:55 UTC (permalink / raw)
To: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, ruifeng.wang,
abhinandan.gujjar, maxime.coquelin, chenbox,
sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
jianjay.zhou, lee.daly
Cc: Gowrishankar Muthukrishnan
Announce cryptodev changes to offload RSA asymmetric operation in
VirtIO PMD.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-gmuthukrishn@marvell.com/
https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-gmuthukrishn@marvell.com/
---
doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..26fec84aba 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,14 @@ Deprecation Notices
will be deprecated and subsequently removed in DPDK 24.11 release.
Before this, the new port library API (functions rte_swx_port_*)
will gradually transition from experimental to stable status.
+
+* cryptodev: The struct rte_crypto_rsa_padding will be moved from
+ rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
+ breaking ABI. The new location is recommended to comply with
+ virtio-crypto specification. Applications and drivers using
+ this struct will be updated.
+
+* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
+ in either exponent or quintuple format is changed from union to struct
+ data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
+ This change will not break existing applications.
--
2.21.0
^ permalink raw reply [relevance 5%]
* [PATCH] doc: announce cryptodev change to support EDDSA
@ 2024-07-22 14:53 8% Gowrishankar Muthukrishnan
2024-07-24 5:07 0% ` Anoob Joseph
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:53 UTC (permalink / raw)
To: dev, bruce.richardson, ciara.power, jerinj, fanzhang.oss,
arkadiuszx.kusztal, kai.ji, jack.bond-preston, david.marchand,
hemant.agrawal, pablo.de.lara.guarch, fiona.trahe,
declan.doherty, matan, ruifeng.wang, abhinandan.gujjar,
maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
jianjay.zhou, lee.daly
Cc: Anoob Joseph, zhangfei.gao, Gowrishankar Muthukrishnan
Announce the additions in cryptodev ABI to support EDDSA algorithm.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
https://patches.dpdk.org/project/dpdk/patch/0ae6a1afadac64050d80b0fd7712c4a6a8599e2c.1701273963.git.gmuthukrishn@marvell.com/
---
doc/guides/rel_notes/deprecation.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..fcbec965b1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,7 @@ Deprecation Notices
will be deprecated and subsequently removed in DPDK 24.11 release.
Before this, the new port library API (functions rte_swx_port_*)
will gradually transition from experimental to stable status.
+
+* cryptodev: The enum ``rte_crypto_asym_xform_type`` and struct ``rte_crypto_asym_op``
+ will be extended to include new values to support EDDSA. This will break
+ ABI compatibility with existing applications that use these data types.
--
2.21.0
^ permalink raw reply [relevance 8%]
* RE: IPv6 APIs rework
2024-07-18 21:34 3% ` Robin Jarry
@ 2024-07-19 8:25 0% ` Konstantin Ananyev
2024-07-19 9:12 0% ` Morten Brørup
1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2024-07-19 8:25 UTC (permalink / raw)
> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> > address and just provide a pointer to the memory inside the packet. Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
>
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.
>
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > > aligned.
>
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> > IPv6 is aligned only by 2 bytes.
>
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.
I am also not sure that this extra alignment (2B or 4B) here will give us any benefit,
while it most likely will introduce extra restrictions.
AFAIK, right now we have ipv6 as an array of plain chars, and there have not been many
complaints about it.
So I am for keeping it 1B aligned.
Overall proposal looks reasonable to me... might be 24.11 is a good opportunity for such change.
Konstantin
^ permalink raw reply [relevance 0%]
* [PATCH v6 1/8] ethdev: support report register names and filter
@ 2024-07-22 6:58 8% ` Jie Hai
0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-07-22 6:58 UTC (permalink / raw)
To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Cc: lihuisong, fengchengwen
This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their module names.
The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by modules. And the original API
rte_eth_dev_get_reg_info() does not use the names and filter fields.
A local variable is used in rte_eth_dev_get_reg_info for
compatibility. If the drivers does not report the names, set them
to "index_XXX", which means the location in the register table.
Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
doc/guides/rel_notes/release_24_07.rst | 8 ++++++
lib/ethdev/ethdev_trace.h | 2 ++
lib/ethdev/rte_dev_info.h | 11 ++++++++
lib/ethdev/rte_ethdev.c | 38 ++++++++++++++++++++++++++
lib/ethdev/rte_ethdev.h | 29 ++++++++++++++++++++
lib/ethdev/version.map | 3 ++
6 files changed, 91 insertions(+)
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 058609b0f36b..b0bb49c8f29e 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -186,6 +186,12 @@ New Features
* Added defer queue reclamation via RCU.
* Added SVE support for bulk lookup.
+* **Added support for dumping registers with names and filtering by modules.**
+
+ * Added new API functions ``rte_eth_dev_get_reg_info_ext()`` to filter the
+ registers by module names and get the information (names, values and other
+ attributes) of the filtered registers.
+
Removed Items
-------------
@@ -241,6 +247,8 @@ ABI Changes
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
=======================================================
+ * ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+ structure for filtering by modules and reporting names of registers.
* No ABI change that would break compatibility with 23.11.
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb70..0c4780a09ef5 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -1152,6 +1152,8 @@ RTE_TRACE_POINT(
rte_trace_point_emit_u32(info->length);
rte_trace_point_emit_u32(info->width);
rte_trace_point_emit_u32(info->version);
+ rte_trace_point_emit_ptr(info->names);
+ rte_trace_point_emit_ptr(info->filter);
rte_trace_point_emit_int(ret);
)
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..26b777f9836e 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
#include <stdint.h>
+#define RTE_ETH_REG_NAME_SIZE 64
+struct rte_eth_reg_name {
+ char name[RTE_ETH_REG_NAME_SIZE];
+};
+
/*
* Placeholder for accessing device registers
*/
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
uint32_t length; /**< Number of registers to fetch */
uint32_t width; /**< Size of device register */
uint32_t version; /**< Device version */
+ /**
+ * Name of target module, filter for target subset of registers.
+ * This field can affect register selection for data/length/names.
+ */
+ const char *filter;
+ struct rte_eth_reg_name *names; /**< Registers name saver */
};
/*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..30ca4a0043c5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
int
rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+ struct rte_dev_reg_info reg_info = { 0 };
+ int ret;
+
+ if (info == NULL) {
+ RTE_ETHDEV_LOG_LINE(ERR,
+ "Cannot get ethdev port %u register info to NULL",
+ port_id);
+ return -EINVAL;
+ }
+
+ reg_info.length = info->length;
+ reg_info.data = info->data;
+
+ ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+ if (ret != 0)
+ return ret;
+
+ info->length = reg_info.length;
+ info->width = reg_info.width;
+ info->version = reg_info.version;
+ info->offset = reg_info.offset;
+
+ return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
{
struct rte_eth_dev *dev;
+ uint32_t i;
int ret;
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6402,12 +6431,21 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
return -EINVAL;
}
+ if (info->names != NULL && info->length != 0)
+ memset(info->names, 0, sizeof(struct rte_eth_reg_name) * info->length);
+
if (*dev->dev_ops->get_reg == NULL)
return -ENOTSUP;
ret = eth_err(port_id, (*dev->dev_ops->get_reg)(dev, info));
rte_ethdev_trace_get_reg_info(port_id, info, ret);
+ /* Report the default names if the driver does not report them. */
+ if (ret == 0 && info->names != NULL && strlen(info->names[0].name) == 0) {
+ for (i = 0; i < info->length; i++)
+ snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+ "index_%u", info->offset + i);
+ }
return ret;
}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7ad..02cb3c07f742 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5071,6 +5071,35 @@ __rte_experimental
int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
struct rte_power_monitor_cond *pmc);
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param info
+ * Pointer to rte_dev_reg_info structure to fill in.
+ * - If info->filter is NULL, return info for all registers (seen as filter
+ * none).
+ * - If info->filter is not NULL, return an error if the driver does not
+ * support filtering. The length field is filled with the number of filtered
+ * registers.
+ * - If info->data is NULL, the function fills in the width and length fields.
+ * - If info->data is not NULL, ethdev considers there is enough space to
+ * store the registers, and the values of registers with the filter string
+ * as the module name are put into the buffer pointed at by info->data.
+ * - If info->names is not NULL, drivers should fill it or the ethdev fills it
+ * with default names.
+ * @return
+ * - (0) if successful.
+ * - (-ENOTSUP) if hardware doesn't support.
+ * - (-EINVAL) if bad parameter.
+ * - (-ENODEV) if *port_id* invalid.
+ * - (-EIO) if device is removed.
+ * - others depends on the specific operations implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
/**
* Retrieve device registers and register attributes (number of registers and
* register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b5c..e3289e999382 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -325,6 +325,9 @@ EXPERIMENTAL {
rte_flow_template_table_resizable;
rte_flow_template_table_resize;
rte_flow_template_table_resize_complete;
+
+ # added in 24.07
+ rte_eth_dev_get_reg_info_ext;
};
INTERNAL {
--
2.33.0
^ permalink raw reply [relevance 8%]
* Re: [RFC v2] ethdev: an API for cache stashing hints
2024-07-17 2:27 3% ` Stephen Hemminger
2024-07-18 18:48 0% ` Wathsala Wathawana Vithanage
@ 2024-07-20 3:05 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2024-07-20 3:05 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Wathsala Wathawana Vithanage, dev, thomas, Ferruh Yigit,
Andrew Rybchenko, nd, Dhruv Tripathi
> On Jul 16, 2024, at 9:27 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> On Mon, 15 Jul 2024 22:11:41 +0000
> Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
>
>> An application provides cache stashing hints to the ethernet devices to
>> improve memory access latencies from the CPU and the NIC. This patch
>> introduces three distinct hints for this purpose.
>>
>> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
>> (CPU) requires the data written by the NIC immediately. This implies
>> that the CPU expects to read data from its local cache rather than LLC
>> or main memory if possible. This would improve memory access latency in
>> the Rx path. For PCI devices with TPH capability, these hints translate
>> into DWHR (Device Writes Host Reads) access pattern. This hint is only
>> valid for receive queues.
>>
>> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
>> the device access the data structure equally. Rx/Tx queue descriptors
>> fit the description of such data. This hint applies to both Rx and Tx
>> directions. In the PCI TPH context, this hint translates into a
>> Bi-Directional access pattern.
>>
>> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
>> involved in a given device's receive or transmit paths. This implies
>> that only devices are involved in the IO path. Depending on the
>> implementation, this hint may result in data getting placed in a cache
>> close to the device or not cached at all. For PCI devices with TPH
>> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
>> access patterns. This is a bidirectional hint, and it can be applied to
>> both Rx and Tx queues.
>>
>> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
>> reads data written by the host (CPU) that may still be in the host's
>> local cache but is not required by the host anytime soon. This hint is
>> intended to prevent unnecessary cache invalidations that cause
>> interconnect latencies when a device writes to a buffer already in host
>> cache memory. In DPDK, this could happen with the recycling of mbufs
>> where a mbuf is placed in the Tx queue that then gets back into mempool
>> and gets recycled back into the Rx queue, all while a copy is being held
>> in the CPU's local cache unnecessarily. By using this hint on supported
>> platforms, the mbuf will be invalidated after the device completes the
>> buffer reading, but it will be well before the buffer gets recycled and
>> updated in the Rx path. This hint is only valid for transmit queues.
>>
>> Applications use three main interfaces in the ethdev library to discover
>> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
>> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
>> is used to set hints on an Rx queue. Both of these functions take the
>> following parameters as inputs: a port_id (the id of the ethernet
>> device), a cpu_id (the target CPU), a cache_level (the level of the
>> cache hierarchy the data should be stashed into), a queue_id (the queue
>> the hints are applied to). In addition to the above list of parameters,
>> a type parameter indicates the type of the object the application
>> expects to be stashed by the hardware. Depending on the hardware, these
>> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
>> packet headers, and packet payloads. These are indicated by the macros
>> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
>> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
>> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
>> type. When an offset is used, the offset parameter in the above two
>> functions should be set appropriately.
>>
>> rte_eth_dev_stashing_hints_discover is used to discover the object types
>> and hints supported in the platform and the device. The function takes
>> types and hints pointers used as a bit vector to indicate hints and
>> types supported by the NIC. An application that intends to use stashing
>> hints should first discover supported hints and types and then use the
>> functions rte_eth_dev_stashing_hints_tx and
>> rte_eth_dev_stashing_hints_rx as required to set stashing hints
>> accordingly. eth_dev_ops structure has been updated with two new ops
>> that a PMD should implement to support cache stashing hints. A PMD that
>> intends to support cache stashing hints should initialize the
>> set_stashing_hints function pointer to a function that issues hints to
>> the underlying hardware in compliance with platform capabilities. The
>> same PMD should also implement a function that can return two-bit fields
>> indicating supported types and hints and then initialize the
>> discover_stashing_hints function pointer with it. If the NIC supports
>> cache stashing hints, the NIC should always set the
>> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
>>
>> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
>
> My initial reaction is negative on this. The DPDK does not need more nerd knobs
> for performance. If it is a performance win, it should be automatic and handled
> by the driver.
>
IMO, DPDK provides low level APIs and they should provide flexibility for users to control what part of the data from the NIC is stashed where. For ex: currently available systems across multiple architectures provide a system wide configuration to control stashing data from the NIC to the system cache. The configuration allows either all of the data from the NIC to be stashed, or none. Whereas some applications need access to just the headers and some others need access to all the packet data.
> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.
Agree. Extending the ABI would result in a better solution rather than another set of APIs.
^ permalink raw reply [relevance 3%]
* Re: IPv6 APIs rework
2024-07-19 9:12 0% ` Morten Brørup
@ 2024-07-19 10:41 0% ` Medvedkin, Vladimir
0 siblings, 0 replies; 200+ results
From: Medvedkin, Vladimir @ 2024-07-19 10:41 UTC (permalink / raw)
To: Morten Brørup, Robin Jarry, Vladimir Medvedkin, stephen
Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
Bruce Richardson, Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram,
Kiran Kumar K, Satha Rao, Harman Kalra, Ankur Dwivedi,
Anoob Joseph, Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal,
Ajit Khaparde, Somnath Kotur, Chas Williams, Min Hu (Connor),
Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
Chenbo Xia
Hi Morten,
On 19/07/2024 10:12, Morten Brørup wrote:
>> From: Robin Jarry [mailto:rjarry@redhat.com]
>>
>> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
>>> I think alignment should be 1 since in FIB6 users usually don't copy
>> IPv6
>>> address and just provide a pointer to the memory inside the packet.
> How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.
>
> What you are suggesting only works with single lookup, not bulk lookup.
You're right, sorry, confused with an internal implementation that
passes an array of pointers
>> Current
>>> vector implementation loads IPv6 addresses using unaligned access (
>>> _mm512_loadu_si512) so it doesn't rely on alignment.
>> Yes, my intention was exactly that, being able to map that structure
>> directly in packets without copying them on the stack.
> This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.
>
> It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.
>
>>>> 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
>> aligned,
>>>> they are 8 byte aligned. So we cannot make the IPv6 address type 16
>> byte
>>>> aligned.
>>> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
>> then
>>> IPv6 is aligned only by 2 bytes.
>> We probably could safely say that aligning on 2 bytes would be OK. But
>> is there any benefit, performance wise, in doing so? Keeping the same
>> alignment as before the change would at least make it ABI compatible.
> I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!
>
> There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
> E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas an 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.
>
> The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.
>
> The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.
>
--
Regards,
Vladimir
^ permalink raw reply [relevance 0%]
* RE: IPv6 APIs rework
2024-07-18 21:34 3% ` Robin Jarry
2024-07-19 8:25 0% ` Konstantin Ananyev
@ 2024-07-19 9:12 0% ` Morten Brørup
2024-07-19 10:41 0% ` Medvedkin, Vladimir
1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-07-19 9:12 UTC (permalink / raw)
To: Robin Jarry, Vladimir Medvedkin, stephen
Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
Chas Williams, Min Hu (Connor),
Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
Maxime Coquelin, Chenbo Xia
> From: Robin Jarry [mailto:rjarry@redhat.com]
>
> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy
> IPv6
> > address and just provide a pointer to the memory inside the packet.
How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.
What you are suggesting only works with single lookup, not bulk lookup.
> Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
>
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.
This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.
It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.
>
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
> aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16
> byte
> > > aligned.
>
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
> then
> > IPv6 is aligned only by 2 bytes.
>
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.
I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!
There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas a 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.
The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.
The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.
^ permalink raw reply [relevance 0%]
* Re: IPv6 APIs rework
@ 2024-07-18 21:34 3% ` Robin Jarry
2024-07-19 8:25 0% ` Konstantin Ananyev
2024-07-19 9:12 0% ` Morten Brørup
0 siblings, 2 replies; 200+ results
From: Robin Jarry @ 2024-07-18 21:34 UTC (permalink / raw)
To: Vladimir Medvedkin, Morten Brørup
Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
Chas Williams, Min Hu (Connor),
Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
Maxime Coquelin, Chenbo Xia
Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> address and just provide a pointer to the memory inside the packet. Current
> vector implementation loads IPv6 addresses using unaligned access (
> _mm512_loadu_si512) so it doesn't rely on alignment.
Yes, my intention was exactly that, being able to map that structure
directly in packets without copying them on the stack.
> > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > aligned.
> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> IPv6 is aligned only by 2 bytes.
We probably could safely say that aligning on 2 bytes would be OK. But
is there any benefit, performance wise, in doing so? Keeping the same
alignment as before the change would at least make it ABI compatible.
^ permalink raw reply [relevance 3%]
* RE: [RFC v2] ethdev: an API for cache stashing hints
2024-07-17 2:27 3% ` Stephen Hemminger
@ 2024-07-18 18:48 0% ` Wathsala Wathawana Vithanage
2024-07-20 3:05 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Wathsala Wathawana Vithanage @ 2024-07-18 18:48 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, thomas, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi,
Honnappa Nagarahalli, nd
>
> My initial reaction is negative on this. The DPDK does not need more nerd
> knobs for performance. If it is a performance win, it should be automatic and
> handled by the driver.
>
> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.
Thanks, Steve, for the feedback. My thesis is that in a DPDK-based packet processing system,
the application is more knowledgeable of memory buffer (packets) usage than the generic
underlying hardware or the PMD (I have provided some examples below with the hint they
would map into). Recognizing such cases, the PCI-SIG introduced TLP Processing Hints (TPH).
Consequently, many interconnect designers enabled support for TPH in their interconnects so
that, based on steering tags provided by an application to the NIC (which sets them in the TLP
header), memory buffers can be targeted toward a CPU at the desired level in the cache hierarchy.
With this proposed API, applications provide cache-stashing hints to ethernet devices, improving
memory access latencies from the CPU and the NIC and hence overall system performance.
Listed below are some use cases.
- A run-to-completion application may not need the next packet immediately in L1D. It may rather
issue a prefetch and do other work with packet and application data already in L1D before it needs
the next packet. A generic PMD will not know such subtleties of the application endpoint, and it
would resort to stashing buffers into the L1D indiscriminately or not at all. But with a hint from
the application, the packets' buffers can be stashed at a cache level suitable for the
application. (like UNIX MADV_DONTNEED but for mbufs at cache line granularity)
- Similarly, a pipelined application may use a hint advising that the buffers are needed in L1D as soon
as they arrive. (parallels MADV_WILLNEED)
- Let's call the time between an mbuf being allocated into an Rx queue, freed back into the mempool in
the Tx path, and once again reallocated into the same Rx queue the "buffer recycle window".
The length of the buffer recycle window is a function of the application in question; the PMD or the
NIC has no prior knowledge of this property of an application. A buffer may stay in the L1D of a CPU
throughout the entire recycle window if the window is short enough for that application.
An application with a short buffer recycle window may hint to the platform that the Tx buffer is not
needed in the CPU cache anytime soon, to avoid unnecessary cache invalidations when
the buffer gets written by an Rx packet for the second time. (parallels MADV_DONTNEED)
^ permalink raw reply [relevance 0%]
* Re: [PATCH] net/mlx5: fix compilation warning in GCC-9.1
2024-07-07 9:57 4% [PATCH] net/mlx5: fix compilation warning in GCC-9.1 Gregory Etelson
@ 2024-07-18 7:24 4% ` Raslan Darawsheh
0 siblings, 0 replies; 200+ results
From: Raslan Darawsheh @ 2024-07-18 7:24 UTC (permalink / raw)
To: Gregory Etelson, dev
Cc: Maayan Kashani, stable, Dariusz Sosnowski, Slava Ovsiienko,
Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad
Hi,
From: Gregory Etelson <getelson@nvidia.com>
Sent: Sunday, July 7, 2024 12:57 PM
To: dev@dpdk.org
Cc: Gregory Etelson; Maayan Kashani; Raslan Darawsheh; stable@dpdk.org; Dariusz Sosnowski; Slava Ovsiienko; Bing Zhao; Ori Kam; Suanming Mou; Matan Azrad
Subject: [PATCH] net/mlx5: fix compilation warning in GCC-9.1
GCC has introduced a bugfix in 9.1 that changed GCC ABI in ARM setups:
https://gcc.gnu.org/gcc-9/changes.html
```
On Arm targets (arm*-*-*), a bug in the implementation of the
procedure call standard (AAPCS) in the GCC 6, 7 and 8 releases
has been fixed: a structure containing a bit-field based on a 64-bit
integral type and where no other element in a structure required
64-bit alignment could be passed incorrectly to functions.
This is an ABI change. If the option -Wpsabi is enabled
(on by default) the compiler will emit a diagnostic note for code
that might be affected.
```
The patch fixes PMD compilation in the INTEGRITY flow item.
Fixes: 23b0a8b298b1 ("net/mlx5: fix integrity item validation and translation")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
^ permalink raw reply [relevance 4%]
* [DPDK/eventdev Bug 1497] [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hang when do ABI testing
@ 2024-07-18 3:42 10% bugzilla
0 siblings, 0 replies; 200+ results
From: bugzilla @ 2024-07-18 3:42 UTC (permalink / raw)
To: dev
[-- Attachment #1: Type: text/plain, Size: 4656 bytes --]
https://bugs.dpdk.org/show_bug.cgi?id=1497
Bug ID: 1497
Summary: [dpdk-24.07] [ABI][meson test]
driver-tests/event_dma_adapter_autotest test hang when
do ABI testing
Product: DPDK
Version: 24.07
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: eventdev
Assignee: dev@dpdk.org
Reporter: yux.jiang@intel.com
Target Milestone: ---
[Environment]
DPDK version: 24.07.0-rc2
DPDK ABI version: 23.11.0
OS: RHEL9.0/5.14.0-70.13.1.el9_0.x86_64
Compiler: gcc version 11.2.1 20220127 (Red Hat 11.2.1-9)
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.
1, Build latest main dpdk24.07-rc2
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf /root/tmp/dpdk_share_lib /root/shared_lib_dpdk
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110
install
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk
ll /root/shared_lib_dpdk
cat /root/.bashrc | grep LD_LIBRARY_PATH
sed -i 's#export LD_LIBRARY_PATH=.*#export
LD_LIBRARY_PATH=/root/shared_lib_dpdk#g' /root/.bashrc
2, Build LTS dpdk23.11.0
rm /root/dpdk
tar zxvf dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers
3, Launch dpdk-test and run event_dma_adapter_autotest
MALLOC_PERTURB_=132 DPDK_TEST=event_dma_adapter_autotest
/root/dpdk/x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d
/root/shared_lib_dpdk --vdev=dma_skeleton
Show the output from the previous commands.
[root@ABI-80 dpdk]# MALLOC_PERTURB_=132 DPDK_TEST=event_dma_adapter_autotest
/root/dpdk/x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d
/root/shared_lib_dpdk --vdev=dma_skeleton
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
skeldma_probe(): Create dma_skeleton dmadev with lcore-id -1
APP: HPET is not enabled, using TSC as default timer
RTE>>event_dma_adapter_autotest
+ ------------------------------------------------------- +
+ Test Suite : Event dma adapter test suite
+ ------------------------------------------------------- +
+ TestCase [ 0] : test_dma_adapter_create succeeded
+ TestCase [ 1] : test_dma_adapter_vchan_add_del succeeded
+------------------------------------------------------+
+ DMA adapter stats for instance 0:
+ Event port poll count 0x0
+ Event dequeue count 0x0
+ DMA dev enqueue count 0x0
+ DMA dev enqueue failed count 0x0
+ DMA dev dequeue count 0x0
+ Event enqueue count 0x0
+ Event enqueue retry count 0x0
+ Event enqueue fail count 0x0
+------------------------------------------------------+
+ TestCase [ 2] : test_dma_adapter_stats succeeded
+ TestCase [ 3] : test_dma_adapter_params succeeded
[Expected Result]
Test ok.
[Regression]
Is this issue a regression: (Y/N) Y
The first bad commit:
commit 588dcac2361011556934166d93da62dae712ce69
Author: Pavan Nikhilesh <pbhagavatula@marvell.com>
Date: Fri Jun 7 16:06:25 2024 +0530
eventdev/dma: reorganize event DMA ops
Re-organize event DMA ops structure to allow holding
source and destination pointers without the need for
additional memory, the mempool allocating memory for
rte_event_dma_adapter_ops can size the structure to
accommodate all the needed source and destination
pointers.
Add multiple words for holding user metadata, adapter
implementation specific metadata and event metadata.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
-----------Note---------
Based on dpdk24.07-rc2, which includes the fix patch for
https://bugs.dpdk.org/show_bug.cgi?id=1469, the test also hangs.
Please confirm whether this needs a fix for ABI compatibility testing, or
whether this case need not be covered by ABI compatibility testing. Thanks.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #2: Type: text/html, Size: 6939 bytes --]
^ permalink raw reply [relevance 10%]
* Re: [RFC v2] ethdev: an API for cache stashing hints
@ 2024-07-17 2:27 3% ` Stephen Hemminger
2024-07-18 18:48 0% ` Wathsala Wathawana Vithanage
2024-07-20 3:05 3% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2024-07-17 2:27 UTC (permalink / raw)
To: Wathsala Vithanage
Cc: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi
On Mon, 15 Jul 2024 22:11:41 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
>
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
>
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions. In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
>
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.
>
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues.
>
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
>
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
>
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
My initial reaction is negative on this. The DPDK does not need more nerd knobs
for performance. If it is a performance win, it should be automatic and handled
by the driver.
If you absolutely have to have another flag, then it should be in existing config
(yes, extend the ABI) rather than adding more flags and calls in ethdev.
^ permalink raw reply [relevance 3%]
* RE: [EXTERNAL] [PATCH v6] graph: expose node context as pointers
2024-07-05 14:52 4% [PATCH v6] graph: expose node context as pointers Robin Jarry
@ 2024-07-12 11:39 0% ` Kiran Kumar Kokkilagadda
0 siblings, 0 replies; 200+ results
From: Kiran Kumar Kokkilagadda @ 2024-07-12 11:39 UTC (permalink / raw)
To: Robin Jarry, dev, Jerin Jacob, Nithin Kumar Dabilpuram, Zhirun Yan
> -----Original Message-----
> From: Robin Jarry <rjarry@redhat.com>
> Sent: Friday, July 5, 2024 8:23 PM
> To: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Zhirun Yan <yanzhirun_163@163.com>
> Subject: [EXTERNAL] [PATCH v6] graph: expose node context as pointers
>
> In some cases, the node context data is used to store two pointers because
> the data is larger than the reserved 16 bytes. Having to define intermediate
> structures just to be able to cast is tedious. And without intermediate
> structures, casting to opaque pointers is hard without violating strict aliasing
> rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is itself cache aligned. Use __rte_cache_aligned to
> preserve existing alignment on architectures where cache lines are 128 bytes.
>
> Add a static assert to ensure that the fast path area does not grow beyond a
> 64 bytes cache line.
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
Acked-by: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>
>
> Notes:
> v6:
>
> * Fix ABI breakage on arm64 (and all platforms that have
> RTE_CACHE_LINE_SIZE=128).
> * This patch will cause CI failures without libabigail 2.5. See this commit
https://sourceware.org/git/?p=libabigail.git;a=commitdiff;h=f821c2be3fff2047ef8fc436f6f02301812d166f
> for more details.
>
> v5:
>
> * Helper functions to hide casting proved to be harder than expected.
> Naive casting may even be impossible without breaking strict aliasing
> rules. The only other option would be to use explicit memcpy calls.
> * Unnamed union tentative again. As suggested by Tyler (thank you!),
> using an intermediate unnamed struct to carry the alignment produces
> consistent ABI in C and C++.
> * Also, Tyler (thank you!) suggested that the fast path area alignment
> size may be incorrect for architectures where the cache line is not 64
> bytes. There will be a 64 bytes hole in the structure at the end of
> the unnamed struct before the zero length next nodes array. Use
> __rte_cache_min_aligned to preserve existing alignment.
>
> v4:
>
> * Replaced the unnamed union with helper inline functions.
>
> v3:
>
> * Added __extension__ to the unnamed struct inside the union.
> * Fixed C++ header checks.
> * Replaced alignas() with an explicit static_assert.
>
> lib/graph/rte_graph_worker_common.h | 29 +++++++++++++++++++++--------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/lib/graph/rte_graph_worker_common.h
> b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..8d8956fdddda 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -12,7 +12,9 @@
> * process, enqueue and move streams of objects to the next nodes.
> */
>
> +#include <assert.h>
> #include <stdalign.h>
> +#include <stddef.h>
>
> #include <rte_common.h>
> #include <rte_cycles.h>
> @@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
> } dispatch;
> };
> /* Fast path area */
> + __extension__ struct __rte_cache_aligned {
> #define RTE_NODE_CTX_SZ 16
> - alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**<
> Node Context. */
> - uint16_t size; /**< Total number of objects available. */
> - uint16_t idx; /**< Number of objects used. */
> - rte_graph_off_t off; /**< Offset of node in the graph reel. */
> - uint64_t total_cycles; /**< Cycles spent in this node. */
> - uint64_t total_calls; /**< Calls done to this node. */
> - uint64_t total_objs; /**< Objects processed by this node. */
> + union {
> + uint8_t ctx[RTE_NODE_CTX_SZ];
> + __extension__ struct {
> + void *ctx_ptr;
> + void *ctx_ptr2;
> + };
> + }; /**< Node Context. */
> + uint16_t size; /**< Total number of objects
> available. */
> + uint16_t idx; /**< Number of objects used. */
> + rte_graph_off_t off; /**< Offset of node in the graph reel.
> */
> + uint64_t total_cycles; /**< Cycles spent in this node. */
> + uint64_t total_calls; /**< Calls done to this node. */
> + uint64_t total_objs; /**< Objects processed by this node.
> */
> union {
> void **objs; /**< Array of object pointers. */
> uint64_t objs_u64;
> @@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
> rte_node_process_t process; /**< Process function.
> */
> uint64_t process_u64;
> };
> - alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**<
> Next nodes. */
> + alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node
> *nodes[]; /**< Next nodes. */
> + };
> };
>
> +static_assert(offsetof(struct rte_node, nodes) - offsetof(struct rte_node, ctx)
> + == RTE_CACHE_LINE_MIN_SIZE, "rte_node fast path area must fit in
> 64
> +bytes");
> +
> /**
> * @internal
> *
> --
> 2.45.2
^ permalink raw reply [relevance 0%]
* [PATCH v8 5/5] dts: add API doc generation
@ 2024-07-12 8:57 3% ` Juraj Linkeš
2024-07-30 13:51 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Juraj Linkeš @ 2024-07-12 8:57 UTC (permalink / raw)
To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Cc: dev, Juraj Linkeš, Luca Vizzarro
The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.
Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentations, such as the Python documentation.
There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
code.
* Also the same Python packages as DTS, for the same reason.
[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
buildtools/call-sphinx-build.py | 3 +++
doc/api/doxy-api-index.md | 3 +++
doc/api/doxy-api.conf.in | 2 ++
doc/api/meson.build | 4 ++++
doc/guides/conf.py | 33 +++++++++++++++++++++++++++++++-
doc/guides/meson.build | 1 +
doc/guides/tools/dts.rst | 34 ++++++++++++++++++++++++++++++++-
dts/doc/meson.build | 27 ++++++++++++++++++++++++++
dts/meson.build | 16 ++++++++++++++++
meson.build | 1 +
10 files changed, 122 insertions(+), 2 deletions(-)
create mode 100644 dts/doc/meson.build
create mode 100644 dts/meson.build
diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 693274da4e..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -14,10 +14,13 @@
parser.add_argument('version')
parser.add_argument('src')
parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
args, extra_args = parser.parse_known_args()
# set the version in environment for sphinx to pick up
os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+ os.environ['DTS_ROOT'] = args.dts_root
sphinx_cmd = [args.sphinx] + extra_args
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
[experimental APIs](@ref rte_compat.h),
[ABI versioning](@ref rte_function_versioning.h),
[version](@ref rte_version.h)
+
+- **tests**:
+ [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE = YES
SORT_MEMBER_DOCS = NO
SOURCE_BROWSER = YES
+ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
EXAMPLE_PATH = @TOPDIR@/examples
EXAMPLE_PATTERNS = *.c
EXAMPLE_RECURSIVE = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,10 @@ cdata.set('WARN_AS_ERROR', 'NO')
if get_option('werror')
cdata.set('WARN_AS_ERROR', 'YES')
endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
# configure HTML Doxygen run
html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 8b440fb2a9..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -9,7 +9,7 @@
from os import environ
from os.path import basename, dirname
from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
import configparser
@@ -23,6 +23,37 @@
file=stderr)
pass
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+ 'members': True,
+ 'member-order': 'bysource',
+ 'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+ path.append(dts_root)
+ # DTS Sidebar config
+ html_theme_options = {
+ 'collapse_navigation': False,
+ 'navigation_depth': -1,
+ }
+
stop_on_error = ('-W' in argv)
project = 'Data Plane Development Kit'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Intel Corporation
+doc_guides_source_dir = meson.current_source_dir()
sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
when some of the basics are not met.
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
The style must conform to the `Google style
<https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+ poetry install --no-root --with docs
+ poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+ ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+ Make sure to fix any Sphinx warnings when adding or updating docstrings,
+ and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
Configuration Schema
--------------------
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+ subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+ extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+ output: 'html',
+ command: [sphinx_wrapper, sphinx, meson.project_version(),
+ meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+ build_by_default: false,
+ install: get_option('enable_docs'),
+ install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+ message = 'No docs targets found'
+else
+ message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+ depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
# build docs
subdir('doc')
+subdir('dts')
# build any examples explicitly requested - useful for developers - and
# install any example code into the appropriate install path
--
2.34.1
^ permalink raw reply [relevance 3%]
* Re: [PATCH v4] ethdev: Add link_speed lanes support
2024-07-09 11:10 4% ` Ferruh Yigit
@ 2024-07-09 21:20 0% ` Damodharam Ammepalli
0 siblings, 0 replies; 200+ results
From: Damodharam Ammepalli @ 2024-07-09 21:20 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: ajit.khaparde, dev, huangdengdui, kalesh-anakkur.purayil
On Tue, Jul 9, 2024 at 4:10 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 7/9/2024 12:22 AM, Damodharam Ammepalli wrote:
> > Update the eth_dev_ops structure with new function vectors
> > to get, get capabilities and set ethernet link speed lanes.
> > Update the testpmd to provide required config and information
> > display infrastructure.
> >
> > The supporting ethernet controller driver will register callbacks
> > to provide link speed lanes config and get services. This lane
> > configuration is applicable only when the NIC is forced to fixed
> > speeds. In autonegotiation mode, the hardware automatically
> > negotiates the number of lanes.
> >
> > These are the new commands.
> >
> > testpmd> show port 0 speed_lanes capabilities
> >
> > Supported speeds Valid lanes
> > -----------------------------------
> > 10 Gbps 1
> > 25 Gbps 1
> > 40 Gbps 4
> > 50 Gbps 1 2
> > 100 Gbps 1 2 4
> > 200 Gbps 2 4
> > 400 Gbps 4 8
> > testpmd>
> >
> > testpmd>
> > testpmd> port stop 0
> > testpmd> port config 0 speed_lanes 4
> > testpmd> port config 0 speed 200000 duplex full
> >
>
> Is there a requirement to set speed before speed_lane?
> Because I expect driver will verify if a speed_lane value is valid or
> not for a specific speed value. In above usage, driver will verify based
> on existing speed, whatever it is, later chaning speed may cause invalid
> speed_lane configuration.
>
>
There is no requirement to set speed before speed_lanes.
If the controller supports the lane configuration capability and no
lane count is given (which is 0), the driver picks the lowest lane
mode (e.g. 200 Gbps with NRZ) when a fixed speed already exists or is
configured in tandem with speed_lanes. If the speed is already Auto,
testpmd's speed_lanes config is ignored.
> > testpmd> port start 0
> > testpmd>
> > testpmd> show port info 0
> >
> > ********************* Infos for port 0 *********************
> > MAC address: 14:23:F2:C3:BA:D2
> > Device name: 0000:b1:00.0
> > Driver name: net_bnxt
> > Firmware-version: 228.9.115.0
> > Connect to socket: 2
> > memory allocation on the socket: 2
> > Link status: up
> > Link speed: 200 Gbps
> > Active Lanes: 4
> > Link duplex: full-duplex
> > Autoneg status: Off
> >
> > Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
> > ---
> > v2->v3 Consolidating the testpmd and rtelib patches into a single patch
> > as requested.
> > v3->v4 Addressed comments and fix help string and documentation.
> >
> > app/test-pmd/cmdline.c | 230 +++++++++++++++++++++++++++++++++++++
> > app/test-pmd/config.c | 69 ++++++++++-
> > app/test-pmd/testpmd.h | 4 +
> > lib/ethdev/ethdev_driver.h | 77 +++++++++++++
> > lib/ethdev/rte_ethdev.c | 51 ++++++++
> > lib/ethdev/rte_ethdev.h | 92 +++++++++++++++
> > lib/ethdev/version.map | 5 +
> > 7 files changed, 526 insertions(+), 2 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > index b7759e38a8..a507df31d8 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -284,6 +284,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> >
> > "dump_log_types\n"
> > " Dumps the log level for all the dpdk modules\n\n"
> > +
> > + "show port (port_id) speed_lanes capabilities"
> > + " Show speed lanes capabilities of a port.\n\n"
> > );
> > }
> >
> > @@ -823,6 +826,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> > "port config (port_id) txq (queue_id) affinity (value)\n"
> > " Map a Tx queue with an aggregated port "
> > "of the DPDK port\n\n"
> > +
> > + "port config (port_id|all) speed_lanes (0|1|4|8)\n"
> > + " Set number of lanes for all ports or port_id for a forced speed\n\n"
> > );
> > }
> >
> > @@ -1560,6 +1566,110 @@ static cmdline_parse_inst_t cmd_config_speed_specific = {
> > },
> > };
> >
> > +static int
> > +parse_speed_lanes_cfg(portid_t pid, uint32_t lanes)
> > +{
> > + int ret;
> > + uint32_t lanes_capa;
> > +
> > + ret = parse_speed_lanes(lanes, &lanes_capa);
> > + if (ret < 0) {
> > + fprintf(stderr, "Unknown speed lane value: %d for port %d\n", lanes, pid);
> > + return -1;
> > + }
> > +
> > + ret = rte_eth_speed_lanes_set(pid, lanes_capa);
> > + if (ret == -ENOTSUP) {
> > + fprintf(stderr, "Function not implemented\n");
> > + return -1;
> > + } else if (ret < 0) {
> > + fprintf(stderr, "Set speed lanes failed\n");
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/* *** display speed lanes per port capabilities *** */
> > +struct cmd_show_speed_lanes_result {
> > + cmdline_fixed_string_t cmd_show;
> > + cmdline_fixed_string_t cmd_port;
> > + cmdline_fixed_string_t cmd_keyword;
> > + portid_t cmd_pid;
> > +};
> > +
> > +static void
> > +cmd_show_speed_lanes_parsed(void *parsed_result,
> > + __rte_unused struct cmdline *cl,
> > + __rte_unused void *data)
> > +{
> > + struct cmd_show_speed_lanes_result *res = parsed_result;
> > + struct rte_eth_speed_lanes_capa *speed_lanes_capa;
> > + unsigned int num;
> > + int ret;
> > +
> > + if (!rte_eth_dev_is_valid_port(res->cmd_pid)) {
> > + fprintf(stderr, "Invalid port id %u\n", res->cmd_pid);
> > + return;
> > + }
> > +
> > + ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, NULL, 0);
> > + if (ret == -ENOTSUP) {
> > + fprintf(stderr, "Function not implemented\n");
> > + return;
> > + } else if (ret < 0) {
> > + fprintf(stderr, "Get speed lanes capability failed: %d\n", ret);
> > + return;
> > + }
> > +
> > + num = (unsigned int)ret;
> > + speed_lanes_capa = calloc(num, sizeof(*speed_lanes_capa));
> > + if (speed_lanes_capa == NULL) {
> > + fprintf(stderr, "Failed to alloc speed lanes capability buffer\n");
> > + return;
> > + }
> > +
> > + ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, speed_lanes_capa, num);
> > + if (ret < 0) {
> > + fprintf(stderr, "Error getting speed lanes capability: %d\n", ret);
> > + goto out;
> > + }
> > +
> > + show_speed_lanes_capability(num, speed_lanes_capa);
> > +out:
> > + free(speed_lanes_capa);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_show =
> > + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > + cmd_show, "show");
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_port =
> > + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > + cmd_port, "port");
> > +static cmdline_parse_token_num_t cmd_show_speed_lanes_pid =
> > + TOKEN_NUM_INITIALIZER(struct cmd_show_speed_lanes_result,
> > + cmd_pid, RTE_UINT16);
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_keyword =
> > + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > + cmd_keyword, "speed_lanes");
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_cap_keyword =
> > + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > + cmd_keyword, "capabilities");
> > +
> > +static cmdline_parse_inst_t cmd_show_speed_lanes = {
> > + .f = cmd_show_speed_lanes_parsed,
> > + .data = NULL,
> > + .help_str = "show port <port_id> speed_lanes capabilities",
> > + .tokens = {
> > + (void *)&cmd_show_speed_lanes_show,
> > + (void *)&cmd_show_speed_lanes_port,
> > + (void *)&cmd_show_speed_lanes_pid,
> > + (void *)&cmd_show_speed_lanes_keyword,
> > + (void *)&cmd_show_speed_lanes_cap_keyword,
> > + NULL,
> > + },
> > +};
> > +
> > /* *** configure loopback for all ports *** */
> > struct cmd_config_loopback_all {
> > cmdline_fixed_string_t port;
> > @@ -1676,6 +1786,123 @@ static cmdline_parse_inst_t cmd_config_loopback_specific = {
> > },
> > };
> >
> > +/* *** configure speed_lanes for all ports *** */
> > +struct cmd_config_speed_lanes_all {
> > + cmdline_fixed_string_t port;
> > + cmdline_fixed_string_t keyword;
> > + cmdline_fixed_string_t all;
> > + cmdline_fixed_string_t item;
> > + uint32_t lanes;
> > +};
> > +
> > +static void
> > +cmd_config_speed_lanes_all_parsed(void *parsed_result,
> > + __rte_unused struct cmdline *cl,
> > + __rte_unused void *data)
> > +{
> > + struct cmd_config_speed_lanes_all *res = parsed_result;
> > + portid_t pid;
> > +
> > + if (!all_ports_stopped()) {
> > + fprintf(stderr, "Please stop all ports first\n");
> > + return;
> > + }
> > +
> > + RTE_ETH_FOREACH_DEV(pid) {
> > + if (parse_speed_lanes_cfg(pid, res->lanes))
> > + return;
> > + }
> > +
> > + cmd_reconfig_device_queue(RTE_PORT_ALL, 1, 1);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_port =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, port, "port");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_keyword =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, keyword,
> > + "config");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_all =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, all, "all");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_item =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, item,
> > + "speed_lanes");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_all_lanes =
> > + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_all, lanes, RTE_UINT32);
> > +
> > +static cmdline_parse_inst_t cmd_config_speed_lanes_all = {
> > + .f = cmd_config_speed_lanes_all_parsed,
> > + .data = NULL,
> > + .help_str = "port config all speed_lanes <value>",
> > + .tokens = {
> > + (void *)&cmd_config_speed_lanes_all_port,
> > + (void *)&cmd_config_speed_lanes_all_keyword,
> > + (void *)&cmd_config_speed_lanes_all_all,
> > + (void *)&cmd_config_speed_lanes_all_item,
> > + (void *)&cmd_config_speed_lanes_all_lanes,
> > + NULL,
> > + },
> > +};
> > +
> > +/* *** configure speed_lanes for specific port *** */
> > +struct cmd_config_speed_lanes_specific {
> > + cmdline_fixed_string_t port;
> > + cmdline_fixed_string_t keyword;
> > + uint16_t port_id;
> > + cmdline_fixed_string_t item;
> > + uint32_t lanes;
> > +};
> > +
> > +static void
> > +cmd_config_speed_lanes_specific_parsed(void *parsed_result,
> > + __rte_unused struct cmdline *cl,
> > + __rte_unused void *data)
> > +{
> > + struct cmd_config_speed_lanes_specific *res = parsed_result;
> > +
> > + if (port_id_is_invalid(res->port_id, ENABLED_WARN))
> > + return;
> > +
> > + if (!port_is_stopped(res->port_id)) {
> > + fprintf(stderr, "Please stop port %u first\n", res->port_id);
> > + return;
> > + }
> >
>
> There is a requirement here, that port needs to be stopped before
> calling the rte_eth_speed_lanes_set(),
> is this requirement documented in the API documentation?
>
>
Changing the link speed lane mode needs a PHY reset, hence stopping the
port is a requirement.
I will update this in the documentation in the next patch.
> > +
> > + if (parse_speed_lanes_cfg(res->port_id, res->lanes))
> > + return;
> > +
> > + cmd_reconfig_device_queue(res->port_id, 1, 1);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_port =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, port,
> > + "port");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_keyword =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, keyword,
> > + "config");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_id =
> > + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, port_id,
> > + RTE_UINT16);
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_item =
> > + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, item,
> > + "speed_lanes");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_lanes =
> > + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, lanes,
> > + RTE_UINT32);
> > +
> > +static cmdline_parse_inst_t cmd_config_speed_lanes_specific = {
> > + .f = cmd_config_speed_lanes_specific_parsed,
> > + .data = NULL,
> > + .help_str = "port config <port_id> speed_lanes <value>",
> > + .tokens = {
> > + (void *)&cmd_config_speed_lanes_specific_port,
> > + (void *)&cmd_config_speed_lanes_specific_keyword,
> > + (void *)&cmd_config_speed_lanes_specific_id,
> > + (void *)&cmd_config_speed_lanes_specific_item,
> > + (void *)&cmd_config_speed_lanes_specific_lanes,
> > + NULL,
> > + },
> > +};
> > +
> > /* *** configure txq/rxq, txd/rxd *** */
> > struct cmd_config_rx_tx {
> > cmdline_fixed_string_t port;
> > @@ -13238,6 +13465,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
> > (cmdline_parse_inst_t *)&cmd_set_port_setup_on,
> > (cmdline_parse_inst_t *)&cmd_config_speed_all,
> > (cmdline_parse_inst_t *)&cmd_config_speed_specific,
> > + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_all,
> > + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_specific,
> > + (cmdline_parse_inst_t *)&cmd_show_speed_lanes,
> > (cmdline_parse_inst_t *)&cmd_config_loopback_all,
> > (cmdline_parse_inst_t *)&cmd_config_loopback_specific,
> > (cmdline_parse_inst_t *)&cmd_config_rx_tx,
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index 66c3a68c1d..498a7db467 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -207,6 +207,32 @@ static const struct {
> > {"gtpu", RTE_ETH_FLOW_GTPU},
> > };
> >
> > +static const struct {
> > + enum rte_eth_speed_lanes lane;
> > + const uint32_t value;
> > +} speed_lane_name[] = {
> > + {
> > + .lane = RTE_ETH_SPEED_LANE_UNKNOWN,
> > + .value = 0,
> > + },
> > + {
> > + .lane = RTE_ETH_SPEED_LANE_1,
> > + .value = 1,
> > + },
> > + {
> > + .lane = RTE_ETH_SPEED_LANE_2,
> > + .value = 2,
> > + },
> > + {
> > + .lane = RTE_ETH_SPEED_LANE_4,
> > + .value = 4,
> > + },
> > + {
> > + .lane = RTE_ETH_SPEED_LANE_8,
> > + .value = 8,
> > + },
> > +};
> > +
> > static void
> > print_ethaddr(const char *name, struct rte_ether_addr *eth_addr)
> > {
> > @@ -786,6 +812,7 @@ port_infos_display(portid_t port_id)
> > char name[RTE_ETH_NAME_MAX_LEN];
> > int ret;
> > char fw_version[ETHDEV_FWVERS_LEN];
> > + uint32_t lanes;
> >
> > if (port_id_is_invalid(port_id, ENABLED_WARN)) {
> > print_valid_ports();
> > @@ -828,6 +855,12 @@ port_infos_display(portid_t port_id)
> >
> > printf("\nLink status: %s\n", (link.link_status) ? ("up") : ("down"));
> > printf("Link speed: %s\n", rte_eth_link_speed_to_str(link.link_speed));
> > + if (rte_eth_speed_lanes_get(port_id, &lanes) == 0) {
> > + if (lanes > 0)
> > + printf("Active Lanes: %d\n", lanes);
> > + else
> > + printf("Active Lanes: %s\n", "Unknown");
> >
>
> What can be the 'else' case?
> As 'lanes' is unsigned, only option is it being zero. Is API allowed to
> return zero as lane number?
>
>
Yes. If the link is down but the controller supports the speed_lanes
capability, we can show "Unknown".
Other cases, from the Broadcom spec:
1 Gb link speed: no lane info (theoretically it can't be zero, but we
need to show what the controller provides in the query).
10 Gb link speed: NRZ, 10G per lane, 1 lane.
> > + }
> > printf("Link duplex: %s\n", (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
> > ("full-duplex") : ("half-duplex"));
> > printf("Autoneg status: %s\n", (link.link_autoneg == RTE_ETH_LINK_AUTONEG) ?
> > @@ -962,7 +995,7 @@ port_summary_header_display(void)
> >
> > port_number = rte_eth_dev_count_avail();
> > printf("Number of available ports: %i\n", port_number);
> > - printf("%-4s %-17s %-12s %-14s %-8s %s\n", "Port", "MAC Address", "Name",
> > + printf("%-4s %-17s %-12s %-14s %-8s %-8s\n", "Port", "MAC Address", "Name",
> > "Driver", "Status", "Link");
> > }
> >
> > @@ -993,7 +1026,7 @@ port_summary_display(portid_t port_id)
> > if (ret != 0)
> > return;
> >
> > - printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n",
> > + printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n",
> >
>
> Summary updates are irrelevant in the patch, can you please drop them.
>
>
Sure I will.
> > port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name,
> > dev_info.driver_name, (link.link_status) ? ("up") : ("down"),
> > rte_eth_link_speed_to_str(link.link_speed));
> > @@ -7244,3 +7277,35 @@ show_mcast_macs(portid_t port_id)
> > printf(" %s\n", buf);
> > }
> > }
> > +
> > +int
> > +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane)
> > +{
> > + uint8_t i;
> > +
> > + for (i = 0; i < RTE_DIM(speed_lane_name); i++) {
> > + if (speed_lane_name[i].value == lane) {
> > + *speed_lane = lane;
> >
>
> This converts from 8 -> 8, 4 -> 4 ....
>
> Why not completely eliminate this fucntion? See below.
>
Sure, will evaluate and do the needful.
> > + return 0;
> > + }
> > + }
> > + return -1;
> > +}
> > +
> > +void
> > +show_speed_lanes_capability(unsigned int num, struct rte_eth_speed_lanes_capa *speed_lanes_capa)
> > +{
> > + unsigned int i, j;
> > +
> > + printf("\n%-15s %-10s", "Supported-speeds", "Valid-lanes");
> > + printf("\n-----------------------------------\n");
> > + for (i = 0; i < num; i++) {
> > + printf("%-17s ", rte_eth_link_speed_to_str(speed_lanes_capa[i].speed));
> > +
> > + for (j = 0; j < RTE_ETH_SPEED_LANE_MAX; j++) {
> > + if (RTE_ETH_SPEED_LANES_TO_CAPA(j) & speed_lanes_capa[i].capa)
> > + printf("%-2d ", speed_lane_name[j].value);
> > + }
>
> To eliminate both RTE_ETH_SPEED_LANE_MAX & speed_lane_name, what do you
> think about:
>
> capa = speed_lanes_capa[i].capa;
> int s = 0;
> while (capa) {
> if (capa & 0x1)
> printf("%-2d ", 1 << s);
> s++;
> capa = capa >> 1;
> }
>
I am new to the DPDK world and followed the FEC driver conventions
for consistency. I will update it as you suggested; it makes sense.
> > + printf("\n");
> > + }
> > +}
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > index 9facd7f281..fb9ef05cc5 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -1253,6 +1253,10 @@ extern int flow_parse(const char *src, void *result, unsigned int size,
> > struct rte_flow_item **pattern,
> > struct rte_flow_action **actions);
> >
> > +void show_speed_lanes_capability(uint32_t num,
> > + struct rte_eth_speed_lanes_capa *speed_lanes_capa);
> > +int parse_speed_lanes(uint32_t lane, uint32_t *speed_lane);
> > +
> >
>
> These functions only called in 'test-pmd/cmdline.c', what do you think
> move functions to that file and make them static?
>
>
Ack
> > uint64_t str_to_rsstypes(const char *str);
> > const char *rsstypes_to_str(uint64_t rss_type);
> >
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 883e59a927..0f10aec3a1 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -1179,6 +1179,79 @@ typedef int (*eth_rx_descriptor_dump_t)(const struct rte_eth_dev *dev,
> > uint16_t queue_id, uint16_t offset,
> > uint16_t num, FILE *file);
> >
> > +/**
> > + * @internal
> > + * Get number of current active lanes
> > + *
> > + * @param dev
> > + * ethdev handle of port.
> > + * @param speed_lanes
> > + * Number of active lanes that the link is trained up.
> > + * @return
> > + * Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + * Success, get speed_lanes data success.
> > + * @retval -ENOTSUP
> > + * Operation is not supported.
> > + * @retval -EIO
> > + * Device is removed.
> >
>
> Is above '-ENOTSUP' & '-EIO' return values are valid?
> Normally we expect those two from ethdev API, not from dev_ops.
> In which case a dev_ops expected to return these?
>
> Same comment for all three new APIs.
>
>
Code snippet from our driver:

    .speed_lanes_get = bnxt_speed_lanes_get,

The ethdev dev_ops returns -ENOTSUP if the capability is not supported.
Is this OK?

    static int bnxt_speed_lanes_get(struct rte_eth_dev *dev,
                                    uint32_t *speed_lanes)
    {
            struct bnxt *bp = dev->data->dev_private;

            if (!BNXT_LINK_SPEEDS_V2(bp))
                    return -ENOTSUP;

-EIO - I will remove it.
I will check and update the other functions also.
> > + */
> > +typedef int (*eth_speed_lanes_get_t)(struct rte_eth_dev *dev, uint32_t *speed_lanes);
> > +
> > +/**
> > + * @internal
> > + * Set speed lanes
> > + *
> > + * @param dev
> > + * ethdev handle of port.
> > + * @param speed_lanes
> > + * Non-negative number of lanes
> > + *
> > + * @return
> > + * Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + * Success, set lanes success.
> > + * @retval -ENOTSUP
> > + * Operation is not supported.
> > + * @retval -EINVAL
> > + * Unsupported mode requested.
> > + * @retval -EIO
> > + * Device is removed.
> > + */
> > +typedef int (*eth_speed_lanes_set_t)(struct rte_eth_dev *dev, uint32_t speed_lanes);
> > +
> > +/**
> > + * @internal
> > + * Get supported link speed lanes capability
> > + *
> > + * @param speed_lanes_capa
> > + * speed_lanes_capa is out only with per-speed capabilities.
> >
>
> I can understand what above says but I think it can be clarified more,
> what do you think?
>
Ack
> > + * @param num
> > + * a number of elements in an speed_speed_lanes_capa array.
> >
>
> 'a number of elements' or 'number of elements' ?
>
Ack
> > + *
> > + * @return
> > + * Negative errno value on error, positive value on success.
> > + *
> > + * @retval positive value
> > + * A non-negative value lower or equal to num: success. The return value
> > + * is the number of entries filled in the speed lanes array.
> > + * A non-negative value higher than num: error, the given speed lanes capa array
> > + * is too small. The return value corresponds to the num that should
> > + * be given to succeed. The entries in the speed lanes capa array are not valid
> > + * and shall not be used by the caller.
> > + * @retval -ENOTSUP
> > + * Operation is not supported.
> > + * @retval -EIO
> > + * Device is removed.
> > + * @retval -EINVAL
> > + * *num* or *speed_lanes_capa* invalid.
> > + */
> > +typedef int (*eth_speed_lanes_get_capability_t)(struct rte_eth_dev *dev,
> > + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > + unsigned int num);
> > +
> >
>
> These new dev_ops placed just in between existing dev_ops
> 'eth_rx_descriptor_dump_t' and 'eth_tx_descriptor_dump_t',
> if you were looking this header file as whole, what would you think
> about quality of it?
>
> Please group new dev_ops below link related ones.
>
>
Ack
> > /**
> > * @internal
> > * Dump Tx descriptor info to a file.
> > @@ -1247,6 +1320,10 @@ struct eth_dev_ops {
> > eth_dev_close_t dev_close; /**< Close device */
> > eth_dev_reset_t dev_reset; /**< Reset device */
> > eth_link_update_t link_update; /**< Get device link state */
> > + eth_speed_lanes_get_t speed_lanes_get; /**<Get link speed active lanes */
> > + eth_speed_lanes_set_t speed_lanes_set; /**<set the link speeds supported lanes */
> > + /** Get link speed lanes capability */
> > + eth_speed_lanes_get_capability_t speed_lanes_get_capa;
> > /** Check if the device was physically removed */
> > eth_is_removed_t is_removed;
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index f1c658f49e..07cefea307 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -7008,4 +7008,55 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
> > return ret;
> > }
> >
> > +int
> > +rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lane)
> > +{
> > + struct rte_eth_dev *dev;
> > +
> > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > + dev = &rte_eth_devices[port_id];
> > +
> > + if (*dev->dev_ops->speed_lanes_get == NULL)
> > + return -ENOTSUP;
> > + return eth_err(port_id, (*dev->dev_ops->speed_lanes_get)(dev, lane));
> > +}
> > +
> > +int
> > +rte_eth_speed_lanes_get_capability(uint16_t port_id,
> > + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > + unsigned int num)
> > +{
> > + struct rte_eth_dev *dev;
> > + int ret;
> > +
> > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > + dev = &rte_eth_devices[port_id];
> > +
> > + if (speed_lanes_capa == NULL && num > 0) {
> > + RTE_ETHDEV_LOG_LINE(ERR,
> > + "Cannot get ethdev port %u speed lanes capability to NULL when array size is non zero",
> > + port_id);
> > + return -EINVAL;
> > + }
> >
>
> According above check, "speed_lanes_capa == NULL && num == 0" is a valid
> input, I assume this is useful to get expected size of the
> 'speed_lanes_capa' array, but this is not mentioned in the API
> documentation, can you please update API doxygen comment to cover this case.
>
>
Ack
> > +
> > + if (*dev->dev_ops->speed_lanes_get_capa == NULL)
> > + return -ENOTSUP;
> >
>
> About the order of the checks, should we first check if the dev_ops
> exists before validating the input arguments?
> If the dev_ops is not available, the input variables don't matter anyway.
>
Ack
> > + ret = (*dev->dev_ops->speed_lanes_get_capa)(dev, speed_lanes_capa, num);
> > +
> > + return ret;
> >
>
> The API returns -EIO only if it is returned through 'eth_err()'; that is to
> cover the hot-remove case. It is missing in this function.
>
>
Ack
> > +}
> > +
> > +int
> > +rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes_capa)
> > +{
> > + struct rte_eth_dev *dev;
> > +
> > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > + dev = &rte_eth_devices[port_id];
> > +
> > + if (*dev->dev_ops->speed_lanes_set == NULL)
> > + return -ENOTSUP;
> > + return eth_err(port_id, (*dev->dev_ops->speed_lanes_set)(dev, speed_lanes_capa));
> > +}
> >
>
> Similar location comment as the header one: instead of adding the new APIs
> at the very bottom of the file, can you please group them just below the
> link-related APIs?
>
Ack
> > +
> > RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 548fada1c7..35d0b81452 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -357,6 +357,30 @@ struct rte_eth_link {
> > #define RTE_ETH_LINK_MAX_STR_LEN 40 /**< Max length of default link string. */
> > /**@}*/
> >
> > +/**
> > + * This enum indicates the possible link speed lanes of an ethdev port.
> > + */
> > +enum rte_eth_speed_lanes {
> > + RTE_ETH_SPEED_LANE_UNKNOWN = 0, /**< speed lanes unsupported mode or default */
> > + RTE_ETH_SPEED_LANE_1 = 1, /**< Link speed lane 1 */
> > + RTE_ETH_SPEED_LANE_2 = 2, /**< Link speed lanes 2 */
> > + RTE_ETH_SPEED_LANE_4 = 4, /**< Link speed lanes 4 */
> > + RTE_ETH_SPEED_LANE_8 = 8, /**< Link speed lanes 8 */
> >
>
> Do we really need an enum for the lane number? Why not just use a number?
> As far as I can see, the APIs take a "uint32 lanes" parameter anyway.
>
>
Ack
> > + RTE_ETH_SPEED_LANE_MAX,
> >
>
Will take care in the upcoming new patch
> This kind of MAX enum usage causes trouble when we want to extend
> the support in the future.
> Like when a 16-lane value is required, adding it changes the value of MAX, and as
> this is a public enum, the change causes an ABI break, making us wait
> until the next ABI-break release.
> So better if we can prevent MAX enum usage.
>
Makes sense. Ack
> > +};
> > +
> > +/* Translate from link speed lanes to speed lanes capa */
> > +#define RTE_ETH_SPEED_LANES_TO_CAPA(x) RTE_BIT32(x)
> > +
> > +/* This macro indicates link speed lanes capa mask */
> > +#define RTE_ETH_SPEED_LANES_CAPA_MASK(x) RTE_BIT32(RTE_ETH_SPEED_ ## x)
> >
>
> Why is the above macro needed?
>
>
To use in parse_speed_lanes to validate user input. It's not used any
more in the new patches. Will remove it.
> > +
> > +/* A structure used to get and set lanes capabilities per link speed */
> > +struct rte_eth_speed_lanes_capa {
> > + uint32_t speed;
> > + uint32_t capa;
> > +};
> > +
> > /**
> > * A structure used to configure the ring threshold registers of an Rx/Tx
> > * queue for an Ethernet port.
> > @@ -6922,6 +6946,74 @@ rte_eth_tx_queue_count(uint16_t port_id, uint16_t queue_id)
> > return rc;
> > }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Get Active lanes.
> > + *
> > + * @param port_id
> > + * The port identifier of the Ethernet device.
> > + * @param lanes
> > + * driver populates a active lanes value whether link is Autonegotiated or Fixed speed.
> >
>
> As these doxygen comments are API documentation, can you please form
> them as proper sentences, e.g. start with uppercase, end with '.', etc.?
> Same comment for all APIs.
>
Ack
> > + *
> > + * @return
> > + * - (0) if successful.
> > + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + * that operation.
> > + * - (-EIO) if device is removed.
> > + * - (-ENODEV) if *port_id* invalid.
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lanes);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Set speed lanes supported by the NIC.
> > + *
> > + * @param port_id
> > + * The port identifier of the Ethernet device.
> > + * @param speed_lanes
> > + * speed_lanes a non-zero value of number lanes for this speeds.
> >
>
> 'this speeds' ?
>
>
Ack. "number of lanes for current speed"
> > + *
> > + * @return
> > + * - (0) if successful.
> > + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + * that operation.
> > + * - (-EIO) if device is removed.
> > + * - (-ENODEV) if *port_id* invalid.
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Get speed lanes supported by the NIC.
> > + *
> > + * @param port_id
> > + * The port identifier of the Ethernet device.
> > + * @param speed_lanes_capa
> > + * speed_lanes_capa int array with valid lanes per speed.
> > + * @param num
> > + * size of the speed_lanes_capa array.
> > + *
> > + * @return
> > + * - (0) if successful.
> > + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + * that operation.
> > + * - (-EIO) if device is removed.
> > + * - (-ENODEV) if *port_id* invalid.
> > + * - (-EINVAL) if *speed_lanes* invalid
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_get_capability(uint16_t port_id,
> > + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > + unsigned int num);
> > +
> >
>
> The bottom of the header file is for static inline functions.
> Instead of adding these new APIs at the very bottom of the header, can
> you please group them just below the link speed related APIs?
>
>
Ack
> > #ifdef __cplusplus
> > }
> > #endif
> > diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > index 79f6f5293b..db9261946f 100644
> > --- a/lib/ethdev/version.map
> > +++ b/lib/ethdev/version.map
> > @@ -325,6 +325,11 @@ EXPERIMENTAL {
> > rte_flow_template_table_resizable;
> > rte_flow_template_table_resize;
> > rte_flow_template_table_resize_complete;
> > +
> > + # added in 24.07
> > + rte_eth_speed_lanes_get;
> > + rte_eth_speed_lanes_get_capability;
> > + rte_eth_speed_lanes_set;
> > };
> >
> > INTERNAL {
>
^ permalink raw reply [relevance 0%]
* Re: [PATCH v4] ethdev: Add link_speed lanes support
@ 2024-07-09 11:10 4% ` Ferruh Yigit
2024-07-09 21:20 0% ` Damodharam Ammepalli
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-07-09 11:10 UTC (permalink / raw)
To: Damodharam Ammepalli
Cc: ajit.khaparde, dev, huangdengdui, kalesh-anakkur.purayil
On 7/9/2024 12:22 AM, Damodharam Ammepalli wrote:
> Update the eth_dev_ops structure with new function vectors
> to get, get capabilities and set ethernet link speed lanes.
> Update the testpmd to provide required config and information
> display infrastructure.
>
> The supporting ethernet controller driver will register callbacks
> to avail link speed lanes config and get services. This lanes
> configuration is applicable only when the nic is forced to fixed
> speeds. In Autonegotiation mode, the hardware automatically
> negotiates the number of lanes.
>
> These are the new commands.
>
> testpmd> show port 0 speed_lanes capabilities
>
> Supported speeds Valid lanes
> -----------------------------------
> 10 Gbps 1
> 25 Gbps 1
> 40 Gbps 4
> 50 Gbps 1 2
> 100 Gbps 1 2 4
> 200 Gbps 2 4
> 400 Gbps 4 8
> testpmd>
>
> testpmd>
> testpmd> port stop 0
> testpmd> port config 0 speed_lanes 4
> testpmd> port config 0 speed 200000 duplex full
>
Is there a requirement to set speed before speed_lane?
Because I expect driver will verify if a speed_lane value is valid or
not for a specific speed value. In above usage, driver will verify based
on existing speed, whatever it is, later chaning speed may cause invalid
speed_lane configuration.
> testpmd> port start 0
> testpmd>
> testpmd> show port info 0
>
> ********************* Infos for port 0 *********************
> MAC address: 14:23:F2:C3:BA:D2
> Device name: 0000:b1:00.0
> Driver name: net_bnxt
> Firmware-version: 228.9.115.0
> Connect to socket: 2
> memory allocation on the socket: 2
> Link status: up
> Link speed: 200 Gbps
> Active Lanes: 4
> Link duplex: full-duplex
> Autoneg status: Off
>
> Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
> ---
> v2->v3 Consolidating the testpmd and rtelib patches into a single patch
> as requested.
> v3->v4 Addressed comments and fix help string and documentation.
>
> app/test-pmd/cmdline.c | 230 +++++++++++++++++++++++++++++++++++++
> app/test-pmd/config.c | 69 ++++++++++-
> app/test-pmd/testpmd.h | 4 +
> lib/ethdev/ethdev_driver.h | 77 +++++++++++++
> lib/ethdev/rte_ethdev.c | 51 ++++++++
> lib/ethdev/rte_ethdev.h | 92 +++++++++++++++
> lib/ethdev/version.map | 5 +
> 7 files changed, 526 insertions(+), 2 deletions(-)
>
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index b7759e38a8..a507df31d8 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -284,6 +284,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>
> "dump_log_types\n"
> " Dumps the log level for all the dpdk modules\n\n"
> +
> + "show port (port_id) speed_lanes capabilities"
> + " Show speed lanes capabilities of a port.\n\n"
> );
> }
>
> @@ -823,6 +826,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> "port config (port_id) txq (queue_id) affinity (value)\n"
> " Map a Tx queue with an aggregated port "
> "of the DPDK port\n\n"
> +
> + "port config (port_id|all) speed_lanes (0|1|4|8)\n"
> + " Set number of lanes for all ports or port_id for a forced speed\n\n"
> );
> }
>
> @@ -1560,6 +1566,110 @@ static cmdline_parse_inst_t cmd_config_speed_specific = {
> },
> };
>
> +static int
> +parse_speed_lanes_cfg(portid_t pid, uint32_t lanes)
> +{
> + int ret;
> + uint32_t lanes_capa;
> +
> + ret = parse_speed_lanes(lanes, &lanes_capa);
> + if (ret < 0) {
> + fprintf(stderr, "Unknown speed lane value: %d for port %d\n", lanes, pid);
> + return -1;
> + }
> +
> + ret = rte_eth_speed_lanes_set(pid, lanes_capa);
> + if (ret == -ENOTSUP) {
> + fprintf(stderr, "Function not implemented\n");
> + return -1;
> + } else if (ret < 0) {
> + fprintf(stderr, "Set speed lanes failed\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +/* *** display speed lanes per port capabilities *** */
> +struct cmd_show_speed_lanes_result {
> + cmdline_fixed_string_t cmd_show;
> + cmdline_fixed_string_t cmd_port;
> + cmdline_fixed_string_t cmd_keyword;
> + portid_t cmd_pid;
> +};
> +
> +static void
> +cmd_show_speed_lanes_parsed(void *parsed_result,
> + __rte_unused struct cmdline *cl,
> + __rte_unused void *data)
> +{
> + struct cmd_show_speed_lanes_result *res = parsed_result;
> + struct rte_eth_speed_lanes_capa *speed_lanes_capa;
> + unsigned int num;
> + int ret;
> +
> + if (!rte_eth_dev_is_valid_port(res->cmd_pid)) {
> + fprintf(stderr, "Invalid port id %u\n", res->cmd_pid);
> + return;
> + }
> +
> + ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, NULL, 0);
> + if (ret == -ENOTSUP) {
> + fprintf(stderr, "Function not implemented\n");
> + return;
> + } else if (ret < 0) {
> + fprintf(stderr, "Get speed lanes capability failed: %d\n", ret);
> + return;
> + }
> +
> + num = (unsigned int)ret;
> + speed_lanes_capa = calloc(num, sizeof(*speed_lanes_capa));
> + if (speed_lanes_capa == NULL) {
> + fprintf(stderr, "Failed to alloc speed lanes capability buffer\n");
> + return;
> + }
> +
> + ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, speed_lanes_capa, num);
> + if (ret < 0) {
> + fprintf(stderr, "Error getting speed lanes capability: %d\n", ret);
> + goto out;
> + }
> +
> + show_speed_lanes_capability(num, speed_lanes_capa);
> +out:
> + free(speed_lanes_capa);
> +}
> +
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_show =
> + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> + cmd_show, "show");
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_port =
> + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> + cmd_port, "port");
> +static cmdline_parse_token_num_t cmd_show_speed_lanes_pid =
> + TOKEN_NUM_INITIALIZER(struct cmd_show_speed_lanes_result,
> + cmd_pid, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_keyword =
> + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> + cmd_keyword, "speed_lanes");
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_cap_keyword =
> + TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> + cmd_keyword, "capabilities");
> +
> +static cmdline_parse_inst_t cmd_show_speed_lanes = {
> + .f = cmd_show_speed_lanes_parsed,
> + .data = NULL,
> + .help_str = "show port <port_id> speed_lanes capabilities",
> + .tokens = {
> + (void *)&cmd_show_speed_lanes_show,
> + (void *)&cmd_show_speed_lanes_port,
> + (void *)&cmd_show_speed_lanes_pid,
> + (void *)&cmd_show_speed_lanes_keyword,
> + (void *)&cmd_show_speed_lanes_cap_keyword,
> + NULL,
> + },
> +};
> +
> /* *** configure loopback for all ports *** */
> struct cmd_config_loopback_all {
> cmdline_fixed_string_t port;
> @@ -1676,6 +1786,123 @@ static cmdline_parse_inst_t cmd_config_loopback_specific = {
> },
> };
>
> +/* *** configure speed_lanes for all ports *** */
> +struct cmd_config_speed_lanes_all {
> + cmdline_fixed_string_t port;
> + cmdline_fixed_string_t keyword;
> + cmdline_fixed_string_t all;
> + cmdline_fixed_string_t item;
> + uint32_t lanes;
> +};
> +
> +static void
> +cmd_config_speed_lanes_all_parsed(void *parsed_result,
> + __rte_unused struct cmdline *cl,
> + __rte_unused void *data)
> +{
> + struct cmd_config_speed_lanes_all *res = parsed_result;
> + portid_t pid;
> +
> + if (!all_ports_stopped()) {
> + fprintf(stderr, "Please stop all ports first\n");
> + return;
> + }
> +
> + RTE_ETH_FOREACH_DEV(pid) {
> + if (parse_speed_lanes_cfg(pid, res->lanes))
> + return;
> + }
> +
> + cmd_reconfig_device_queue(RTE_PORT_ALL, 1, 1);
> +}
> +
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_port =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, port, "port");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_keyword =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, keyword,
> + "config");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_all =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, all, "all");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_item =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, item,
> + "speed_lanes");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_all_lanes =
> + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_all, lanes, RTE_UINT32);
> +
> +static cmdline_parse_inst_t cmd_config_speed_lanes_all = {
> + .f = cmd_config_speed_lanes_all_parsed,
> + .data = NULL,
> + .help_str = "port config all speed_lanes <value>",
> + .tokens = {
> + (void *)&cmd_config_speed_lanes_all_port,
> + (void *)&cmd_config_speed_lanes_all_keyword,
> + (void *)&cmd_config_speed_lanes_all_all,
> + (void *)&cmd_config_speed_lanes_all_item,
> + (void *)&cmd_config_speed_lanes_all_lanes,
> + NULL,
> + },
> +};
> +
> +/* *** configure speed_lanes for specific port *** */
> +struct cmd_config_speed_lanes_specific {
> + cmdline_fixed_string_t port;
> + cmdline_fixed_string_t keyword;
> + uint16_t port_id;
> + cmdline_fixed_string_t item;
> + uint32_t lanes;
> +};
> +
> +static void
> +cmd_config_speed_lanes_specific_parsed(void *parsed_result,
> + __rte_unused struct cmdline *cl,
> + __rte_unused void *data)
> +{
> + struct cmd_config_speed_lanes_specific *res = parsed_result;
> +
> + if (port_id_is_invalid(res->port_id, ENABLED_WARN))
> + return;
> +
> + if (!port_is_stopped(res->port_id)) {
> + fprintf(stderr, "Please stop port %u first\n", res->port_id);
> + return;
> + }
>
There is a requirement here that the port needs to be stopped before
calling rte_eth_speed_lanes_set();
is this requirement documented in the API documentation?
> +
> + if (parse_speed_lanes_cfg(res->port_id, res->lanes))
> + return;
> +
> + cmd_reconfig_device_queue(res->port_id, 1, 1);
> +}
> +
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_port =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, port,
> + "port");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_keyword =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, keyword,
> + "config");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_id =
> + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, port_id,
> + RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_item =
> + TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, item,
> + "speed_lanes");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_lanes =
> + TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, lanes,
> + RTE_UINT32);
> +
> +static cmdline_parse_inst_t cmd_config_speed_lanes_specific = {
> + .f = cmd_config_speed_lanes_specific_parsed,
> + .data = NULL,
> + .help_str = "port config <port_id> speed_lanes <value>",
> + .tokens = {
> + (void *)&cmd_config_speed_lanes_specific_port,
> + (void *)&cmd_config_speed_lanes_specific_keyword,
> + (void *)&cmd_config_speed_lanes_specific_id,
> + (void *)&cmd_config_speed_lanes_specific_item,
> + (void *)&cmd_config_speed_lanes_specific_lanes,
> + NULL,
> + },
> +};
> +
> /* *** configure txq/rxq, txd/rxd *** */
> struct cmd_config_rx_tx {
> cmdline_fixed_string_t port;
> @@ -13238,6 +13465,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
> (cmdline_parse_inst_t *)&cmd_set_port_setup_on,
> (cmdline_parse_inst_t *)&cmd_config_speed_all,
> (cmdline_parse_inst_t *)&cmd_config_speed_specific,
> + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_all,
> + (cmdline_parse_inst_t *)&cmd_config_speed_lanes_specific,
> + (cmdline_parse_inst_t *)&cmd_show_speed_lanes,
> (cmdline_parse_inst_t *)&cmd_config_loopback_all,
> (cmdline_parse_inst_t *)&cmd_config_loopback_specific,
> (cmdline_parse_inst_t *)&cmd_config_rx_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 66c3a68c1d..498a7db467 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -207,6 +207,32 @@ static const struct {
> {"gtpu", RTE_ETH_FLOW_GTPU},
> };
>
> +static const struct {
> + enum rte_eth_speed_lanes lane;
> + const uint32_t value;
> +} speed_lane_name[] = {
> + {
> + .lane = RTE_ETH_SPEED_LANE_UNKNOWN,
> + .value = 0,
> + },
> + {
> + .lane = RTE_ETH_SPEED_LANE_1,
> + .value = 1,
> + },
> + {
> + .lane = RTE_ETH_SPEED_LANE_2,
> + .value = 2,
> + },
> + {
> + .lane = RTE_ETH_SPEED_LANE_4,
> + .value = 4,
> + },
> + {
> + .lane = RTE_ETH_SPEED_LANE_8,
> + .value = 8,
> + },
> +};
> +
> static void
> print_ethaddr(const char *name, struct rte_ether_addr *eth_addr)
> {
> @@ -786,6 +812,7 @@ port_infos_display(portid_t port_id)
> char name[RTE_ETH_NAME_MAX_LEN];
> int ret;
> char fw_version[ETHDEV_FWVERS_LEN];
> + uint32_t lanes;
>
> if (port_id_is_invalid(port_id, ENABLED_WARN)) {
> print_valid_ports();
> @@ -828,6 +855,12 @@ port_infos_display(portid_t port_id)
>
> printf("\nLink status: %s\n", (link.link_status) ? ("up") : ("down"));
> printf("Link speed: %s\n", rte_eth_link_speed_to_str(link.link_speed));
> + if (rte_eth_speed_lanes_get(port_id, &lanes) == 0) {
> + if (lanes > 0)
> + printf("Active Lanes: %d\n", lanes);
> + else
> + printf("Active Lanes: %s\n", "Unknown");
>
What can the 'else' case be?
As 'lanes' is unsigned, the only option is it being zero. Is the API allowed to
return zero as a lane number?
> + }
> printf("Link duplex: %s\n", (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
> ("full-duplex") : ("half-duplex"));
> printf("Autoneg status: %s\n", (link.link_autoneg == RTE_ETH_LINK_AUTONEG) ?
> @@ -962,7 +995,7 @@ port_summary_header_display(void)
>
> port_number = rte_eth_dev_count_avail();
> printf("Number of available ports: %i\n", port_number);
> - printf("%-4s %-17s %-12s %-14s %-8s %s\n", "Port", "MAC Address", "Name",
> + printf("%-4s %-17s %-12s %-14s %-8s %-8s\n", "Port", "MAC Address", "Name",
> "Driver", "Status", "Link");
> }
>
> @@ -993,7 +1026,7 @@ port_summary_display(portid_t port_id)
> if (ret != 0)
> return;
>
> - printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n",
> + printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n",
>
Summary updates are irrelevant to this patch, can you please drop them?
> port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name,
> dev_info.driver_name, (link.link_status) ? ("up") : ("down"),
> rte_eth_link_speed_to_str(link.link_speed));
> @@ -7244,3 +7277,35 @@ show_mcast_macs(portid_t port_id)
> printf(" %s\n", buf);
> }
> }
> +
> +int
> +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane)
> +{
> + uint8_t i;
> +
> + for (i = 0; i < RTE_DIM(speed_lane_name); i++) {
> + if (speed_lane_name[i].value == lane) {
> + *speed_lane = lane;
>
This converts from 8 -> 8, 4 -> 4 ....
Why not completely eliminate this function? See below.
> + return 0;
> + }
> + }
> + return -1;
> +}
> +
> +void
> +show_speed_lanes_capability(unsigned int num, struct rte_eth_speed_lanes_capa *speed_lanes_capa)
> +{
> + unsigned int i, j;
> +
> + printf("\n%-15s %-10s", "Supported-speeds", "Valid-lanes");
> + printf("\n-----------------------------------\n");
> + for (i = 0; i < num; i++) {
> + printf("%-17s ", rte_eth_link_speed_to_str(speed_lanes_capa[i].speed));
> +
> + for (j = 0; j < RTE_ETH_SPEED_LANE_MAX; j++) {
> + if (RTE_ETH_SPEED_LANES_TO_CAPA(j) & speed_lanes_capa[i].capa)
> + printf("%-2d ", speed_lane_name[j].value);
> + }
To eliminate both RTE_ETH_SPEED_LANE_MAX & speed_lane_name, what do you
think about:
capa = speed_lanes_capa[i].capa;
int s = 0;
while (capa) {
	if (capa & 0x1)
		printf("%-2d ", 1 << s);
	s++;
	capa = capa >> 1;
}
> + printf("\n");
> + }
> +}
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 9facd7f281..fb9ef05cc5 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -1253,6 +1253,10 @@ extern int flow_parse(const char *src, void *result, unsigned int size,
> struct rte_flow_item **pattern,
> struct rte_flow_action **actions);
>
> +void show_speed_lanes_capability(uint32_t num,
> + struct rte_eth_speed_lanes_capa *speed_lanes_capa);
> +int parse_speed_lanes(uint32_t lane, uint32_t *speed_lane);
> +
>
These functions are only called in 'test-pmd/cmdline.c'; what do you think
about moving them to that file and making them static?
> uint64_t str_to_rsstypes(const char *str);
> const char *rsstypes_to_str(uint64_t rss_type);
>
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 883e59a927..0f10aec3a1 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1179,6 +1179,79 @@ typedef int (*eth_rx_descriptor_dump_t)(const struct rte_eth_dev *dev,
> uint16_t queue_id, uint16_t offset,
> uint16_t num, FILE *file);
>
> +/**
> + * @internal
> + * Get number of current active lanes
> + *
> + * @param dev
> + * ethdev handle of port.
> + * @param speed_lanes
> + * Number of active lanes that the link is trained up.
> + * @return
> + * Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + * Success, get speed_lanes data success.
> + * @retval -ENOTSUP
> + * Operation is not supported.
> + * @retval -EIO
> + * Device is removed.
>
Are the above '-ENOTSUP' & '-EIO' return values valid?
Normally we expect those two from the ethdev API, not from dev_ops.
In which case is a dev_ops expected to return these?
Same comment for all three new APIs.
> + */
> +typedef int (*eth_speed_lanes_get_t)(struct rte_eth_dev *dev, uint32_t *speed_lanes);
> +
> +/**
> + * @internal
> + * Set speed lanes
> + *
> + * @param dev
> + * ethdev handle of port.
> + * @param speed_lanes
> + * Non-negative number of lanes
> + *
> + * @return
> + * Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + * Success, set lanes success.
> + * @retval -ENOTSUP
> + * Operation is not supported.
> + * @retval -EINVAL
> + * Unsupported mode requested.
> + * @retval -EIO
> + * Device is removed.
> + */
> +typedef int (*eth_speed_lanes_set_t)(struct rte_eth_dev *dev, uint32_t speed_lanes);
> +
> +/**
> + * @internal
> + * Get supported link speed lanes capability
> + *
> + * @param speed_lanes_capa
> + * speed_lanes_capa is out only with per-speed capabilities.
>
I can understand what the above says, but I think it can be clarified more;
what do you think?
> + * @param num
> + * a number of elements in an speed_speed_lanes_capa array.
>
'a number of elements' or 'number of elements' ?
> + *
> + * @return
> + * Negative errno value on error, positive value on success.
> + *
> + * @retval positive value
> + * A non-negative value lower or equal to num: success. The return value
> + * is the number of entries filled in the speed lanes array.
> + * A non-negative value higher than num: error, the given speed lanes capa array
> + * is too small. The return value corresponds to the num that should
> + * be given to succeed. The entries in the speed lanes capa array are not valid
> + * and shall not be used by the caller.
> + * @retval -ENOTSUP
> + * Operation is not supported.
> + * @retval -EIO
> + * Device is removed.
> + * @retval -EINVAL
> + * *num* or *speed_lanes_capa* invalid.
> + */
> +typedef int (*eth_speed_lanes_get_capability_t)(struct rte_eth_dev *dev,
> + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> + unsigned int num);
> +
>
These new dev_ops are placed in between the existing dev_ops
'eth_rx_descriptor_dump_t' and 'eth_tx_descriptor_dump_t';
if you were looking at this header file as a whole, what would you think
about the quality of it?
Please group new dev_ops below link related ones.
> /**
> * @internal
> * Dump Tx descriptor info to a file.
> @@ -1247,6 +1320,10 @@ struct eth_dev_ops {
> eth_dev_close_t dev_close; /**< Close device */
> eth_dev_reset_t dev_reset; /**< Reset device */
> eth_link_update_t link_update; /**< Get device link state */
> + eth_speed_lanes_get_t speed_lanes_get; /**<Get link speed active lanes */
> + eth_speed_lanes_set_t speed_lanes_set; /**<set the link speeds supported lanes */
> + /** Get link speed lanes capability */
> + eth_speed_lanes_get_capability_t speed_lanes_get_capa;
> /** Check if the device was physically removed */
> eth_is_removed_t is_removed;
>
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index f1c658f49e..07cefea307 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -7008,4 +7008,55 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
> return ret;
> }
>
> +int
> +rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lane)
> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> + dev = &rte_eth_devices[port_id];
> +
> + if (*dev->dev_ops->speed_lanes_get == NULL)
> + return -ENOTSUP;
> + return eth_err(port_id, (*dev->dev_ops->speed_lanes_get)(dev, lane));
> +}
> +
> +int
> +rte_eth_speed_lanes_get_capability(uint16_t port_id,
> + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> + unsigned int num)
> +{
> + struct rte_eth_dev *dev;
> + int ret;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> + dev = &rte_eth_devices[port_id];
> +
> + if (speed_lanes_capa == NULL && num > 0) {
> + RTE_ETHDEV_LOG_LINE(ERR,
> + "Cannot get ethdev port %u speed lanes capability to NULL when array size is non zero",
> + port_id);
> + return -EINVAL;
> + }
>
According to the above check, "speed_lanes_capa == NULL && num == 0" is a valid
input. I assume this is useful to get the expected size of the
'speed_lanes_capa' array, but this is not mentioned in the API
documentation; can you please update the API doxygen comment to cover this case?
> +
> + if (*dev->dev_ops->speed_lanes_get_capa == NULL)
> + return -ENOTSUP;
>
About the order of the checks, should we first check if the dev_ops
exists before validating the input arguments?
If the dev_ops is not available, the input variables don't matter anyway.
> + ret = (*dev->dev_ops->speed_lanes_get_capa)(dev, speed_lanes_capa, num);
> +
> + return ret;
>
The API returns -EIO only if it is returned through 'eth_err()'; that is to
cover the hot-remove case. It is missing in this function.
> +}
> +
> +int
> +rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes_capa)
> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> + dev = &rte_eth_devices[port_id];
> +
> + if (*dev->dev_ops->speed_lanes_set == NULL)
> + return -ENOTSUP;
> + return eth_err(port_id, (*dev->dev_ops->speed_lanes_set)(dev, speed_lanes_capa));
> +}
>
Similar location comment as the header one: instead of adding the new APIs
at the very bottom of the file, can you please group them just below the
link-related APIs?
> +
> RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 548fada1c7..35d0b81452 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -357,6 +357,30 @@ struct rte_eth_link {
> #define RTE_ETH_LINK_MAX_STR_LEN 40 /**< Max length of default link string. */
> /**@}*/
>
> +/**
> + * This enum indicates the possible link speed lanes of an ethdev port.
> + */
> +enum rte_eth_speed_lanes {
> + RTE_ETH_SPEED_LANE_UNKNOWN = 0, /**< speed lanes unsupported mode or default */
> + RTE_ETH_SPEED_LANE_1 = 1, /**< Link speed lane 1 */
> + RTE_ETH_SPEED_LANE_2 = 2, /**< Link speed lanes 2 */
> + RTE_ETH_SPEED_LANE_4 = 4, /**< Link speed lanes 4 */
> + RTE_ETH_SPEED_LANE_8 = 8, /**< Link speed lanes 8 */
>
Do we really need an enum for the lane number? Why not just use a number?
As far as I can see, the APIs take a "uint32 lanes" parameter anyway.
> + RTE_ETH_SPEED_LANE_MAX,
>
This kind of MAX enum usage causes trouble when we want to extend
the support in the future.
Like when a 16-lane value is required, adding it changes the value of MAX, and as
this is a public enum, the change causes an ABI break, making us wait
until the next ABI-break release.
So better if we can prevent MAX enum usage.
> +};
> +
> +/* Translate from link speed lanes to speed lanes capa */
> +#define RTE_ETH_SPEED_LANES_TO_CAPA(x) RTE_BIT32(x)
> +
> +/* This macro indicates link speed lanes capa mask */
> +#define RTE_ETH_SPEED_LANES_CAPA_MASK(x) RTE_BIT32(RTE_ETH_SPEED_ ## x)
>
Why is the above macro needed?
> +
> +/* A structure used to get and set lanes capabilities per link speed */
> +struct rte_eth_speed_lanes_capa {
> + uint32_t speed;
> + uint32_t capa;
> +};
> +
> /**
> * A structure used to configure the ring threshold registers of an Rx/Tx
> * queue for an Ethernet port.
> @@ -6922,6 +6946,74 @@ rte_eth_tx_queue_count(uint16_t port_id, uint16_t queue_id)
> return rc;
> }
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Get Active lanes.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param lanes
> + * driver populates a active lanes value whether link is Autonegotiated or Fixed speed.
>
As these doxygen comments are API documentation, can you please form
them as proper sentences, i.e. start with an uppercase letter, end with '.', etc.?
Same comment for all APIs.
> + *
> + * @return
> + * - (0) if successful.
> + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + * that operation.
> + * - (-EIO) if device is removed.
> + * - (-ENODEV) if *port_id* invalid.
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lanes);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Set speed lanes supported by the NIC.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param speed_lanes
> + * speed_lanes a non-zero value of number lanes for this speeds.
>
'this speeds' ?
> + *
> + * @return
> + * - (0) if successful.
> + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + * that operation.
> + * - (-EIO) if device is removed.
> + * - (-ENODEV) if *port_id* invalid.
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Get speed lanes supported by the NIC.
> + *
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param speed_lanes_capa
> + * speed_lanes_capa int array with valid lanes per speed.
> + * @param num
> + * size of the speed_lanes_capa array.
> + *
> + * @return
> + * - (0) if successful.
> + * - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + * that operation.
> + * - (-EIO) if device is removed.
> + * - (-ENODEV) if *port_id* invalid.
> + * - (-EINVAL) if *speed_lanes* invalid
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_get_capability(uint16_t port_id,
> + struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> + unsigned int num);
> +
>
The bottom of the header file is reserved for static inline functions.
Instead of adding these new APIs at the very bottom of the header, can
you please group them just below the link-speed-related APIs?
> #ifdef __cplusplus
> }
> #endif
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 79f6f5293b..db9261946f 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -325,6 +325,11 @@ EXPERIMENTAL {
> rte_flow_template_table_resizable;
> rte_flow_template_table_resize;
> rte_flow_template_table_resize_complete;
> +
> + # added in 24.07
> + rte_eth_speed_lanes_get;
> + rte_eth_speed_lanes_get_capability;
> + rte_eth_speed_lanes_set;
> };
>
> INTERNAL {
^ permalink raw reply [relevance 4%]
* [PATCH v8 0/2] power: introduce PM QoS interface
` (5 preceding siblings ...)
2024-07-09 6:31 4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 7:25 4% ` Huisong Li
2024-07-09 7:25 5% ` [PATCH v8 1/2] power: introduce PM QoS API on CPU wide Huisong Li
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09 7:25 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX for
userspace. Please see the description in the kernel documentation[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
---
v8:
- update the latest code to resolve CI warning
v7:
- remove a dead code rte_lcore_is_enabled in patch[2/2]
v6:
- update release_24_07.rst based on dpdk repo to resolve CI warning.
v5:
- use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
v4:
- fix some comments basd on Stephen
- add stdint.h include
- add Acked-by Morten Brørup <mb@smartsharesystems.com>
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 24 ++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
7 files changed, 243 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* [PATCH v8 1/2] power: introduce PM QoS API on CPU wide
2024-07-09 7:25 4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 7:25 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09 7:25 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX for
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
6 files changed, 219 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on the cpuX for
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit it to the shallowest idle state,
+lowering the delay of resuming service after sleep, by setting a strict resume
+latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 50ffc1f74a..e771868d9f 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -156,6 +156,10 @@ New Features
* Added defer queue reclamation via RCU.
* Added SVE support for bulk lookup.
+* **Introduce per-CPU PM QoS interface.**
+
+ * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[LINE_MAX];
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in the kernel, the meaning
+ * is as follows for the different input strings.
+ * 1> the resume latency is 0 if the input is "n/a".
+ * 2> the resume latency is no constraint if the input is "0".
+ * 3> the resume latency is the actual value to be set.
+ */
+ if (latency == 0)
+ snprintf(buf, sizeof(buf), "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ snprintf(buf, sizeof(buf), "%u", 0);
+ else
+ snprintf(buf, sizeof(buf), "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[LINE_MAX];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in the kernel, the meaning
+ * is as follows for the different output strings.
+ * 1> the resume latency is 0 if the output is "n/a".
+ * 2> the resume latency is no constraint if the output is "0".
+ * 3> the resume latency is the actual value in use for any other string.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as interrupt packet receiving mode.
+ *
+ * In such cases, the per-CPU PM QoS API can be used to control this CPU's idle
+ * state selection and limit it to the shallowest idle state, lowering the
+ * delay after sleep, by setting a strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * Target logical core id.
+ *
+ * @param latency
+ * The resume latency in microseconds; it should be greater than or equal to zero.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v7 1/2] power: introduce PM QoS API on CPU wide
2024-07-09 6:31 4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 6:31 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09 6:31 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX for
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
6 files changed, 219 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on the cpuX for
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit it to the shallowest idle state,
+lowering the delay of resuming service after sleep, by setting a strict resume
+latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 1dd842df3a..af6fd82a3c 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -155,6 +155,10 @@ New Features
Added an API that allows the user to reclaim the defer queue with RCU.
+* **Introduce per-CPU PM QoS interface.**
+
+ * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[LINE_MAX];
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in the kernel, the meaning
+ * is as follows for the different input strings.
+ * 1> the resume latency is 0 if the input is "n/a".
+ * 2> the resume latency is no constraint if the input is "0".
+ * 3> the resume latency is the actual value to be set.
+ */
+ if (latency == 0)
+ snprintf(buf, sizeof(buf), "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ snprintf(buf, sizeof(buf), "%u", 0);
+ else
+ snprintf(buf, sizeof(buf), "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[LINE_MAX];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in the kernel, the meaning
+ * is as follows for the different output strings.
+ * 1> the resume latency is 0 if the output is "n/a".
+ * 2> the resume latency is no constraint if the output is "0".
+ * 3> the resume latency is the actual value in use for any other string.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as interrupt packet receiving mode.
+ *
+ * In such cases, the per-CPU PM QoS API can be used to control this CPU's idle
+ * state selection and limit it to the shallowest idle state, lowering the
+ * delay after sleep, by setting a strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * Target logical core id.
+ *
+ * @param latency
+ * The resume latency in microseconds; it should be greater than or equal to zero.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v7 0/2] power: introduce PM QoS interface
` (4 preceding siblings ...)
2024-07-09 2:29 4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 6:31 4% ` Huisong Li
2024-07-09 6:31 5% ` [PATCH v7 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 7:25 4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09 6:31 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX for
userspace. Please see the description in the kernel documentation[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
---
v7:
- remove a dead code rte_lcore_is_enabled in patch[2/2]
v6:
- update release_24_07.rst based on dpdk repo to resolve CI warning.
v5:
- use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
v4:
- fix some comments basd on Stephen
- add stdint.h include
- add Acked-by Morten Brørup <mb@smartsharesystems.com>
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 24 ++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
7 files changed, 243 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* Re: [PATCH v12 0/7] hash: add SVE support for bulk key lookup
2024-07-08 12:14 3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
2024-07-08 12:14 3% ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
@ 2024-07-09 4:48 0% ` David Marchand
1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-07-09 4:48 UTC (permalink / raw)
To: Yoan Picchi; +Cc: dev, nd
On Mon, Jul 8, 2024 at 2:14 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> This patchset adds SVE support for the signature comparison in the cuckoo
> hash lookup and improves the existing NEON implementation. These
> optimizations required changes to the data format and signature of the
> relevant functions to support dense hitmasks (no padding) and having the
> primary and secondary hitmasks interleaved instead of being in their own
> array each.
>
> Benchmarking the cuckoo hash perf test, I observed this effect on speed:
> There are no significant changes on Intel (ran on Sapphire Rapids)
> Neon is up to 7-10% faster (ran on ampere altra)
> 128b SVE is about 3-5% slower than the optimized neon (ran on a graviton
> 3 cloud instance)
> 256b SVE is about 0-3% slower than the optimized neon (ran on a graviton
> 3 cloud instance)
>
> V2->V3:
> Remove a redundant if in the test
> Change a couple int to uint16_t in compare_signatures_dense
> Several coding-style fixes
>
> V3->V4:
> Rebase
>
> V4->V5:
> Commit message
>
> V5->V6:
> Move the arch-specific code into new arch-specific files
> Isolate the data structure refactor from adding SVE
>
> V6->V7:
> Commit message
> Moved RTE_HASH_COMPARE_SVE to the last commit of the chain
>
> V7->V8:
> Commit message
> Typos and missing spaces
>
> V8->V9:
> Use __rte_unused instead of (void)
> Fix an indentation mistake
>
> V9->V10:
> Fix more formatting and indentation
> Move the new compare signature file directly in hash instead of being
> in a new subdir
> Re-order includes
> Remove duplicated static check
> Move rte_hash_sig_compare_function's definition into a private header
>
> V10->V11:
> Split the "pack the hitmask" commit into four commits:
> Move the compare function enum out of the ABI
> Move the compare function implementations into arch-specific files
> Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
> in the future
> Implement the dense hitmask
> Add missing header guards
> Move compare function enum into cuckoo_hash.c instead of its own header.
>
> V11->V12:
> Change the name of the compare function file (remove the _pvt suffix)
>
> Yoan Picchi (7):
> hash: make compare signature function enum private
> hash: split compare signature into arch-specific files
> hash: add a check on hash entry max size
> hash: pack the hitmask for hash in bulk lookup
> hash: optimize compare signature for NEON
> test/hash: check bulk lookup of keys after collision
> hash: add SVE support for bulk key lookup
>
> .mailmap | 2 +
> app/test/test_hash.c | 99 +++++++++---
> lib/hash/compare_signatures_arm.h | 121 +++++++++++++++
> lib/hash/compare_signatures_generic.h | 40 +++++
> lib/hash/compare_signatures_x86.h | 55 +++++++
> lib/hash/rte_cuckoo_hash.c | 207 ++++++++++++++------------
> lib/hash/rte_cuckoo_hash.h | 10 +-
> 7 files changed, 410 insertions(+), 124 deletions(-)
> create mode 100644 lib/hash/compare_signatures_arm.h
> create mode 100644 lib/hash/compare_signatures_generic.h
> create mode 100644 lib/hash/compare_signatures_x86.h
I added RN updates, reformatted commit logs, fixed header guards and
removed some pvt leftovers.
Series applied, thanks.
--
David Marchand
^ permalink raw reply [relevance 0%]
* [PATCH v6 1/2] power: introduce PM QoS API on CPU wide
2024-07-09 2:29 4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 2:29 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09 2:29 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX for
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
6 files changed, 219 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on cpuX from
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency
+based on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit it to entering only the shallowest idle
+state, lowering the delay of resuming service after sleep, by setting a strict
+resume latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 1dd842df3a..af6fd82a3c 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -155,6 +155,10 @@ New Features
Added an API that allows the user to reclaim the defer queue with RCU.
+* **Introduce per-CPU PM QoS interface.**
+
+ * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[LINE_MAX];
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+ * of each input string is as follows.
+ * 1> the resume latency is 0 if the input is "n/a".
+ * 2> the resume latency has no constraint if the input is "0".
+ * 3> the resume latency is the actual value to be set.
+ */
+ if (latency == 0)
+ snprintf(buf, sizeof(buf), "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ snprintf(buf, sizeof(buf), "%u", 0);
+ else
+ snprintf(buf, sizeof(buf), "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[LINE_MAX];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+ * of each output string is as follows.
+ * 1> the resume latency is 0 if the output is "n/a".
+ * 2> the resume latency has no constraint if the output is "0".
+ * 3> the resume latency is the actual value in use for any other string.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the per-CPU PM QoS description at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as interrupt packet receiving mode.
+
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection and limit it to entering only the shallowest idle
+ * state to lower the delay after sleep, by setting a strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * target logical core id
+ *
+ * @param latency
+ * The latency should be greater than or equal to zero, in microseconds.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The kernel default value is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v6 0/2] power: introduce PM QoS interface
` (3 preceding siblings ...)
2024-07-02 3:50 4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09 2:29 4% ` Huisong Li
2024-07-09 2:29 5% ` [PATCH v6 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 6:31 4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09 7:25 4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09 2:29 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX from
userspace. Please see the description in the kernel document[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
---
v6:
- update release_24_07.rst based on dpdk repo to resolve CI warning.
v5:
- use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
v4:
- fix some comments based on Stephen's review
- add stdint.h include
- add Acked-by Morten Brørup <mb@smartsharesystems.com>
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 28 ++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
7 files changed, 247 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* [PATCH v12 1/7] hash: make compare signature function enum private
2024-07-08 12:14 3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-08 12:14 3% ` Yoan Picchi
2024-07-09 4:48 0% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
1 sibling, 0 replies; 200+ results
From: Yoan Picchi @ 2024-07-08 12:14 UTC (permalink / raw)
To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
Cc: dev, nd, Yoan Picchi
enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
---
lib/hash/rte_cuckoo_hash.c | 10 ++++++++++
lib/hash/rte_cuckoo_hash.h | 10 +---------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
#include "rte_cuckoo_hash.h"
+/* Enum used to select the implementation of the signature comparison function to use
+ * eg: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+ RTE_HASH_COMPARE_SCALAR = 0,
+ RTE_HASH_COMPARE_SSE,
+ RTE_HASH_COMPARE_NEON,
+ RTE_HASH_COMPARE_NUM
+};
+
/* Mask of all flags supported by this version */
#define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
char key[0];
};
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
- RTE_HASH_COMPARE_SCALAR = 0,
- RTE_HASH_COMPARE_SSE,
- RTE_HASH_COMPARE_NEON,
- RTE_HASH_COMPARE_NUM
-};
-
/** Bucket structure */
struct __rte_cache_aligned rte_hash_bucket {
uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
/**< Custom function used to compare keys. */
enum cmp_jump_table_case cmp_jump_table_idx;
/**< Indicates which compare function to use. */
- enum rte_hash_sig_compare_function sig_cmp_fn;
+ unsigned int sig_cmp_fn;
/**< Indicates which signature compare function to use. */
uint32_t bucket_bitmask;
/**< Bitmask for getting bucket index from hash signature. */
--
2.25.1
^ permalink raw reply [relevance 3%]
* [PATCH v12 0/7] hash: add SVE support for bulk key lookup
` (2 preceding siblings ...)
2024-07-05 17:45 3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-08 12:14 3% ` Yoan Picchi
2024-07-08 12:14 3% ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-09 4:48 0% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
3 siblings, 2 replies; 200+ results
From: Yoan Picchi @ 2024-07-08 12:14 UTC (permalink / raw)
Cc: dev, nd, Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
Benchmarking the cuckoo hash perf test, I observed this effect on speed:
There are no significant changes on Intel (ran on Sapphire Rapids)
Neon is up to 7-10% faster (ran on ampere altra)
128b SVE is about 3-5% slower than the optimized neon (ran on a graviton
3 cloud instance)
256b SVE is about 0-3% slower than the optimized neon (ran on a graviton
3 cloud instance)
V2->V3:
Remove a redundant if in the test
Change a couple int to uint16_t in compare_signatures_dense
Several coding-style fixes
V3->V4:
Rebase
V4->V5:
Commit message
V5->V6:
Move the arch-specific code into new arch-specific files
Isolate the data structure refactor from adding SVE
V6->V7:
Commit message
Moved RTE_HASH_COMPARE_SVE to the last commit of the chain
V7->V8:
Commit message
Typos and missing spaces
V8->V9:
Use __rte_unused instead of (void)
Fix an indentation mistake
V9->V10:
Fix more formatting and indentation
Move the new compare signature file directly in hash instead of being
in a new subdir
Re-order includes
Remove duplicated static check
Move rte_hash_sig_compare_function's definition into a private header
V10->V11:
Split the "pack the hitmask" commit into four commits:
Move the compare function enum out of the ABI
Move the compare function implementations into arch-specific files
Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
in the future
Implement the dense hitmask
Add missing header guards
Move compare function enum into cuckoo_hash.c instead of its own header.
V11->V12:
Change the name of the compare function file (remove the _pvt suffix)
Yoan Picchi (7):
hash: make compare signature function enum private
hash: split compare signature into arch-specific files
hash: add a check on hash entry max size
hash: pack the hitmask for hash in bulk lookup
hash: optimize compare signature for NEON
test/hash: check bulk lookup of keys after collision
hash: add SVE support for bulk key lookup
.mailmap | 2 +
app/test/test_hash.c | 99 +++++++++---
lib/hash/compare_signatures_arm.h | 121 +++++++++++++++
lib/hash/compare_signatures_generic.h | 40 +++++
lib/hash/compare_signatures_x86.h | 55 +++++++
lib/hash/rte_cuckoo_hash.c | 207 ++++++++++++++------------
lib/hash/rte_cuckoo_hash.h | 10 +-
7 files changed, 410 insertions(+), 124 deletions(-)
create mode 100644 lib/hash/compare_signatures_arm.h
create mode 100644 lib/hash/compare_signatures_generic.h
create mode 100644 lib/hash/compare_signatures_x86.h
--
2.25.1
^ permalink raw reply [relevance 3%]
* [PATCH] net/mlx5: fix compilation warning in GCC-9.1
@ 2024-07-07 9:57 4% Gregory Etelson
2024-07-18 7:24 4% ` Raslan Darawsheh
0 siblings, 1 reply; 200+ results
From: Gregory Etelson @ 2024-07-07 9:57 UTC (permalink / raw)
To: dev
Cc: getelson, mkashani, rasland, stable, Dariusz Sosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
GCC has introduced a bugfix in 9.1 that changed GCC ABI in ARM setups:
https://gcc.gnu.org/gcc-9/changes.html
```
On Arm targets (arm*-*-*), a bug in the implementation of the
procedure call standard (AAPCS) in the GCC 6, 7 and 8 releases
has been fixed: a structure containing a bit-field based on a 64-bit
integral type and where no other element in a structure required
64-bit alignment could be passed incorrectly to functions.
This is an ABI change. If the option -Wpsabi is enabled
(on by default) the compiler will emit a diagnostic note for code
that might be affected.
```
The patch fixes a PMD compilation warning in the INTEGRITY flow item validation.
Fixes: 23b0a8b298b1 ("net/mlx5: fix integrity item validation and translation")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/mlx5_flow_dv.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 8a0d58cb05..89057edbcf 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7396,11 +7396,13 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
}
static int
-validate_integrity_bits(const struct rte_flow_item_integrity *mask,
+validate_integrity_bits(const void *arg,
int64_t pattern_flags, uint64_t l3_flags,
uint64_t l4_flags, uint64_t ip4_flag,
struct rte_flow_error *error)
{
+ const struct rte_flow_item_integrity *mask = arg;
+
if (mask->l3_ok && !(pattern_flags & l3_flags))
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM,
--
2.43.0
^ permalink raw reply [relevance 4%]
* RE: [PATCH 1/1] net/ena: restructure the llq policy user setting
2024-07-05 17:32 4% ` Ferruh Yigit
@ 2024-07-06 4:59 4% ` Brandes, Shai
0 siblings, 0 replies; 200+ results
From: Brandes, Shai @ 2024-07-06 4:59 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
Sure, thanks!
On July 5, 2024, 20:32, Ferruh Yigit <ferruh.yigit@amd.com> wrote:
On 6/6/2024 2:33 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
>
> Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
> devargs with a new shared devarg named `llq_policy` that
> implements the same logic and accepts the following values:
> 0 - Disable LLQ.
> Use with extreme caution as it leads to a huge performance
> degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
> Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
> Required for packets with header that exceed 96 bytes on
> AWS instances prior to 5th generation.
>
> Signed-off-by: Shai Brandes <shaibran@amazon.com>
> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
>
Hi Shai,
This patch changes device parameters and impacts the end user.
Although this is not part of the ABI policy and we don't have an explicit
policy around it, since it may impact the end user experience, would you
be OK to postpone this patch to the v24.11 release, where an ABI break is planned?
[-- Attachment #2: Type: text/html, Size: 2262 bytes --]
^ permalink raw reply [relevance 4%]
* [PATCH v11 0/7] hash: add SVE support for bulk key lookup
@ 2024-07-05 17:45 3% ` Yoan Picchi
2024-07-05 17:45 3% ` [PATCH v11 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-08 12:14 3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
3 siblings, 1 reply; 200+ results
From: Yoan Picchi @ 2024-07-05 17:45 UTC (permalink / raw)
Cc: dev, nd, Yoan Picchi
This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and to interleave
the primary and secondary hitmasks instead of keeping each in its own
array.
Benchmarking the cuckoo hash perf test, I observed this effect on speed:
There are no significant changes on Intel (ran on Sapphire Rapids)
Neon is up to 7-10% faster (ran on ampere altra)
128b SVE is about 3-5% slower than the optimized neon (ran on a graviton
3 cloud instance)
256b SVE is about 0-3% slower than the optimized neon (ran on a graviton
3 cloud instance)
V2->V3:
Remove a redundant if in the test
Change a couple int to uint16_t in compare_signatures_dense
Several coding-style fixes
V3->V4:
Rebase
V4->V5:
Commit message
V5->V6:
Move the arch-specific code into new arch-specific files
Isolate the data structure refactor from adding SVE
V6->V7:
Commit message
Moved RTE_HASH_COMPARE_SVE to the last commit of the chain
V7->V8:
Commit message
Typos and missing spaces
V8->V9:
Use __rte_unused instead of (void)
Fix an indentation mistake
V9->V10:
Fix more formatting and indentation
Move the new compare signature file directly in hash instead of being
in a new subdir
Re-order includes
Remove duplicated static check
Move rte_hash_sig_compare_function's definition into a private header
V10->V11:
Split the "pack the hitmask" commit into four commits:
Move the compare function enum out of the ABI
Move the compare function implementations into arch-specific files
Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
in the future
Implement the dense hitmask
Add missing header guards
Move compare function enum into cuckoo_hash.c instead of its own header.
Yoan Picchi (7):
hash: make compare signature function enum private
hash: split compare signature into arch-specific files
hash: add a check on hash entry max size
hash: pack the hitmask for hash in bulk lookup
hash: optimize compare signature for NEON
test/hash: check bulk lookup of keys after collision
hash: add SVE support for bulk key lookup
.mailmap | 2 +
app/test/test_hash.c | 99 ++++++++---
lib/hash/compare_signatures_arm_pvt.h | 121 +++++++++++++
lib/hash/compare_signatures_generic_pvt.h | 40 +++++
lib/hash/compare_signatures_x86_pvt.h | 55 ++++++
lib/hash/rte_cuckoo_hash.c | 207 ++++++++++++----------
lib/hash/rte_cuckoo_hash.h | 10 +-
7 files changed, 410 insertions(+), 124 deletions(-)
create mode 100644 lib/hash/compare_signatures_arm_pvt.h
create mode 100644 lib/hash/compare_signatures_generic_pvt.h
create mode 100644 lib/hash/compare_signatures_x86_pvt.h
--
2.34.1
^ permalink raw reply [relevance 3%]
* [PATCH v11 1/7] hash: make compare signature function enum private
2024-07-05 17:45 3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-05 17:45 3% ` Yoan Picchi
0 siblings, 0 replies; 200+ results
From: Yoan Picchi @ 2024-07-05 17:45 UTC (permalink / raw)
To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
Cc: dev, nd, Yoan Picchi
enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
---
lib/hash/rte_cuckoo_hash.c | 10 ++++++++++
lib/hash/rte_cuckoo_hash.h | 10 +---------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
#include "rte_cuckoo_hash.h"
+/* Enum used to select the implementation of the signature comparison function to use
+ * eg: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+ RTE_HASH_COMPARE_SCALAR = 0,
+ RTE_HASH_COMPARE_SSE,
+ RTE_HASH_COMPARE_NEON,
+ RTE_HASH_COMPARE_NUM
+};
+
/* Mask of all flags supported by this version */
#define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
char key[0];
};
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
- RTE_HASH_COMPARE_SCALAR = 0,
- RTE_HASH_COMPARE_SSE,
- RTE_HASH_COMPARE_NEON,
- RTE_HASH_COMPARE_NUM
-};
-
/** Bucket structure */
struct __rte_cache_aligned rte_hash_bucket {
uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
/**< Custom function used to compare keys. */
enum cmp_jump_table_case cmp_jump_table_idx;
/**< Indicates which compare function to use. */
- enum rte_hash_sig_compare_function sig_cmp_fn;
+ unsigned int sig_cmp_fn;
/**< Indicates which signature compare function to use. */
uint32_t bucket_bitmask;
/**< Bitmask for getting bucket index from hash signature. */
--
2.34.1
^ permalink raw reply [relevance 3%]
* Re: [PATCH 1/1] net/ena: restructure the llq policy user setting
2024-06-06 13:33 3% ` [PATCH 1/1] net/ena: restructure the llq policy user setting shaibran
@ 2024-07-05 17:32 4% ` Ferruh Yigit
2024-07-06 4:59 4% ` Brandes, Shai
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-07-05 17:32 UTC (permalink / raw)
To: shaibran; +Cc: dev
On 6/6/2024 2:33 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
>
> Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
> devargs with a new shared devarg named `llq_policy` that
> implements the same logic and accepts the following values:
> 0 - Disable LLQ.
> Use with extreme caution as it leads to a huge performance
> degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
> Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
> Required for packets with header that exceed 96 bytes on
> AWS instances prior to 5th generation.
>
> Signed-off-by: Shai Brandes <shaibran@amazon.com>
> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
>
Hi Shai,
This patch changes device parameters and impacts the end user.
Although this is not part of the ABI policy and we don't have an explicit
policy around it, since it may impact the end user experience, would you
be OK to postpone this patch to the v24.11 release, where an ABI break is planned?
^ permalink raw reply [relevance 4%]
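As a sketch of how the consolidated devarg quoted above would be used, the new policy value is passed through the usual EAL device arguments (the PCI address and application here are illustrative):

```shell
# Enforce the large LLQ policy (value 3) on an ena device, e.g. for
# packets whose headers exceed 96 bytes on older AWS instance generations.
dpdk-testpmd -a 00:06.0,llq_policy=3 -- -i
```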
* [PATCH v6] graph: expose node context as pointers
@ 2024-07-05 14:52 4% Robin Jarry
2024-07-12 11:39 0% ` [EXTERNAL] " Kiran Kumar Kokkilagadda
0 siblings, 1 reply; 200+ results
From: Robin Jarry @ 2024-07-05 14:52 UTC (permalink / raw)
To: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan
In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. And without
intermediate structures, casting to opaque pointers is hard without
violating strict aliasing rules.
Add an unnamed union to allow storing opaque pointers in the node
context. Unfortunately, aligning an unnamed union that contains an array
produces inconsistent results between C and C++. To preserve ABI/API
compatibility in both C and C++, move all fast-path area fields into an
unnamed struct which is itself cache aligned. Use __rte_cache_aligned to
preserve existing alignment on architectures where cache lines are 128
bytes.
Add a static assert to ensure that the fast path area does not grow
beyond a 64-byte cache line.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
Notes:
v6:
* Fix ABI breakage on arm64 (and all platforms that have RTE_CACHE_LINE_SIZE=128).
* This patch will cause CI failures without libabigail 2.5. See this commit
https://sourceware.org/git/?p=libabigail.git;a=commitdiff;h=f821c2be3fff2047ef8fc436f6f02301812d166f
for more details.
v5:
* Helper functions to hide casting proved to be harder than expected.
Naive casting may even be impossible without breaking strict aliasing
rules. The only other option would be to use explicit memcpy calls.
* New attempt at the unnamed union. As suggested by Tyler (thank you!),
using an intermediate unnamed struct to carry the alignment produces
consistent ABI in C and C++.
* Also, Tyler (thank you!) suggested that the fast path area alignment
size may be incorrect for architectures where the cache line is not 64
bytes. There will be a 64 bytes hole in the structure at the end of
the unnamed struct before the zero length next nodes array. Use
__rte_cache_min_aligned to preserve existing alignment.
v4:
* Replaced the unnamed union with helper inline functions.
v3:
* Added __extension__ to the unnamed struct inside the union.
* Fixed C++ header checks.
* Replaced alignas() with an explicit static_assert.
lib/graph/rte_graph_worker_common.h | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..8d8956fdddda 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -12,7 +12,9 @@
* process, enqueue and move streams of objects to the next nodes.
*/
+#include <assert.h>
#include <stdalign.h>
+#include <stddef.h>
#include <rte_common.h>
#include <rte_cycles.h>
@@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
} dispatch;
};
/* Fast path area */
+ __extension__ struct __rte_cache_aligned {
#define RTE_NODE_CTX_SZ 16
- alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
- uint16_t size; /**< Total number of objects available. */
- uint16_t idx; /**< Number of objects used. */
- rte_graph_off_t off; /**< Offset of node in the graph reel. */
- uint64_t total_cycles; /**< Cycles spent in this node. */
- uint64_t total_calls; /**< Calls done to this node. */
- uint64_t total_objs; /**< Objects processed by this node. */
+ union {
+ uint8_t ctx[RTE_NODE_CTX_SZ];
+ __extension__ struct {
+ void *ctx_ptr;
+ void *ctx_ptr2;
+ };
+ }; /**< Node Context. */
+ uint16_t size; /**< Total number of objects available. */
+ uint16_t idx; /**< Number of objects used. */
+ rte_graph_off_t off; /**< Offset of node in the graph reel. */
+ uint64_t total_cycles; /**< Cycles spent in this node. */
+ uint64_t total_calls; /**< Calls done to this node. */
+ uint64_t total_objs; /**< Objects processed by this node. */
union {
void **objs; /**< Array of object pointers. */
uint64_t objs_u64;
@@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
rte_node_process_t process; /**< Process function. */
uint64_t process_u64;
};
- alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
+ alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
+ };
};
+static_assert(offsetof(struct rte_node, nodes) - offsetof(struct rte_node, ctx)
+ == RTE_CACHE_LINE_MIN_SIZE, "rte_node fast path area must fit in 64 bytes");
+
/**
* @internal
*
--
2.45.2
^ permalink raw reply [relevance 4%]
* Re: [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup
@ 2024-07-04 20:31 3% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-07-04 20:31 UTC (permalink / raw)
To: Yoan Picchi
Cc: Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
Vladimir Medvedkin, dev, nd, Ruifeng Wang, Nathan Brown
Hello Yoan,
On Wed, Jul 3, 2024 at 7:13 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> > The current hitmask includes padding due to an Intel SIMD
> > implementation detail. This patch allows non-Intel SIMD
> > implementations to benefit from a dense hitmask.
> > In addition, the new dense hitmask interweaves the primary
> > and secondary matches, which allows better cache usage and
> > enables future improvements for the SIMD implementations.
> > The default non-SIMD path now uses this dense mask.
>
> Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Nathan Brown <nathan.brown@arm.com>
This patch does too many things at the same time.
There is code movement and behavior modifications all mixed in.
As there was still no review from the lib maintainer... I am going a
bit more in depth this time.
Please split this patch to make it less hard to understand.
I can see the need for at least one patch for isolating the change on
sig_cmp_fn from the exposed API, then one patch for moving the code to
per arch headers with *no behavior change*, and one patch for
introducing/switching to "dense hitmask".
More comments below.
> ---
> .mailmap | 1 +
> lib/hash/compare_signatures_arm_pvt.h | 60 +++++++
> lib/hash/compare_signatures_generic_pvt.h | 37 +++++
> lib/hash/compare_signatures_x86_pvt.h | 49 ++++++
> lib/hash/hash_sig_cmp_func_pvt.h | 20 +++
> lib/hash/rte_cuckoo_hash.c | 190 +++++++++++-----------
> lib/hash/rte_cuckoo_hash.h | 10 +-
> 7 files changed, 267 insertions(+), 100 deletions(-)
> create mode 100644 lib/hash/compare_signatures_arm_pvt.h
> create mode 100644 lib/hash/compare_signatures_generic_pvt.h
> create mode 100644 lib/hash/compare_signatures_x86_pvt.h
> create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h
>
> diff --git a/.mailmap b/.mailmap
> index f76037213d..ec525981fe 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1661,6 +1661,7 @@ Yixue Wang <yixue.wang@intel.com>
> Yi Yang <yangyi01@inspur.com> <yi.y.yang@intel.com>
> Yi Zhang <zhang.yi75@zte.com.cn>
> Yoann Desmouceaux <ydesmouc@cisco.com>
> +Yoan Picchi <yoan.picchi@arm.com>
> Yogesh Jangra <yogesh.jangra@intel.com>
> Yogev Chaimovich <yogev@cgstowernetworks.com>
> Yongjie Gu <yongjiex.gu@intel.com>
> diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
> new file mode 100644
> index 0000000000..e83bae9912
> --- /dev/null
> +++ b/lib/hash/compare_signatures_arm_pvt.h
I guess pvt stands for private.
No need for such suffix, this header won't be exported in any case.
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * Arm's version uses a densely packed hitmask buffer:
> + * Every bit is in use.
> + */
Please put a header guard.
#ifndef <UPPERCASE_HEADER_NAME>_H
#define <UPPERCASE_HEADER_NAME>_H
> +
> +#include <inttypes.h>
> +#include <rte_common.h>
> +#include <rte_vect.h>
> +
> +#include "rte_cuckoo_hash.h"
> +#include "hash_sig_cmp_func_pvt.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 1
> +
> +static inline void
> +compare_signatures_dense(uint16_t *hitmask_buffer,
> + const uint16_t *prim_bucket_sigs,
> + const uint16_t *sec_bucket_sigs,
> + uint16_t sig,
> + enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +
> + static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
> + "hitmask_buffer must be wide enough to fit a dense hitmask");
> +
> + /* For the match mask, every bit indicates a match */
> + switch (sig_cmp_fn) {
> +#if RTE_HASH_BUCKET_ENTRIES <= 8
> + case RTE_HASH_COMPARE_NEON: {
> + uint16x8_t vmat, vsig, x;
> + int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
> + uint16_t low, high;
> +
> + vsig = vld1q_dup_u16((uint16_t const *)&sig);
> + /* Compare all signatures in the primary bucket */
> + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
> + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> + low = (uint16_t)(vaddvq_u16(x));
> + /* Compare all signatures in the secondary bucket */
> + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
> + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> + high = (uint16_t)(vaddvq_u16(x));
> + *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
> +
> + }
> + break;
> +#endif
> + default:
> + for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> + *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
> + *hitmask_buffer |=
> + ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> + }
> + }
> +}
IIRC, this code is copied in all three headers.
It is a common scalar version, so the ARM code could simply call the
"generic" implementation rather than copy/paste.
[snip]
> diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h
> new file mode 100644
> index 0000000000..932912ba19
> --- /dev/null
> +++ b/lib/hash/compare_signatures_x86_pvt.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * x86's version uses a sparsely packed hitmask buffer:
> + * Every other bit is padding.
> + */
> +
> +#include <inttypes.h>
> +#include <rte_common.h>
> +#include <rte_vect.h>
> +
> +#include "rte_cuckoo_hash.h"
> +#include "hash_sig_cmp_func_pvt.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 0
> +
> +static inline void
> +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> + const struct rte_hash_bucket *prim_bkt,
> + const struct rte_hash_bucket *sec_bkt,
> + uint16_t sig,
> + enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> + /* For the match mask, the first bit of every two bits indicates a match */
> + switch (sig_cmp_fn) {
> +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8
The check on RTE_HASH_BUCKET_ENTRIES <= 8 seems new.
It was not present in the previous implementation for SSE2, and this
difference is not explained.
> + case RTE_HASH_COMPARE_SSE:
> + /* Compare all signatures in the bucket */
> + *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
> + (__m128i const *)prim_bkt->sig_current), _mm_set1_epi16(sig)));
> + /* Extract the even-index bits only */
> + *prim_hash_matches &= 0x5555;
> + /* Compare all signatures in the bucket */
> + *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
> + (__m128i const *)sec_bkt->sig_current), _mm_set1_epi16(sig)));
> + /* Extract the even-index bits only */
> + *sec_hash_matches &= 0x5555;
> + break;
> +#endif /* defined(__SSE2__) */
> + default:
> + for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> + *prim_hash_matches |= (sig == prim_bkt->sig_current[i]) << (i << 1);
> + *sec_hash_matches |= (sig == sec_bkt->sig_current[i]) << (i << 1);
> + }
> + }
> +}
> diff --git a/lib/hash/hash_sig_cmp_func_pvt.h b/lib/hash/hash_sig_cmp_func_pvt.h
> new file mode 100644
> index 0000000000..d8d2fbffaf
> --- /dev/null
> +++ b/lib/hash/hash_sig_cmp_func_pvt.h
Please rename as compare_signatures.h or maybe a simpler option is to
move this enum declaration in rte_cuckoo_hash.c before including the
per arch headers.
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef _SIG_CMP_FUNC_H_
> +#define _SIG_CMP_FUNC_H_
If keeping a header, this guard must reflect the file name.
> +
> +/** Enum used to select the implementation of the signature comparison function to use
/* is enough, doxygen only parses public headers.
> + * eg: A system supporting SVE might want to use a NEON implementation.
> + * Those may change and are for internal use only
> + */
> +enum rte_hash_sig_compare_function {
> + RTE_HASH_COMPARE_SCALAR = 0,
> + RTE_HASH_COMPARE_SSE,
> + RTE_HASH_COMPARE_NEON,
> + RTE_HASH_COMPARE_SVE,
> + RTE_HASH_COMPARE_NUM
> +};
> +
> +#endif
[snip]
> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index a528f1d1a0..26a992419a 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -134,14 +134,6 @@ struct rte_hash_key {
> char key[0];
> };
>
> -/* All different signature compare functions */
> -enum rte_hash_sig_compare_function {
> - RTE_HASH_COMPARE_SCALAR = 0,
> - RTE_HASH_COMPARE_SSE,
> - RTE_HASH_COMPARE_NEON,
> - RTE_HASH_COMPARE_NUM
> -};
> -
> /** Bucket structure */
> struct __rte_cache_aligned rte_hash_bucket {
> uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> @@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
> /**< Custom function used to compare keys. */
> enum cmp_jump_table_case cmp_jump_table_idx;
> /**< Indicates which compare function to use. */
> - enum rte_hash_sig_compare_function sig_cmp_fn;
> + unsigned int sig_cmp_fn;
From an ABI perspective, it looks ok.
We may be breaking users that would inspect this public object, but I
think it is ok.
In any case, put this change in a separate patch so it is more visible.
> /**< Indicates which signature compare function to use. */
> uint32_t bucket_bitmask;
> /**< Bitmask for getting bucket index from hash signature. */
> --
> 2.25.1
>
--
David Marchand
^ permalink raw reply [relevance 3%]
* Re: [PATCH v5] bitmap: add scan from offset function
@ 2024-07-03 13:42 0% ` Volodymyr Fialko
0 siblings, 0 replies; 200+ results
From: Volodymyr Fialko @ 2024-07-03 13:42 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, cristian.dumitrescu, Jerin Jacob, Anoob Joseph
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, July 3, 2024 2:50 PM
> To: Volodymyr Fialko
> Cc: dev@dpdk.org; cristian.dumitrescu@intel.com; Jerin Jacob; Anoob Joseph
> Subject: Re: [PATCH v5] bitmap: add scan from offset function
>
> 03/07/2023 14:39, Volodymyr Fialko:
> > Currently, in the case when we search for a bit set after a particular
> > value, the bitmap has to be scanned from the beginning and
> > rte_bitmap_scan() has to be called multiple times until we hit the value.
> >
> > Add a new rte_bitmap_scan_from_offset() function to initialize scan
> > state at the given offset and perform scan, this will allow getting
> > the next set bit after certain offset within one scan call.
> >
> > Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
> > ---
> > v2:
> > - added rte_bitmap_scan_from_offset
> > v3:
> > - added note for internal use only for init_at function
> > v4:
> > - marked init_at function as __rte_internal
> > v5:
> > - removed __rte_internal due to build errors
>
> What was the build error?
>
> You should not add an internal function in the public header file.
> At least, it should be experimental.
>
From our discussion in previous versions(V3, V4), It looks like we agreed to
remove both markers.
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 3, 2023 2:17 PM
> To: Dumitrescu, Cristian; Volodymyr Fialko
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran; Anoob Joseph
> Subject: Re: [PATCH v3] bitmap: add scan from offset function
>
> > ----------------------------------------------------------------------
> > 03/07/2023 12:56, Volodymyr Fialko:
> > > Since it's a header-only library, there is an issue with using __rte_internal (appeared in v4).
> >
> > What is the issue?
>
> From V4 ci build failure(http://mails.dpdk.org/archives/test-report/2023-July/421235.html):
> In file included from ../examples/ipsec-secgw/event_helper.c:6:
> ../lib/eal/include/rte_bitmap.h:645:2: error: Symbol is not public ABI
> __rte_bitmap_scan_init_at(bmp, offset);
> ^
> ../lib/eal/include/rte_bitmap.h:150:1: note: from 'diagnose_if' attribute on '__rte_bitmap_scan_init_at':
> __rte_internal
> ^~~~~~~~~~~~~~
> ../lib/eal/include/rte_compat.h:42:16: note: expanded from macro '__rte_internal'
> __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
> ^ ~
> 1 error generated.
> OK I see.
> So we should give up with __rte_internal for inline functions.
> As it is not supposed to be exposed to the applications,
> I think we can skip the __rte_experimental flag.
/Volodymyr
^ permalink raw reply [relevance 0%]
* [PATCH v5 1/2] power: introduce PM QoS API on CPU wide
2024-07-02 3:50 4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-02 3:50 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-02 3:50 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
6 files changed, 219 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as the interrupt packet receiving mode.
+
+And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on the cpuX for
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit it to entering only the shallowest
+idle state, lowering the delay of resuming service after sleep, by setting a
+strict resume latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..4de96f60ac 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
* Added SSE/NEON vector datapath.
+* **Introduce PM QoS interface.**
+
+ * Introduce a per-CPU PM QoS interface to lower the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[LINE_MAX];
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US in the kernel, the meaning of
+ * different input strings is as follows.
+ * 1> the resume latency is 0 if the input is "n/a".
+ * 2> the resume latency is no constraint if the input is "0".
+ * 3> the resume latency is the actual value to be set.
+ */
+ if (latency == 0)
+ snprintf(buf, sizeof(buf), "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ snprintf(buf, sizeof(buf), "%u", 0);
+ else
+ snprintf(buf, sizeof(buf), "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[LINE_MAX];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the sysfs interface pm_qos_resume_latency_us under
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US in the kernel, the meaning of
+ * different output strings is as follows.
+ * 1> the resume latency is 0 if the output is "n/a".
+ * 2> the resume latency is no constraint if the output is "0".
+ * 3> the resume latency is the actual value in use for any other string.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the description of the per-CPU PM QoS at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection and limit it to entering only the shallowest idle
+ * state, lowering the delay after sleep, by setting a strict resume latency
+ * (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * target logical core id
+ *
+ * @param latency
+ * The latency should be greater than or equal to zero, in microseconds.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v5 0/2] power: introduce PM QoS interface
` (2 preceding siblings ...)
2024-06-27 6:00 4% ` [PATCH v4 " Huisong Li
@ 2024-07-02 3:50 4% ` Huisong Li
2024-07-02 3:50 5% ` [PATCH v5 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 2:29 4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
` (2 subsequent siblings)
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-02 3:50 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Please see the description in the kernel documentation[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
---
v5:
- use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
v4:
- fix some comments basd on Stephen
- add stdint.h include
- add Acked-by Morten Brørup <mb@smartsharesystems.com>
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 28 ++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
7 files changed, 247 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* Community CI Meeting Minutes - June 27, 2024
@ 2024-06-27 20:52 2% Patrick Robb
0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-06-27 20:52 UTC (permalink / raw)
To: ci; +Cc: dev, dts
#####################################################################
Attendees
1. Patrick Robb
2. Paul Szczepanek
3. Luca Vizzarro
4. Nicholas Pratte
5. Aaron Conole
6. Dean Marx
7. Jeremy Spewock
8. Juraj Linkeš
9. Manit Mahajan
10. Tomas Durovec
11. Adam Hassick
#####################################################################
Minutes
=====================================================================
General Announcements
* DPDK Summit in Montreal will be September 24-25:
https://www.dpdk.org/event/dpdk-summit-2024/
* CFP closes July 21
* Tech board voted yesterday to allow remote presentations at
Montreal (with lower priority)
* Luca will make a submission for a remote DTS talk
* David commented last week stating that there could be a section
for how to setup DTS, run the hello world testsuite
* Nathan Southern set up a call with some folks from AWS next Monday
to discuss testing on cloud infrastructure. Email Nathan if you want
to join this call.
* David indicated there will be a vote over email for a DTS branch for
framework patches
=====================================================================
CI Status
---------------------------------------------------------------------
UNH-IOL Community Lab
* David noted this week that the template engine is out of date
(the UNH-IOL fork has some updates from the past months). UNH now has a 60
day reminder for aggregating all commits made to our fork and
upstreaming them.
* New Servers have arrived at UNH-IOL. Getting these mounted onto our
2nd DPDK Rack, setting up the associated infrastructure etc.
* Setting up the UPS, tor switch, etc for DPDK rack 2.
* Pending:
* Pending emails are going out, but the checks are not being
written to the API
* Emailed Ali - will have to debug with him
* We only are running this for ABI testing right now, but as soon
as the behavior with the PW API looks good, we can turn this on for
all the other labels (the PR for pending for all testing is ready)
* Depends-on support: Adam has submitted a patchseries to the PW
project which adds the changes to the Django models. Is under review.
* Github PR: https://github.com/getpatchwork/patchwork/pull/590
* Has put together the corresponding changes to git-pw (client side)
* Some overlap between the PW server and git-pw client - the pw
server maintainer is aware of this feature being added for git-pw
which will pair up with the dashboard updates
* SPDK: Submitted a patch fixing a malloc error which affected Fedora
40. This is now merged, so we added Fedora 40 coverage in our lab.
* We increased the retest limit per patchseries to 3 (was previously
1) due to a submitter who needed to retest multiple sets of contexts.
---------------------------------------------------------------------
Intel Lab
* None
---------------------------------------------------------------------
Github Actions
* None, the Robot is running smoothly.
* There was a GitHub outage itself a few weeks ago, but anyone who was
affected would have been able to request a retest.
---------------------------------------------------------------------
Loongarch Lab
* None
=====================================================================
DTS Improvements & Test Development
* Jumboframes testsuite: MTU behavior on different NIC drivers.
* Within each driver, there are variables set for taking off
ethernet overhead when setting max packet length.
* MLNX subtracts 18 bytes
* Intel/Broadcom subtract 26 bytes.
* But from testing it appears that you can only send packets
with MTU + 22 bytes, not 26?
* These variables are not common across drivers… so basically
MTU as defined by different drivers is not the same
* When we build scapy packets, the ethernet overhead is 14 bytes
(source MAC address, destination MAC address, EtherType), so we can
actually increase the l3 packet above the given MTU and still send
packets
* Juraj: Important to make sure we are running from the latest
firmware/drivers on each device
* Firmware driver versions for devices are published per DPDK release
* Patrick Robb should set up a 4 month reminder (during the beginning
of dpdk release cycle) to update all firmware to whatever was
published as being supported for the release which just came out - use
this version for all testing for the upcoming release
* Mac Filter Testsuite: Submitted, getting reviews on the mailing
list. Nick will respond to Jeremy’s comments today and submit a new
version.
* VLAN Filter: Bugzilla ticket is submitted for the VLAN filtering
bug. David requested some verbose logs, so Dean redid the test and
attached those to the ticket.
* Queue Start/Stop and Dynamic Queue:
* Show port info exposes a capability for whether you can stop or
start the queue.
* There is another bugzilla ticket out for –max-packet-len. It does
not update the MTU if you are using a kernel driver.
* https://bugs.dpdk.org/show_bug.cgi?id=1470
* Need to double check this from the Jumboframes testsuite
* Paul: but essentially we're moving on from vff to l2fwd due to vff
requiring qemu, otherwise all normal, blocklist submitted
* UNH people need to test blocklist on our hardware
* Luca will be on vacation for the next two weeks
* Juraj will be on vacation next week.
* Capabilities patch: Working on an updated version, it might not be
ready before Juraj goes on vacation next week.
* Adds the conditional capabilities, for cases where some staging
is needed in order to report the capability. This is the case with
scatter (the capability will only report for some devices if MTU is
increased from the standard size)
* After the capability setup function runs and then the capability
checked, everything is cleaned up and the previous state restored.
* For the capability, use a decorated to associate a testpmd
function which will do the pre and post configuration for the
capability check
* Looking to add support for all the capabilities testpmd exposes
* XML-RPC server replacement: new version submitted and awaiting reviews
* Extended usage of kwargs in the new version makes the code a
little more confusing. Right now the arguments are of type any, as
opposed to processing the output via a typed dict.
=====================================================================
Any other business
* Patrick Robb should check with folks over email whether it would be
okay to reschedule the DTS calls from Wednesdays to Thursdays (same
time as CI calls, on the off weeks)
* Next Meeting: July 11, 2024
^ permalink raw reply [relevance 2%]
* RE: [PATCH] bpf: don't verify classic bpfs
2024-06-27 15:36 3% ` Thomas Monjalon
@ 2024-06-27 18:14 0% ` Konstantin Ananyev
0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2024-06-27 18:14 UTC (permalink / raw)
To: Thomas Monjalon, Stephen Hemminger, Yoav Winstein, Konstantin Ananyev; +Cc: dev
> > > > When classic BPFs with lots of branching instructions are compiled,
> > > > __rte_bpf_bpf_validate runs way too slow. A simple bpf such as:
> > > > 'ether host a0:38:6d:af:17:eb or b3:a3:ff:b6:c1:ef or ...' 12 times
> > > >
> > > > results in ~1 minute of bpf validation.
> > > > This patch makes __rte_bpf_bpf_validate be aware of bpf_prm originating
> > > > from classic BPF, allowing to safely skip over the validation.
> > > >
> > > > Signed-off-by: Yoav Winstein <yoav.w@claroty.com>
> > > > ---
> > >
> > > No.
> > > Wallpapering over a performance bug in the BPF library is not
> > > the best way to handle this. Please analyze the problem in the BPF
> > > library; it should be fixed there.
> >
> > +1
> > Blindly disabling verification for all cBPFs is the worst possible option here.
> > We need at least try to understand what exactly causing such slowdown.
>
> +1
>
> You didn't mention it is also breaking ABI compatibility.
Yep, it does, thanks Thomas for highlighting it.
Yoav, may I ask you to check whether this series:
https://patchwork.dpdk.org/project/dpdk/list/?series=32321
fixes the problem you are facing?
I think that the root cause is the same.
^ permalink raw reply [relevance 0%]
* Re: [PATCH] bpf: don't verify classic bpfs
@ 2024-06-27 15:36 3% ` Thomas Monjalon
2024-06-27 18:14 0% ` Konstantin Ananyev
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-06-27 15:36 UTC (permalink / raw)
To: Stephen Hemminger, Yoav Winstein, Konstantin Ananyev; +Cc: dev
16/05/2024 11:36, Konstantin Ananyev:
>
> > On Sun, 12 May 2024 08:55:45 +0300
> > Yoav Winstein <yoav.w@claroty.com> wrote:
> >
> > > When classic BPFs with lots of branching instructions are compiled,
> > > __rte_bpf_bpf_validate runs way too slow. A simple bpf such as:
> > > 'ether host a0:38:6d:af:17:eb or b3:a3:ff:b6:c1:ef or ...' 12 times
> > >
> > > results in ~1 minute of bpf validation.
> > > This patch makes __rte_bpf_bpf_validate be aware of bpf_prm originating
> > > from classic BPF, allowing to safely skip over the validation.
> > >
> > > Signed-off-by: Yoav Winstein <yoav.w@claroty.com>
> > > ---
> >
> > No.
> > Wallpapering over a performance bug in the BPF library is not
> > the best way to handle this. Please analyze the problem in the BPF
> > library; it should be fixed there.
>
> +1
> Blindly disabling verification for all cBPFs is the worst possible option here.
> We need at least try to understand what exactly causing such slowdown.
+1
You didn't mention it is also breaking ABI compatibility.
^ permalink raw reply [relevance 3%]
* [PATCH v4 1/2] power: introduce PM QoS API on CPU wide
2024-06-27 6:00 4% ` [PATCH v4 " Huisong Li
@ 2024-06-27 6:00 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-06-27 6:00 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
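The translation between the API's latency argument and the string written to
pm_qos_resume_latency_us can be sketched standalone; the helper name
latency_to_sysfs below is hypothetical, only the two constants mirror the
definitions in rte_power_qos.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch of the value translation performed before writing
 * pm_qos_resume_latency_us; the constants mirror rte_power_qos.h, the
 * helper itself is not part of the patch. */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

static void
latency_to_sysfs(int latency, char *buf, size_t len)
{
	if (latency == RTE_POWER_QOS_STRICT_LATENCY_VALUE)
		snprintf(buf, len, "n/a");	/* strict: resume latency 0 */
	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
		snprintf(buf, len, "0");	/* kernel reads "0" as no constraint */
	else
		snprintf(buf, len, "%d", latency);	/* explicit limit in us */
}
```

The inverse mapping applies when reading the file back, which is why the get
function in the patch maps "n/a" to 0 and "0" to the no-constraint value.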
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
6 files changed, 219 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as the interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used by userspace to set and get the resume latency limit on
+cpuX. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux and limit it to entering only the shallowest
+idle state, lowering the delay of resuming service after sleep, by setting
+a strict resume latency (zero value).
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency of the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..4de96f60ac 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
* Added SSE/NEON vector datapath.
+* **Introduce PM QoS interface.**
+
+ * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..b131cf58e7
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[BUFSIZ] = {0};
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the kernel's pm_qos_resume_latency_us sysfs interface at
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of the written string
+ * is as follows.
+ * 1> the resume latency is 0 if the input is "n/a".
+ * 2> the resume latency is no constraint if the input is "0".
+ * 3> the resume latency is the actual value to be set.
+ */
+ if (latency == 0)
+ sprintf(buf, "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ sprintf(buf, "%u", 0);
+ else
+ sprintf(buf, "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[BUFSIZ];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the kernel's pm_qos_resume_latency_us sysfs interface at
+ * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of the string read back
+ * is as follows.
+ * 1> the resume latency is 0 if the output is "n/a".
+ * 2> the resume latency is no constraint if the output is "0".
+ * 3> the resume latency is the actual value in use for any other string.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection and limit it to the shallowest idle state, lowering
+ * the delay after sleep, by setting a strict resume latency (zero value).
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * target logical core id
+ *
+ * @param latency
+ * The latency should be greater than or equal to zero, in microseconds.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v4 0/2] power: introduce PM QoS interface
2024-06-13 11:20 4% ` [PATCH v2 0/2] power: " Huisong Li
2024-06-19 6:31 4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-06-27 6:00 4% ` Huisong Li
2024-06-27 6:00 5% ` [PATCH v4 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-02 3:50 4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
` (3 subsequent siblings)
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-06-27 6:00 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, stephen, david.marchand, liuyonglong,
lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Please see the description in the kernel documentation[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
---
v4:
- fix some comments based on Stephen's feedback
- add stdint.h include
- add Acked-by Morten Brørup <mb@smartsharesystems.com>
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 24 ++++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 28 ++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 73 ++++++++++++++++
lib/power/version.map | 2 +
7 files changed, 247 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* Re: [PATCH v5] graph: expose node context as pointers
2024-06-25 15:22 0% ` Robin Jarry
@ 2024-06-26 11:30 0% ` Jerin Jacob
0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2024-06-26 11:30 UTC (permalink / raw)
To: Robin Jarry
Cc: David Marchand, dev, Jerin Jacob, Kiran Kumar K,
Nithin Dabilpuram, Zhirun Yan, Tyler Retzlaff
On Tue, Jun 25, 2024 at 9:02 PM Robin Jarry <rjarry@redhat.com> wrote:
>
> Sad :(
>
> > The introduced anonymous structure gets aligned on the minimum cache
> > line size (64 bytes): with this change, ctx[] move from offset 256, to
> > offset 192.
> > Similarly, nodes[] moves from offset 320 to offset 256.
> >
> > As we discussed offlist, there are a few options to workaround this
> > issue (like moving nodes[] inside the anonymous struct though it still
> > results in an increased rte_node struct, or like adding an explicit
> > padding field right before the newly introduced anonymous struct,
> > ...).
> [snip]
> > For those two reasons, it is better to revisit this patch and have it
> > ready for the next release.
> > While at it, it may be worth cleaning up the rte_node structure in
> > v24.11, if so, please announce in a deprecation notice for this
> > planned ABI breakage.
>
> Jerin, wouldn't it be better if we managed to fill in that 64-byte
> hole?
It will be available only on systems with a 128B cache line, so it may not make sense.
I think the following change will resolve the issue in your patch.
From
__extension__ struct __rte_cache_min_aligned {
#define RTE_NODE_CTX_SZ 16
To
__extension__ struct __rte_cache_aligned {
#define RTE_NODE_CTX_SZ 16
>
> I don't know what to announce precisely about the breakage nature.
>
^ permalink raw reply [relevance 0%]
* Re: [PATCH v5] graph: expose node context as pointers
2024-06-18 12:33 4% ` David Marchand
@ 2024-06-25 15:22 0% ` Robin Jarry
2024-06-26 11:30 0% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Robin Jarry @ 2024-06-25 15:22 UTC (permalink / raw)
To: David Marchand
Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
Tyler Retzlaff
Sad :(
> The introduced anonymous structure gets aligned on the minimum cache
> line size (64 bytes): with this change, ctx[] move from offset 256, to
> offset 192.
> Similarly, nodes[] moves from offset 320 to offset 256.
>
> As we discussed offlist, there are a few options to workaround this
> issue (like moving nodes[] inside the anonymous struct though it still
> results in an increased rte_node struct, or like adding an explicit
> padding field right before the newly introduced anonymous struct,
> ...).
[snip]
> For those two reasons, it is better to revisit this patch and have it
> ready for the next release.
> While at it, it may be worth cleaning up the rte_node structure in
> v24.11, if so, please announce in a deprecation notice for this
> planned ABI breakage.
Jerin, wouldn't it be better if we managed to fill in that 64-byte
hole?
I don't know what to announce precisely about the breakage nature.
^ permalink raw reply [relevance 0%]
* Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
2024-06-12 1:25 0% ` rongwei liu
@ 2024-06-25 14:46 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-06-25 14:46 UTC (permalink / raw)
To: Ferruh Yigit, rongwei liu
Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam, Suanming Mou,
Andrew Rybchenko, Dariusz Sosnowski, Aman Singh, Yuying Zhang,
jerin.jacob, bruce.richardson, david.marchand, ajit.khaparde
12/06/2024 03:25, rongwei liu:
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> > > @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
> > > uint8_t flags; /**< Should be 8 (I flag). */
> > > uint8_t rsvd0[3]; /**< Reserved. */
> > > uint8_t vni[3]; /**< VXLAN identifier. */
> > > - uint8_t rsvd1; /**< Reserved. */
> > > + union {
> > > + uint8_t rsvd1; /**< Reserved. */
> > > + uint8_t last_rsvd; /**< Reserved. */
> > > + };
> > >
> >
> > Is there a plan to remove 'rsvd1' in next ABI break release?
> > We can keep both, but I guess it is not logically necessary to keep it, to prevent
> > bloat by time, we can remove the old one.
> > If decided to remove, sending a 'deprecation.rst' update helps us to remember
> > doing it.
> >
> I think it should. @NBU-Contact-Thomas Monjalon (EXTERNAL) @Andrew Rybchenko@Ori Kam what do you think?
From a user perspective, there is no benefit in removing an aliased field,
except for simplicity.
The drawback is a potential API compatibility breakage.
We may mark it as deprecated in the comment and plan for removal in a long time, let's say 25.11?
Is there anyone against removing "rsvd1" in VXLAN header for compatibility purpose?
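For reference, the aliasing behaviour of the union in the quoted hunk can be
checked with a standalone sketch; the struct name below is hypothetical, only
the field layout follows the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch (not the DPDK definition itself): "rsvd1" and
 * "last_rsvd" are two names for the same final byte of the 8-byte VXLAN
 * header, so existing users of rsvd1 keep working unchanged. */
struct vxlan_hdr_sketch {
	uint8_t flags;		/* should be 8 (I flag) */
	uint8_t rsvd0[3];	/* reserved */
	uint8_t vni[3];		/* VXLAN identifier */
	union {
		uint8_t rsvd1;		/* legacy name */
		uint8_t last_rsvd;	/* new name */
	};
};
```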
^ permalink raw reply [relevance 0%]
* Re: [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev
2024-06-24 15:13 3% ` Stephen Hemminger
@ 2024-06-25 12:01 3% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-25 12:01 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Vladimir Ratnikov, longli, dev
On Mon, Jun 24, 2024 at 5:14 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 24 Jun 2024 11:04:15 +0000
> Vladimir Ratnikov <vratnikov@netgate.com> wrote:
>
> > diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
> > index e2475a642d..6b010cbe41 100644
> > --- a/drivers/bus/vmbus/bus_vmbus_driver.h
> > +++ b/drivers/bus/vmbus/bus_vmbus_driver.h
> > @@ -37,6 +37,7 @@ struct rte_vmbus_device {
> > rte_uuid_t device_id; /**< VMBUS device id */
> > rte_uuid_t class_id; /**< VMBUS device type */
> > uint32_t relid; /**< id for primary */
> > + uint16_t device_order; /**< Device order after probing */
> > uint8_t monitor_id; /**< monitor page */
> > int uio_num; /**< UIO device number */
> > uint32_t *int_page; /**< VMBUS interrupt page */
> > diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
>
> Is this an ABI change?
drivers/bus/vmbus/meson.build:driver_sdk_headers = files('bus_vmbus_driver.h')
Only drivers of this bus know the rte_vmbus_device object.
So this patch does not impact the public ABI.
Yet, I fail to see what this patch is trying to achieve.
--
David Marchand
^ permalink raw reply [relevance 3%]
* Re: [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev
@ 2024-06-24 15:13 3% ` Stephen Hemminger
2024-06-25 12:01 3% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-06-24 15:13 UTC (permalink / raw)
To: Vladimir Ratnikov; +Cc: longli, dev
On Mon, 24 Jun 2024 11:04:15 +0000
Vladimir Ratnikov <vratnikov@netgate.com> wrote:
> diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
> index e2475a642d..6b010cbe41 100644
> --- a/drivers/bus/vmbus/bus_vmbus_driver.h
> +++ b/drivers/bus/vmbus/bus_vmbus_driver.h
> @@ -37,6 +37,7 @@ struct rte_vmbus_device {
> rte_uuid_t device_id; /**< VMBUS device id */
> rte_uuid_t class_id; /**< VMBUS device type */
> uint32_t relid; /**< id for primary */
> + uint16_t device_order; /**< Device order after probing */
> uint8_t monitor_id; /**< monitor page */
> int uio_num; /**< UIO device number */
> uint32_t *int_page; /**< VMBUS interrupt page */
> diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
Is this an ABI change?
^ permalink raw reply [relevance 3%]
* [PATCH v7 4/4] dts: add API doc generation
@ 2024-06-24 14:25 2% ` Juraj Linkeš
0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 14:25 UTC (permalink / raw)
To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Cc: dev, Juraj Linkeš, Luca Vizzarro
The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve the style, with one
DTS-specific option (applied only when building the DTS docs, so that the
DPDK docs are unchanged) that modifies how the sidebar displays the content.
Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
code.
* Also the same Python packages as DTS, for the same reason.
[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
doc/api/doxy-api-index.md | 3 +++
doc/api/doxy-api.conf.in | 2 ++
doc/api/meson.build | 11 +++++++---
doc/guides/conf.py | 39 ++++++++++++++++++++++++++++-----
doc/guides/meson.build | 1 +
doc/guides/tools/dts.rst | 34 +++++++++++++++++++++++++++-
dts/doc/meson.build | 27 +++++++++++++++++++++++
dts/meson.build | 16 ++++++++++++++
meson.build | 1 +
10 files changed, 147 insertions(+), 18 deletions(-)
create mode 100644 dts/doc/meson.build
create mode 100644 dts/meson.build
diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
# Copyright(c) 2019 Intel Corporation
#
+import argparse
import sys
import os
from os.path import join
from subprocess import run
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
# set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+ os.environ['DTS_ROOT'] = args.dts_root
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
# find all the files sphinx will process so we can write them as dependencies
srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
srcfiles.extend([join(root, f) for f in files])
+if not os.path.exists(args.dst):
+ os.makedirs(args.dst)
+
# run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
- process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
- stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+ process = run(
+ sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+ stdout=out
+ )
# create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
d.write('html: ' + ' '.join(srcfiles) + '\n')
sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
[experimental APIs](@ref rte_compat.h),
[ABI versioning](@ref rte_function_versioning.h),
[version](@ref rte_version.h)
+
+- **tests**:
+ [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE = YES
SORT_MEMBER_DOCS = NO
SOURCE_BROWSER = YES
+ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
EXAMPLE_PATH = @TOPDIR@/examples
EXAMPLE_PATTERNS = *.c
EXAMPLE_RECURSIVE = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
+doc_api_build_dir = meson.current_build_dir()
doxygen = find_program('doxygen', required: get_option('enable_docs'))
if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
# set up common Doxygen configuration
cdata = configuration_data()
cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
cdata.set('WARN_AS_ERROR', 'NO')
if get_option('werror')
cdata.set('WARN_AS_ERROR', 'YES')
endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
# configure HTML Doxygen run
html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
from sphinx import __version__ as sphinx_version
from os import listdir
from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
import configparser
@@ -24,6 +23,37 @@
file=stderr)
pass
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+ 'members': True,
+ 'member-order': 'bysource',
+ 'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+ path.append(dts_root)
+ # DTS Sidebar config
+ html_theme_options = {
+ 'collapse_navigation': False,
+ 'navigation_depth': -1,
+ }
+
stop_on_error = ('-W' in argv)
project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
html_show_copyright = False
highlight_language = 'none'
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
master_doc = 'index'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Intel Corporation
+doc_guides_source_dir = meson.current_source_dir()
sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
when some of the basics are not met.
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
The style must conform to the `Google style
<https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+ poetry install --no-root --with docs
+ poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+ ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+ Make sure to fix any Sphinx warnings when adding or updating docstrings,
+ and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
Configuration Schema
--------------------
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+ subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+ extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+ output: 'html',
+ command: [sphinx_wrapper, sphinx, meson.project_version(),
+ meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+ build_by_default: false,
+ install: get_option('enable_docs'),
+ install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+ message = 'No docs targets found'
+else
+ message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+ depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
# build docs
subdir('doc')
+subdir('dts')
# build any examples explicitly requested - useful for developers - and
# install any example code into the appropriate install path
--
2.34.1
^ permalink raw reply [relevance 2%]
* Re: [PATCH v6 4/4] dts: add API doc generation
2024-06-24 13:46 2% ` [PATCH v6 4/4] dts: add API doc generation Juraj Linkeš
@ 2024-06-24 14:08 0% ` Juraj Linkeš
0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 14:08 UTC (permalink / raw)
To: thomas
Cc: dev, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Hi Thomas,
I believe the only open question in this patch set is the linking of DTS
API docs on the main doxygen page. I've left only the parts relevant to
the question so that it's easier for us to address it.
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index f9283154f8..cc214ede46 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -244,3 +244,6 @@ The public API headers are grouped by topics:
> [experimental APIs](@ref rte_compat.h),
> [ABI versioning](@ref rte_function_versioning.h),
> [version](@ref rte_version.h)
> +
> +- **tests**:
> + [**DTS**](@dts_api_main_page)
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index a8823c046f..c94f02d411 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -124,6 +124,8 @@ SEARCHENGINE = YES
> SORT_MEMBER_DOCS = NO
> SOURCE_BROWSER = YES
>
> +ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
> +
> EXAMPLE_PATH = @TOPDIR@/examples
> EXAMPLE_PATTERNS = *.c
> EXAMPLE_RECURSIVE = YES
> diff --git a/doc/api/meson.build b/doc/api/meson.build
> index 5b50692df9..ffc75d7b5a 100644
> --- a/doc/api/meson.build
> +++ b/doc/api/meson.build
> @@ -32,14 +33,18 @@ example = custom_target('examples.dox',
> # set up common Doxygen configuration
> cdata = configuration_data()
> cdata.set('VERSION', meson.project_version())
> -cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
> -cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
> +cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
> +cdata.set('OUTPUT', doc_api_build_dir)
> cdata.set('TOPDIR', dpdk_source_root)
> -cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
> +cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
These three changes are here only for context; they're not relevant to
the linking question.
> cdata.set('WARN_AS_ERROR', 'NO')
> if get_option('werror')
> cdata.set('WARN_AS_ERROR', 'YES')
> endif
> +# A local reference must be relative to the main index.html page
> +# The path below can't be taken from the DTS meson file as that would
> +# require recursive subdir traversal (doc, dts, then doc again)
> +cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
This is where the path is actually set.
^ permalink raw reply [relevance 0%]
* [PATCH v6 4/4] dts: add API doc generation
@ 2024-06-24 13:46 2% ` Juraj Linkeš
2024-06-24 14:08 0% ` Juraj Linkeš
0 siblings, 1 reply; 200+ results
From: Juraj Linkeš @ 2024-06-24 13:46 UTC (permalink / raw)
To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Cc: dev, Juraj Linkeš, Luca Vizzarro
The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.
Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
code.
* Also the same Python packages as DTS, for the same reason.
[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
doc/api/doxy-api-index.md | 3 +++
doc/api/doxy-api.conf.in | 2 ++
doc/api/meson.build | 11 +++++++---
doc/guides/conf.py | 39 ++++++++++++++++++++++++++++-----
doc/guides/meson.build | 1 +
doc/guides/tools/dts.rst | 34 +++++++++++++++++++++++++++-
dts/doc/meson.build | 27 +++++++++++++++++++++++
dts/meson.build | 16 ++++++++++++++
meson.build | 1 +
10 files changed, 147 insertions(+), 18 deletions(-)
create mode 100644 dts/doc/meson.build
create mode 100644 dts/meson.build
diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
# Copyright(c) 2019 Intel Corporation
#
+import argparse
import sys
import os
from os.path import join
from subprocess import run
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
# set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+ os.environ['DTS_ROOT'] = args.dts_root
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
# find all the files sphinx will process so we can write them as dependencies
srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
srcfiles.extend([join(root, f) for f in files])
+if not os.path.exists(args.dst):
+ os.makedirs(args.dst)
+
# run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
- process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
- stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+ process = run(
+ sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+ stdout=out
+ )
# create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
d.write('html: ' + ' '.join(srcfiles) + '\n')
sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
[experimental APIs](@ref rte_compat.h),
[ABI versioning](@ref rte_function_versioning.h),
[version](@ref rte_version.h)
+
+- **tests**:
+ [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE = YES
SORT_MEMBER_DOCS = NO
SOURCE_BROWSER = YES
+ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
EXAMPLE_PATH = @TOPDIR@/examples
EXAMPLE_PATTERNS = *.c
EXAMPLE_RECURSIVE = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
+doc_api_build_dir = meson.current_build_dir()
doxygen = find_program('doxygen', required: get_option('enable_docs'))
if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
# set up common Doxygen configuration
cdata = configuration_data()
cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
cdata.set('WARN_AS_ERROR', 'NO')
if get_option('werror')
cdata.set('WARN_AS_ERROR', 'YES')
endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
# configure HTML Doxygen run
html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
from sphinx import __version__ as sphinx_version
from os import listdir
from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
import configparser
@@ -24,6 +23,37 @@
file=stderr)
pass
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+ 'members': True,
+ 'member-order': 'bysource',
+ 'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+ path.append(dts_root)
+ # DTS Sidebar config
+ html_theme_options = {
+ 'collapse_navigation': False,
+ 'navigation_depth': -1,
+ }
+
stop_on_error = ('-W' in argv)
project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
html_show_copyright = False
highlight_language = 'none'
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
master_doc = 'index'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Intel Corporation
+doc_guides_source_dir = meson.current_source_dir()
sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd42025507 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
when some of the basics are not met.
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
The style must conform to the `Google style
<https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+ poetry install --no-root --with docs
+ poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+ ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+ Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+ the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
Configuration Schema
--------------------
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+ subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+ extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+ output: 'html',
+ command: [sphinx_wrapper, sphinx, meson.project_version(),
+ meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+ build_by_default: false,
+ install: get_option('enable_docs'),
+ install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+ message = 'No docs targets found'
+else
+ message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+ depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
# build docs
subdir('doc')
+subdir('dts')
# build any examples explicitly requested - useful for developers - and
# install any example code into the appropriate install path
--
2.34.1
^ permalink raw reply [relevance 2%]
* [PATCH v5 4/4] dts: add API doc generation
@ 2024-06-24 13:26 2% ` Juraj Linkeš
0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 13:26 UTC (permalink / raw)
To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Cc: dev, Juraj Linkeš, Luca Vizzarro
The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.
Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
code.
* Also the same Python packages as DTS, for the same reason.
[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
doc/api/doxy-api-index.md | 3 +++
doc/api/doxy-api.conf.in | 2 ++
doc/api/meson.build | 11 +++++++---
doc/guides/conf.py | 39 ++++++++++++++++++++++++++++-----
doc/guides/meson.build | 1 +
doc/guides/tools/dts.rst | 34 +++++++++++++++++++++++++++-
dts/doc/meson.build | 27 +++++++++++++++++++++++
dts/meson.build | 16 ++++++++++++++
meson.build | 1 +
10 files changed, 147 insertions(+), 18 deletions(-)
create mode 100644 dts/doc/meson.build
create mode 100644 dts/meson.build
diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
# Copyright(c) 2019 Intel Corporation
#
+import argparse
import sys
import os
from os.path import join
from subprocess import run
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
# set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+ os.environ['DTS_ROOT'] = args.dts_root
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
# find all the files sphinx will process so we can write them as dependencies
srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
srcfiles.extend([join(root, f) for f in files])
+if not os.path.exists(args.dst):
+ os.makedirs(args.dst)
+
# run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
- process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
- stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+ process = run(
+ sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+ stdout=out
+ )
# create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
d.write('html: ' + ' '.join(srcfiles) + '\n')
sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
[experimental APIs](@ref rte_compat.h),
[ABI versioning](@ref rte_function_versioning.h),
[version](@ref rte_version.h)
+
+- **tests**:
+ [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE = YES
SORT_MEMBER_DOCS = NO
SOURCE_BROWSER = YES
+ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
EXAMPLE_PATH = @TOPDIR@/examples
EXAMPLE_PATTERNS = *.c
EXAMPLE_RECURSIVE = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
+doc_api_build_dir = meson.current_build_dir()
doxygen = find_program('doxygen', required: get_option('enable_docs'))
if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
# set up common Doxygen configuration
cdata = configuration_data()
cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
cdata.set('WARN_AS_ERROR', 'NO')
if get_option('werror')
cdata.set('WARN_AS_ERROR', 'YES')
endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
# configure HTML Doxygen run
html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
from sphinx import __version__ as sphinx_version
from os import listdir
from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
import configparser
@@ -24,6 +23,37 @@
file=stderr)
pass
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+ 'members': True,
+ 'member-order': 'bysource',
+ 'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+ path.append(dts_root)
+ # DTS Sidebar config
+ html_theme_options = {
+ 'collapse_navigation': False,
+ 'navigation_depth': -1,
+ }
+
stop_on_error = ('-W' in argv)
project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
html_show_copyright = False
highlight_language = 'none'
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
master_doc = 'index'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Intel Corporation
+doc_guides_source_dir = meson.current_source_dir()
sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd42025507 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
when some of the basics are not met.
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
The style must conform to the `Google style
<https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+ poetry install --no-root --with docs
+ poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+ ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+ Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+ the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
Configuration Schema
--------------------
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+ subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+ extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+ output: 'html',
+ command: [sphinx_wrapper, sphinx, meson.project_version(),
+ meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+ build_by_default: false,
+ install: get_option('enable_docs'),
+ install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+ message = 'No docs targets found'
+else
+ message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+ depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
# build docs
subdir('doc')
+subdir('dts')
# build any examples explicitly requested - useful for developers - and
# install any example code into the appropriate install path
--
2.34.1
^ permalink raw reply [relevance 2%]
* [DPDK/core Bug 1471] rte_pktmbuf_free_bulk does not respect RTE_LIBRTE_MBUF_DEBUG
@ 2024-06-21 18:33 3% bugzilla
0 siblings, 0 replies; 200+ results
From: bugzilla @ 2024-06-21 18:33 UTC (permalink / raw)
To: dev
[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]
https://bugs.dpdk.org/show_bug.cgi?id=1471
Bug ID: 1471
Summary: rte_pktmbuf_free_bulk does not respect
RTE_LIBRTE_MBUF_DEBUG
Product: DPDK
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: mb@smartsharesystems.com
Target Milestone: ---
rte_pktmbuf_free_bulk() calls __rte_mbuf_sanity_check(), which behaves
differently depending on whether RTE_LIBRTE_MBUF_DEBUG is defined.
Unfortunately, rte_pktmbuf_free_bulk() is not inline, but in the C file.
This means that the behavior of __rte_mbuf_sanity_check() within
rte_pktmbuf_free_bulk() is controlled by the RTE_LIBRTE_MBUF_DEBUG setting when
building the DPDK library, not when building the application.
rte_pktmbuf_free_bulk() should have been inline in the header file, so the
application developer can control its __rte_mbuf_sanity_check() behavior by
using RTE_LIBRTE_MBUF_DEBUG setting when building the application.
How to remove this function from the ABI and make it inline instead?
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #2: Type: text/html, Size: 3174 bytes --]
^ permalink raw reply [relevance 3%]
* [PATCH v2 3/9] doc: reword design section in contributors guidelines
@ 2024-06-21 2:32 6% ` Nandini Persad
0 siblings, 0 replies; 200+ results
From: Nandini Persad @ 2024-06-21 2:32 UTC (permalink / raw)
To: dev
Minor edits were made to the grammar and syntax of the design section.
Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
---
.mailmap | 1 +
doc/guides/contributing/design.rst | 86 +++++++++++++++---------------
doc/guides/linux_gsg/sys_reqs.rst | 2 +-
3 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/.mailmap b/.mailmap
index 66ebc20666..7d4929c5d1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1002,6 +1002,7 @@ Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
Nalla Pradeep <pnalla@marvell.com>
Na Na <nana.nn@alibaba-inc.com>
Nan Chen <whutchennan@gmail.com>
+Nandini Persad <nandinipersad361@gmail.com>
Nannan Lu <nannan.lu@intel.com>
Nan Zhou <zhounan14@huawei.com>
Narcisa Vasile <navasile@linux.microsoft.com> <navasile@microsoft.com> <narcisa.vasile@microsoft.com>
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index b724177ba1..3d1f5aeb91 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -1,6 +1,7 @@
.. SPDX-License-Identifier: BSD-3-Clause
Copyright 2018 The DPDK contributors
+
Design
======
@@ -8,22 +9,26 @@ Design
Environment or Architecture-specific Sources
--------------------------------------------
-In DPDK and DPDK applications, some code is specific to an architecture (i686, x86_64) or to an executive environment (freebsd or linux) and so on.
-As far as is possible, all such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+In DPDK and DPDK applications, some code is architecture-specific (i686, x86_64) or environment-specific (FreeBSD or Linux, etc.).
+When feasible, such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+
+By convention, a file is specific if its directory indicates an architecture or environment. Otherwise, it is common.
-By convention, a file is common if it is not located in a directory indicating that it is specific.
-For instance, a file located in a subdir of "x86_64" directory is specific to this architecture.
+For example:
+
+A file located in a subdir of "x86_64" directory is specific to this architecture.
A file located in a subdir of "linux" is specific to this execution environment.
.. note::
Code in DPDK libraries and applications should be generic.
- The correct location for architecture or executive environment specific code is in the EAL.
+ The correct location for architecture or executive environment-specific code is in the EAL.
+
+When necessary, there are several ways to handle specific code:
-When absolutely necessary, there are several ways to handle specific code:
-* Use a ``#ifdef`` with a build definition macro in the C code.
- This can be done when the differences are small and they can be embedded in the same C file:
+* When the differences are small and they can be embedded in the same C file, use a ``#ifdef`` with a build definition macro in the C code.
+
.. code-block:: c
@@ -33,9 +38,9 @@ When absolutely necessary, there are several ways to handle specific code:
titi();
#endif
-* Use build definition macros and conditions in the Meson build file. This is done when the differences are more significant.
- In this case, the code is split into two separate files that are architecture or environment specific.
- This should only apply inside the EAL library.
+
+* When the differences are more significant, use build definition macros and conditions in the Meson build file. In this case, the code is split into two separate files that are architecture or environment specific. This should only apply inside the EAL library.
+
Per Architecture Sources
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +48,8 @@ Per Architecture Sources
The following macro options can be used:
* ``RTE_ARCH`` is a string that contains the name of the architecture.
-* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined only if we are building for those architectures.
+* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined when building for these architectures.
+
Per Execution Environment Sources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -51,30 +57,22 @@ Per Execution Environment Sources
The following macro options can be used:
* ``RTE_EXEC_ENV`` is a string that contains the name of the executive environment.
-* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only if we are building for this execution environment.
+* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only when building for this execution environment.
+
Mbuf features
-------------
-The ``rte_mbuf`` structure must be kept small (128 bytes).
-
-In order to add new features without wasting buffer space for unused features,
-some fields and flags can be registered dynamically in a shared area.
-The "dynamic" mbuf area is the default choice for the new features.
+A designated area in mbuf stores "dynamically" registered fields and flags. It is the default choice for accommodating new features. The "dynamic" area consumes the remaining space in the mbuf. However, the ``rte_mbuf`` structure must be kept small (128 bytes).
-The "dynamic" area is eating the remaining space in mbuf,
-and some existing "static" fields may need to become "dynamic".
-
-Adding a new static field or flag must be an exception matching many criteria
-like (non exhaustive): wide usage, performance, size.
+As more features are added, the space for existing "static" fields (fields that are allocated statically) may need to be reconsidered and possibly converted to "dynamic" allocation. Adding a new static field or flag should be an exception. It must meet specific criteria including widespread usage, performance impact, and size considerations. Before adding a new static feature, it must be justified by its necessity and its impact on the system's efficiency.
Runtime Information - Logging, Tracing and Telemetry
----------------------------------------------------
-It is often desirable to provide information to the end-user
-as to what is happening to the application at runtime.
-DPDK provides a number of built-in mechanisms to provide this introspection:
+The end user may inquire as to what is happening to the application at runtime.
+DPDK provides several built-in mechanisms to provide these insights:
* :ref:`Logging <dynamic_logging>`
* :doc:`Tracing <../prog_guide/trace_lib>`
@@ -82,11 +80,11 @@ DPDK provides a number of built-in mechanisms to provide this introspection:
Each of these has its own strengths and suitabilities for use within DPDK components.
-Below are some guidelines for when each should be used:
+Here are guidelines for when each mechanism should be used:
* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
- Depending on the severity of the issue, the appropriate log level, for example,
- ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+ For example, depending on the severity of the issue, the appropriate log level,
+ ``ERROR``, ``WARNING`` or ``NOTICE`` should be used.
.. note::
@@ -96,24 +94,24 @@ Below are some guidelines for when each should be used:
* For component initialization, or other cases where a path through the code
is only likely to be taken once,
- either *logging* at ``DEBUG`` level or *tracing* may be used, or potentially both.
+ either *logging* at ``DEBUG`` level or *tracing* may be used, or both.
In the latter case, tracing can provide basic information as to the code path taken,
with debug-level logging providing additional details on internal state,
- not possible to emit via tracing.
+ which is not possible to emit via tracing.
* For a component's data-path, where a path is to be taken multiple times within a short timeframe,
*tracing* should be used.
Since DPDK tracing uses `Common Trace Format <https://diamon.org/ctf/>`_ for its tracing logs,
post-analysis can be done using a range of external tools.
-* For numerical or statistical data generated by a component, for example, per-packet statistics,
+* For numerical or statistical data generated by a component, such as per-packet statistics,
*telemetry* should be used.
-* For any data where the data may need to be gathered at any point in the execution
- to help assess the state of the application component,
- for example, core configuration, device information, *telemetry* should be used.
+* For any data that may need to be gathered at any point during the execution
+ to help assess the state of the application component (for example, core configuration, device information) *telemetry* should be used.
Telemetry callbacks should not modify any program state, but be "read-only".
+
Many libraries also include a ``rte_<libname>_dump()`` function as part of their API,
writing verbose internal details to a given file-handle.
New libraries are encouraged to provide such functions where it makes sense to do so,
@@ -135,13 +133,12 @@ requirements for preventing ABI changes when implementing statistics.
Mechanism to allow the application to turn library statistics on and off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Having runtime support for enabling/disabling library statistics is recommended,
-as build-time options should be avoided. However, if build-time options are used,
-for example as in the table library, the options can be set using c_args.
-When this flag is set, all the counters supported by current library are
+Having runtime support for enabling/disabling library statistics is recommended
+as build-time options should be avoided. However, if build-time options are used, as in the table library, the options can be set using c_args.
+When this flag is set, all the counters supported by the current library are
collected for all the instances of every object type provided by the library.
When this flag is cleared, none of the counters supported by the current library
-are collected for any instance of any object type provided by the library:
+are collected for any instance of any object type provided by the library.
Prevention of ABI changes due to library statistics support
@@ -165,8 +162,8 @@ Motivation to allow the application to turn library statistics on and off
It is highly recommended that each library provides statistics counters to allow
an application to monitor the library-level run-time events. Typical counters
-are: number of packets received/dropped/transmitted, number of buffers
-allocated/freed, number of occurrences for specific events, etc.
+are: the number of packets received/dropped/transmitted, the number of buffers
+allocated/freed, the number of occurrences for specific events, etc.
However, the resources consumed for library-level statistics counter collection
have to be spent out of the application budget and the counters collected by
@@ -198,6 +195,7 @@ applications:
the application may decide to turn the collection of statistics counters off for
Library X and on for Library Y.
+
The statistics collection consumes a certain amount of CPU resources (cycles,
cache bandwidth, memory bandwidth, etc) that depends on:
@@ -218,6 +216,7 @@ cache bandwidth, memory bandwidth, etc) that depends on:
validated for header integrity, counting the number of bits set in a bitmask
might be needed.
+
PF and VF Considerations
------------------------
@@ -229,5 +228,6 @@ Developers should work with the Linux Kernel community to get the required
functionality upstream. PF functionality should only be added to DPDK for
testing and prototyping purposes while the kernel work is ongoing. It should
also be marked with an "EXPERIMENTAL" tag. If the functionality isn't
-upstreamable then a case can be made to maintain the PF functionality in DPDK
+upstreamable, then a case can be made to maintain the PF functionality in DPDK
without the EXPERIMENTAL tag.
+
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 13be715933..0569c5cae6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -99,7 +99,7 @@ e.g. :doc:`../nics/index`
Running DPDK Applications
-------------------------
-To run a DPDK application, some customization may be required on the target machine.
+To run a DPDK application, customization may be required on the target machine.
System Software
~~~~~~~~~~~~~~~
--
2.34.1
^ permalink raw reply [relevance 6%]
* Re: [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
2024-06-19 15:32 0% ` Thomas Monjalon
@ 2024-06-20 2:32 0% ` lihuisong (C)
0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2024-06-20 2:32 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, mb, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, david.marchand, liuyonglong
在 2024/6/19 23:32, Thomas Monjalon 写道:
> 19/06/2024 08:31, Huisong Li:
>> --- /dev/null
>> +++ b/lib/power/rte_power_qos.h
>> @@ -0,0 +1,71 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 HiSilicon Limited
>> + */
>> +
>> +#ifndef RTE_POWER_QOS_H
>> +#define RTE_POWER_QOS_H
>> +
>> +#include <rte_compat.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * @file rte_power_qos.h
>> + *
>> + * PM QoS API.
>> + *
>> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
>> + * state selection in each cpuidle governor.
>> + * Please see the PM QoS on CPU wide in the following link:
>> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
>> + *
>> + * The deeper the idle state, the lower the power consumption, but the
>> + * longer the resume time. Some services are delay sensitive and expect a
>> + * low resume time, like interrupt packet receiving mode.
>> + *
>> + * In these cases, the per-CPU PM QoS API can be used to control this CPU's
>> + * idle state selection and limit it to entering only the shallowest idle
>> + * state, lowering the delay after sleep by setting a strict resume latency
>> + * (zero value).
>> + */
>> +
>> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
>> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
> stdint.h include is missing
Yes, it doesn't satisfy the self-contained header requirement.
will add it in next version, thanks.
>
>
>
> .
^ permalink raw reply [relevance 0%]
* Re: [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
2024-06-19 6:31 5% ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
@ 2024-06-19 15:32 0% ` Thomas Monjalon
2024-06-20 2:32 0% ` lihuisong (C)
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-06-19 15:32 UTC (permalink / raw)
To: Huisong Li
Cc: dev, mb, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, david.marchand, liuyonglong, lihuisong
19/06/2024 08:31, Huisong Li:
> --- /dev/null
> +++ b/lib/power/rte_power_qos.h
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#ifndef RTE_POWER_QOS_H
> +#define RTE_POWER_QOS_H
> +
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file rte_power_qos.h
> + *
> + * PM QoS API.
> + *
> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
> + * state selection in each cpuidle governor.
> + * Please see the PM QoS on CPU wide in the following link:
> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
> + *
> + * The deeper the idle state, the lower the power consumption, but the
> + * longer the resume time. Some services are delay sensitive and expect a
> + * low resume time, like interrupt packet receiving mode.
> + *
> + * In these cases, the per-CPU PM QoS API can be used to control this CPU's
> + * idle state selection and limit it to entering only the shallowest idle
> + * state, lowering the delay after sleep by setting a strict resume latency
> + * (zero value).
> + */
> +
> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
stdint.h include is missing
^ permalink raw reply [relevance 0%]
* [PATCH v12 2/4] mbuf: remove marker fields
2024-06-19 15:01 3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
@ 2024-06-19 15:01 6% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-19 15:01 UTC (permalink / raw)
To: dev; +Cc: roretzla, Morten Brørup, Stephen Hemminger, Thomas Monjalon
From: Tyler Retzlaff <roretzla@linux.microsoft.com>
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.
Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays of with types matching the original
markers to maintain API compatibility.
This change breaks the API for cacheline{0,1} fields that have been
removed from rte_mbuf but it does not break the ABI, to address the
false positives of the removed (but 0 size fields) provide the minimum
libabigail.abignore for type = rte_mbuf.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
Changes since v11:
- moved libabigail suppression,
- moved RN update to API change,
- updated one comment in rte_mbuf_core.h referring to cacheline0,
- removed (unrelated) doxygen updates,
---
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 3 +
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 188 +++++++++++++------------
4 files changed, 109 insertions(+), 92 deletions(-)
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 32a2ea309e..96b16059a8 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,12 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Temporary exceptions till next major ABI version ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+ name = rte_mbuf
+ type_kind = struct
+ has_size_change = no
+ has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
+
[suppress_type]
name = rte_pipeline_table_entry
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index ccd0f8e598..7c88de381b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -178,6 +178,9 @@ API Changes
Also, make sure to start the actual text at the margin.
=======================================================
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+ have been removed from ``struct rte_mbuf``.
+
ABI Changes
-----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b788..4c4722e002 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@ int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
static inline void
rte_mbuf_prefetch_part1(struct rte_mbuf *m)
{
- rte_prefetch0(&m->cacheline0);
+ rte_prefetch0(m);
}
/**
@@ -126,7 +126,7 @@ static inline void
rte_mbuf_prefetch_part2(struct rte_mbuf *m)
{
#if RTE_CACHE_LINE_SIZE == 64
- rte_prefetch0(&m->cacheline1);
+ rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
#else
RTE_SET_USED(m);
#endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f580769cf..a0df265b5d 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
* The generic rte_mbuf, containing a packet mbuf.
*/
struct __rte_cache_aligned rte_mbuf {
- RTE_MARKER cacheline0;
-
void *buf_addr; /**< Virtual address of segment buffer. */
#if RTE_IOVA_IN_MBUF
/**
@@ -474,7 +472,7 @@ struct __rte_cache_aligned rte_mbuf {
* This field is undefined if the build is configured to use only
* virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
* Force alignment to 8-bytes, so as to ensure we have the exact
- * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+ * layout for the first cache line for 32-bit and 64-bit. This makes
* working on vector drivers easier.
*/
alignas(sizeof(rte_iova_t)) rte_iova_t buf_iova;
@@ -488,127 +486,137 @@ struct __rte_cache_aligned rte_mbuf {
#endif
/* next 8 bytes are initialised on RX descriptor rearm */
- RTE_MARKER64 rearm_data;
- uint16_t data_off;
-
- /**
- * Reference counter. Its size should at least equal to the size
- * of port field (16 bits), to support zero-copy broadcast.
- * It should only be accessed using the following functions:
- * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
- * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
- * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
- */
- RTE_ATOMIC(uint16_t) refcnt;
+ union {
+ uint64_t rearm_data[1];
+ __extension__
+ struct {
+ uint16_t data_off;
+
+ /**
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
+ * It should only be accessed using the following functions:
+ * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+ * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+ * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+ */
+ RTE_ATOMIC(uint16_t) refcnt;
- /**
- * Number of segments. Only valid for the first segment of an mbuf
- * chain.
- */
- uint16_t nb_segs;
+ /**
+ * Number of segments. Only valid for the first segment of an mbuf
+ * chain.
+ */
+ uint16_t nb_segs;
- /** Input port (16 bits to support more than 256 virtual ports).
- * The event eth Tx adapter uses this field to specify the output port.
- */
- uint16_t port;
+ /** Input port (16 bits to support more than 256 virtual ports).
+ * The event eth Tx adapter uses this field to specify the output port.
+ */
+ uint16_t port;
+ };
+ };
uint64_t ol_flags; /**< Offload features. */
- /* remaining bytes are set on RX when pulling packet from descriptor */
- RTE_MARKER rx_descriptor_fields1;
-
- /*
- * The packet type, which is the combination of outer/inner L2, L3, L4
- * and tunnel types. The packet_type is about data really present in the
- * mbuf. Example: if vlan stripping is enabled, a received vlan packet
- * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
- * vlan is stripped from the data.
- */
+ /* remaining 24 bytes are set on RX when pulling packet from descriptor */
union {
- uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+ /* void * type of the array elements is retained for driver compatibility. */
+ void *rx_descriptor_fields1[24 / sizeof(void *)];
__extension__
struct {
- uint8_t l2_type:4; /**< (Outer) L2 type. */
- uint8_t l3_type:4; /**< (Outer) L3 type. */
- uint8_t l4_type:4; /**< (Outer) L4 type. */
- uint8_t tun_type:4; /**< Tunnel type. */
+ /*
+ * The packet type, which is the combination of outer/inner L2, L3, L4
+ * and tunnel types. The packet_type is about data really present in the
+ * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+ * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+ * vlan is stripped from the data.
+ */
union {
- uint8_t inner_esp_next_proto;
- /**< ESP next protocol type, valid if
- * RTE_PTYPE_TUNNEL_ESP tunnel type is set
- * on both Tx and Rx.
- */
+ uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
__extension__
struct {
- uint8_t inner_l2_type:4;
- /**< Inner L2 type. */
- uint8_t inner_l3_type:4;
- /**< Inner L3 type. */
+ uint8_t l2_type:4; /**< (Outer) L2 type. */
+ uint8_t l3_type:4; /**< (Outer) L3 type. */
+ uint8_t l4_type:4; /**< (Outer) L4 type. */
+ uint8_t tun_type:4; /**< Tunnel type. */
+ union {
+ uint8_t inner_esp_next_proto;
+ /**< ESP next protocol type, valid if
+ * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+ * on both Tx and Rx.
+ */
+ __extension__
+ struct {
+ uint8_t inner_l2_type:4;
+ /**< Inner L2 type. */
+ uint8_t inner_l3_type:4;
+ /**< Inner L3 type. */
+ };
+ };
+ uint8_t inner_l4_type:4; /**< Inner L4 type. */
};
};
- uint8_t inner_l4_type:4; /**< Inner L4 type. */
- };
- };
- uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
- uint16_t vlan_tci;
+ uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
+ uint16_t data_len; /**< Amount of data in segment buffer. */
+ /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+ uint16_t vlan_tci;
- union {
- union {
- uint32_t rss; /**< RSS hash result if RSS enabled */
- struct {
+ union {
union {
+ uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
- uint16_t hash;
- uint16_t id;
- };
- uint32_t lo;
- /**< Second 4 flexible bytes */
- };
- uint32_t hi;
- /**< First 4 flexible bytes or FD ID, dependent
- * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
- */
- } fdir; /**< Filter identifier if FDIR enabled */
- struct rte_mbuf_sched sched;
- /**< Hierarchical scheduler : 8 bytes */
- struct {
- uint32_t reserved1;
- uint16_t reserved2;
- uint16_t txq;
- /**< The event eth Tx adapter uses this field
- * to store Tx queue id.
- * @see rte_event_eth_tx_adapter_txq_set()
- */
- } txadapter; /**< Eventdev ethdev Tx adapter */
- uint32_t usr;
- /**< User defined tags. See rte_distributor_process() */
- } hash; /**< hash information */
- };
+ union {
+ struct {
+ uint16_t hash;
+ uint16_t id;
+ };
+ uint32_t lo;
+ /**< Second 4 flexible bytes */
+ };
+ uint32_t hi;
+ /**< First 4 flexible bytes or FD ID, dependent
+ * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+ */
+ } fdir; /**< Filter identifier if FDIR enabled */
+ struct rte_mbuf_sched sched;
+ /**< Hierarchical scheduler : 8 bytes */
+ struct {
+ uint32_t reserved1;
+ uint16_t reserved2;
+ uint16_t txq;
+ /**< The event eth Tx adapter uses this field
+ * to store Tx queue id.
+ * @see rte_event_eth_tx_adapter_txq_set()
+ */
+ } txadapter; /**< Eventdev ethdev Tx adapter */
+ uint32_t usr;
+ /**< User defined tags. See rte_distributor_process() */
+ } hash; /**< hash information */
+ };
- /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
- uint16_t vlan_tci_outer;
+ /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+ uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
+ uint16_t buf_len; /**< Length of segment buffer. */
+ };
+ };
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
/* second cache line - fields only used in slow path or on TX */
- alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
#if RTE_IOVA_IN_MBUF
/**
* Next segment of scattered packet. Must be NULL in the last
* segment or in case of non-segmented packet.
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
struct rte_mbuf *next;
#else
/**
* Reserved for dynamic fields
* when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
uint64_t dynfield2;
#endif
--
2.45.1
^ permalink raw reply [relevance 6%]
* [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries
` (3 preceding siblings ...)
2024-04-04 17:51 3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-06-19 15:01 3% ` David Marchand
2024-06-19 15:01 6% ` [PATCH v12 2/4] mbuf: remove marker fields David Marchand
4 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-06-19 15:01 UTC (permalink / raw)
To: dev; +Cc: roretzla
As per techboard meeting 2024/03/20 adopt hybrid proposal of adapting
descriptor fields and removing cachline fields.
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.
For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays of with types matching the original
markers to maintain API compatibility.
Note: diff is easier viewed with -b due to additional nesting from
unions / structs that have been introduced.
v12:
* rebased,
* did some cosmetic changes,
* and resending to double check CI (v11 had an issue in UNH Debian
containers),
v11:
* correct doxygen comment style for field documentation.
v10:
* move removal notices in in release notes from 24.03 to 24.07.
v9:
* provide narrowest possible libabigail.abignore to suppress
removal of fields that were agreed are not actual abi changes.
v8:
* rx_descriptor_fields1 array is now constexpr sized to
24 / sizeof(void *) so that the array encompasses fields
accessed via the array.
* add a comment to rx_descriptor_fields1 array site noting
that void * type of elements is retained for compatibility
with existing drivers.
* clean up comments of fields in rte_mbuf to be before the
field they apply to instead of after.
* duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
conditional compile for first field of cacheline 1 instead of
once before conditional compile block.
v7:
* complete re-write of series, previous versions not noted. all
reviewed-by and acked-by tags (if any) were removed.
--
David Marchand
Tyler Retzlaff (4):
net/i40e: use mbuf prefetch helper
mbuf: remove marker fields
security: remove marker fields
cryptodev: remove marker fields
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 3 +
drivers/net/i40e/i40e_rxtx_vec_avx512.c | 2 +-
lib/cryptodev/cryptodev_pmd.h | 3 +-
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 188 ++++++++++++------------
lib/security/rte_security_driver.h | 3 +-
7 files changed, 112 insertions(+), 97 deletions(-)
--
2.45.1
^ permalink raw reply [relevance 3%]
* RE: [PATCH v3 0/2] power: introduce PM QoS interface
2024-06-19 6:31 4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
2024-06-19 6:31 5% ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
@ 2024-06-19 6:59 0% ` Morten Brørup
1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-06-19 6:59 UTC (permalink / raw)
To: Huisong Li, dev
Cc: thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, david.marchand, liuyonglong
> From: Huisong Li [mailto:lihuisong@huawei.com]
> Sent: Wednesday, 19 June 2024 08.32
>
> The deeper the idle state, the lower the power consumption, but the longer
> the resume time. Some services are delay sensitive and expect a low
> resume time, like interrupt packet receiving mode.
>
> And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> interface is used to set and get the resume latency limit on the cpuX for
> userspace. Please see the description in the kernel documentation[1].
> Each cpuidle governor in Linux selects which idle state to enter based on
> this CPU resume latency in its idle task.
>
> The per-CPU PM QoS API can be used to control this CPU's idle state
> selection and limit it to entering only the shallowest idle state, lowering
> the delay after sleep by setting a strict resume latency (zero value).
>
> [1] https://www.kernel.org/doc/html/latest/admin-guide/abi-
> testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-
> resume-latency-us
>
> ---
> v3:
> - add RTE_POWER_xxx prefix for some macro in header
> - add the check for lcore_id with rte_lcore_is_enabled
> v2:
> - use PM QoS on CPU wide to replace the one on system wide
Series-acked-by: Morten Brørup <mb@smartsharesystems.com>
^ permalink raw reply [relevance 0%]
* [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
2024-06-19 6:31 4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-06-19 6:31 5% ` Huisong Li
2024-06-19 15:32 0% ` Thomas Monjalon
2024-06-19 6:59 0% ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
1 sibling, 1 reply; 200+ results
From: Huisong Li @ 2024-06-19 6:31 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, david.marchand, liuyonglong, lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, like interrupt packet receiving mode.
And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on the cpuX for
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
doc/guides/prog_guide/power_man.rst | 22 +++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 71 +++++++++++++++
lib/power/version.map | 2 +
6 files changed, 215 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..3ff46f06c1 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,28 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as the interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used by userspace to set and get the resume latency limit
+on cpuX. Each cpuidle governor in Linux selects which idle state to
+enter based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can affect the
+CPU's idle state selection; setting it to zero (strict resume latency)
+only allows this CPU to enter the shallowest idle state.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function obtains the
+resume latency of the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..7c0d36e389 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
* Added SSE/NEON vector datapath.
+* **Introduced PM QoS interface.**
+
+ * Introduced a PM QoS interface to reduce the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..b131cf58e7
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[BUFSIZ] = {0};
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the kernel sysfs interface pm_qos_resume_latency_us
+ * (see @PM_QOS_SYSFILE_RESUME_LATENCY_US), the meaning of the
+ * different input strings is as follows:
+ * 1> writing "n/a" sets the resume latency to 0 (strict).
+ * 2> writing "0" means no resume latency constraint.
+ * 3> any other value is the actual latency limit to be set.
+ */
+ if (latency == 0)
+ sprintf(buf, "%s", "n/a");
+ else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ sprintf(buf, "%u", 0);
+ else
+ sprintf(buf, "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[BUFSIZ];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (!rte_lcore_is_enabled(lcore_id)) {
+ POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the kernel sysfs interface pm_qos_resume_latency_us
+ * (see @PM_QOS_SYSFILE_RESUME_LATENCY_US), the meaning of the
+ * different output strings is as follows:
+ * 1> "n/a" means the resume latency is 0 (strict).
+ * 2> "0" means there is no resume latency constraint.
+ * 3> any other string is the actual latency limit in use.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..2b25d0d4c1
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect
+ * a low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this
+ * CPU's idle state selection and restrict it to the shallowest idle
+ * state, reducing the delay after sleep, by setting a strict resume
+ * latency (zero value).
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * target logical core id
+ *
+ * @param latency
+ * The latency, in microseconds, should be greater than or equal to zero.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v3 0/2] power: introduce PM QoS interface
2024-06-13 11:20 4% ` [PATCH v2 0/2] power: " Huisong Li
@ 2024-06-19 6:31 4% ` Huisong Li
2024-06-19 6:31 5% ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19 6:59 0% ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
2024-06-27 6:00 4% ` [PATCH v4 " Huisong Li
` (4 subsequent siblings)
6 siblings, 2 replies; 200+ results
From: Huisong Li @ 2024-06-19 6:31 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, david.marchand, liuyonglong, lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Please see the description in the kernel document[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control a CPU's idle state selection
and restrict it to the shallowest idle state, reducing the delay after
sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
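The read path inverts that mapping when parsing the sysfs string. A hedged Python sketch of the semantics (illustrative only, not the DPDK implementation):

```python
# Illustrative model of the string parsing done when reading
# pm_qos_resume_latency_us back (not the DPDK implementation).
RESUME_LATENCY_NO_CONSTRAINT = (2**32 - 1) >> 1  # (int)(UINT32_MAX >> 1)

def sysfs_to_latency(buf: str) -> int:
    buf = buf.strip()
    if buf == "n/a":
        return 0                              # strict resume latency
    value = int(buf, 10)
    return RESUME_LATENCY_NO_CONSTRAINT if value == 0 else value

print(sysfs_to_latency("n/a"), sysfs_to_latency("25"))  # prints: 0 25
```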
---
v3:
- add RTE_POWER_xxx prefix for some macro in header
- add the check for lcore_id with rte_lcore_is_enabled
v2:
- use PM QoS on CPU wide to replace the one on system wide
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 22 +++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 29 +++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 114 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 71 +++++++++++++++
lib/power/version.map | 2 +
7 files changed, 244 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* Re: [PATCH v5] graph: expose node context as pointers
2024-03-27 9:14 4% ` [PATCH v5] " Robin Jarry
2024-05-29 17:54 0% ` Nithin Dabilpuram
@ 2024-06-18 12:33 4% ` David Marchand
2024-06-25 15:22 0% ` Robin Jarry
1 sibling, 1 reply; 200+ results
From: David Marchand @ 2024-06-18 12:33 UTC (permalink / raw)
To: Robin Jarry
Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
Tyler Retzlaff
Re Robin,
On Wed, Mar 27, 2024 at 10:17 AM Robin Jarry <rjarry@redhat.com> wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. And without
> intermediate structures, casting to opaque pointers is hard without
> violating strict aliasing rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
> preserve existing alignment on architectures where cache lines are 128
> bytes.
>
> Add a static assert to ensure that the unnamed union is not larger than
> the context array (RTE_NODE_CTX_SZ).
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
>
> Notes:
> v5:
>
> * Helper functions to hide casting proved to be harder than expected.
> Naive casting may even be impossible without breaking strict aliasing
> rules. The only other option would be to use explicit memcpy calls.
> * Unnamed union tentative again. As suggested by Tyler (thank you!),
> using an intermediate unnamed struct to carry the alignment produces
> consistent ABI in C and C++.
> * Also, Tyler (thank you!) suggested that the fast path area alignment
> size may be incorrect for architectures where the cache line is not 64
> bytes. There will be a 64 bytes hole in the structure at the end of
> the unnamed struct before the zero length next nodes array. Use
> __rte_cache_min_aligned to preserve existing alignment.
- There is still an issue with that approach on 128 bytes cache line
arches, like ARM.
This results in a ABI breakage:
Functions changes summary: 0 Removed, 1 Changed (9 filtered out), 0
Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
1 function with some indirect sub-type change:
[C] 'function bool
__rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*,
rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has
some indirect sub-type changes:
parameter 1 of type 'rte_node*' has sub-type changes:
in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
type size changed from 3072 to 2048 (in bits)
7 data member deletions:
'uint8_t ctx[16]', at offset 2048 (in bits) at
rte_graph_worker_common.h:115:1
'uint16_t size', at offset 2176 (in bits) at
rte_graph_worker_common.h:116:1
'uint16_t idx', at offset 2192 (in bits) at
rte_graph_worker_common.h:117:1
'rte_graph_off_t off', at offset 2208 (in bits) at
rte_graph_worker_common.h:118:1
'uint64_t total_cycles', at offset 2240 (in bits) at
rte_graph_worker_common.h:119:1
'uint64_t total_calls', at offset 2304 (in bits) at
rte_graph_worker_common.h:120:1
'uint64_t total_objs', at offset 2368 (in bits) at
rte_graph_worker_common.h:121:1
1 data member insertion:
'struct {union {uint8_t ctx[16]; struct {void* ctx_ptr;
void* ctx_ptr2;};}; uint16_t size; uint16_t idx; rte_graph_off_t off;
uint64_t total_cycles; uint64_t total_calls; uint64_t total_objs;
union {void** objs; uint64_t objs_u64;}; union {rte_node_process_t
process; uint64_t process_u64;};}', at offset 1536 (in bits)
1 data member changes (1 filtered):
'rte_node* nodes[]' offset changed from 2560 to 2048 (in
bits) (by -512 bits)
Before the patch, the rte_node object layout was:
struct rte_node {
...
/* XXX 64 bytes hole, try to pack */
/* --- cacheline 4 boundary (256 bytes) --- */
uint8_t ctx[16]
__attribute__((__aligned__(128))); /* 256 16 */
uint16_t size; /* 272 2 */
uint16_t idx; /* 274 2 */
rte_graph_off_t off; /* 276 4 */
uint64_t total_cycles; /* 280 8 */
uint64_t total_calls; /* 288 8 */
uint64_t total_objs; /* 296 8 */
union {
void * * objs; /* 304 8 */
uint64_t objs_u64; /* 304 8 */
}; /* 304 8 */
union {
rte_node_process_t process; /* 312 8 */
uint64_t process_u64; /* 312 8 */
}; /* 312 8 */
/* --- cacheline 5 boundary (320 bytes) --- */
struct rte_node * nodes[]
__attribute__((__aligned__(64))); /* 320 0 */
/* size: 384, cachelines: 6, members: 20 */
/* sum members: 250, holes: 3, sum holes: 70 */
/* padding: 64 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 64 */
} __attribute__((__aligned__(128)));
After this patch:
struct rte_node {
...
/* --- cacheline 3 boundary (192 bytes) --- */
struct {
union {
uint8_t ctx[16]; /* 192 16 */
struct {
void * ctx_ptr; /* 192 8 */
void * ctx_ptr2; /* 200 8 */
}; /* 192 16 */
}; /* 192 16 */
uint16_t size; /* 208 2 */
uint16_t idx; /* 210 2 */
rte_graph_off_t off; /* 212 4 */
uint64_t total_cycles; /* 216 8 */
uint64_t total_calls; /* 224 8 */
uint64_t total_objs; /* 232 8 */
union {
void * * objs; /* 240 8 */
uint64_t objs_u64; /* 240 8 */
}; /* 240 8 */
union {
rte_node_process_t process; /* 248 8 */
uint64_t process_u64; /* 248 8 */
}; /* 248 8 */
} __attribute__((__aligned__(64)))
__attribute__((__aligned__(64))); /* 192 64 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct rte_node * nodes[]
__attribute__((__aligned__(64))); /* 256 0 */
/* size: 256, cachelines: 4, members: 12 */
/* sum members: 250, holes: 2, sum holes: 6 */
/* forced alignments: 2 */
} __attribute__((__aligned__(128)));
The introduced anonymous structure gets aligned on the minimum cache
line size (64 bytes): with this change, ctx[] move from offset 256, to
offset 192.
Similarly, nodes[] moves from offset 320 to offset 256.
As we discussed offlist, there are a few options to work around this
issue (like moving nodes[] inside the anonymous struct though it still
results in an increased rte_node struct, or like adding an explicit
padding field right before the newly introduced anonymous struct,
...).
- Additionally, anonymous structures are not correctly handled with
libabigail 2.4 which is the version used in the CI.
At the moment, the ABI check in GHA and UNH will fail on x86 with:
1 function with some indirect sub-type change:
[C] 'function bool
__rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*,
rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has
some indirect sub-type changes:
parameter 1 of type 'rte_node*' has sub-type changes:
in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
type size hasn't changed
2 data member deletions:
'union {void** objs; uint64_t objs_u64;}', at offset 1920 (in bits)
'union {rte_node_process_t process; uint64_t process_u64;}',
at offset 1984 (in bits)
no data member changes (2 filtered);
On this topic, we have to either put a suppression rule on the
rte_node structure, or bump the libabigail version in UNH, GHA, and
the maintainers build env (though the latter won't happen overnight,
and we are really close to rc1).
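For reference, a suppression rule would be a short entry in libabigail's suppression-file syntax (a sketch; in DPDK this would likely live in devtools/libabigail.abignore, and the scoping may need refinement):

```
[suppress_type]
        name = rte_node
```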
For those two reasons, it is better to revisit this patch and have it
ready for the next release.
While at it, it may be worth cleaning up the rte_node structure in
v24.11; if so, please announce it in a deprecation notice for this
planned ABI breakage.
--
David Marchand
^ permalink raw reply [relevance 4%]
* Re: [PATCH v9 4/4] hash: add SVE support for bulk key lookup
@ 2024-06-14 13:42 4% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-14 13:42 UTC (permalink / raw)
To: Yoan Picchi
Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
dev, nd, Harjot Singh, Nathan Brown, Ruifeng Wang
On Tue, Apr 30, 2024 at 6:28 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> - Implemented SVE code for comparing signatures in bulk lookup.
> - New SVE code is ~5% slower than optimized NEON for N2 processor for
> 128b vectors.
>
> Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
> Signed-off-by: Harjot Singh <harjot.singh@arm.com>
> Reviewed-by: Nathan Brown <nathan.brown@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++
> lib/hash/rte_cuckoo_hash.c | 7 +++-
> lib/hash/rte_cuckoo_hash.h | 1 +
> 3 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
> index 72bd171484..b4b4cf04e9 100644
> --- a/lib/hash/arch/arm/compare_signatures.h
> +++ b/lib/hash/arch/arm/compare_signatures.h
> @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
> *hitmask_buffer = vaddvq_u16(hit2);
> }
> break;
> +#endif
> +#if defined(RTE_HAS_SVE_ACLE)
> + case RTE_HASH_COMPARE_SVE: {
> + svuint16_t vsign, shift, sv_matches;
> + svbool_t pred, match, bucket_wide_pred;
> + int i = 0;
> + uint64_t vl = svcnth();
> +
> + vsign = svdup_u16(sig);
> + shift = svindex_u16(0, 1);
> +
> + if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
> + svuint16_t primary_array_vect, secondary_array_vect;
> + bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
> + primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
> + secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
> +
> + /* We merged the two vectors so we can do both comparisons at once */
> + primary_array_vect = svsplice_u16(bucket_wide_pred,
> + primary_array_vect,
> + secondary_array_vect);
> + pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
> +
> + /* Compare all signatures in the buckets */
> + match = svcmpeq_u16(pred, vsign, primary_array_vect);
> + if (svptest_any(svptrue_b16(), match)) {
> + sv_matches = svdup_u16(1);
> + sv_matches = svlsl_u16_z(match, sv_matches, shift);
> + *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
> + }
> + } else {
> + do {
> + pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
> + uint16_t lower_half = 0;
> + uint16_t upper_half = 0;
> + /* Compare all signatures in the primary bucket */
> + match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> + &prim_bucket_sigs[i]));
> + if (svptest_any(svptrue_b16(), match)) {
> + sv_matches = svdup_u16(1);
> + sv_matches = svlsl_u16_z(match, sv_matches, shift);
> + lower_half = svorv_u16(svptrue_b16(), sv_matches);
> + }
> + /* Compare all signatures in the secondary bucket */
> + match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> + &sec_bucket_sigs[i]));
> + if (svptest_any(svptrue_b16(), match)) {
> + sv_matches = svdup_u16(1);
> + sv_matches = svlsl_u16_z(match, sv_matches, shift);
> + upper_half = svorv_u16(svptrue_b16(), sv_matches)
> + << RTE_HASH_BUCKET_ENTRIES;
> + }
> + hitmask_buffer[i / 8] = upper_half | lower_half;
> + i += vl;
> + } while (i < RTE_HASH_BUCKET_ENTRIES);
> + }
> + }
> + break;
> #endif
> default:
> for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index 0697743cdf..75f555ba2c 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
> h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
> else
> #elif defined(RTE_ARCH_ARM64)
> - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
> + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
> h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
> +#if defined(RTE_HAS_SVE_ACLE)
> + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
> + h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
> +#endif
> + }
> else
> #endif
> h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index a528f1d1a0..01ad01c258 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -139,6 +139,7 @@ enum rte_hash_sig_compare_function {
> RTE_HASH_COMPARE_SCALAR = 0,
> RTE_HASH_COMPARE_SSE,
> RTE_HASH_COMPARE_NEON,
> + RTE_HASH_COMPARE_SVE,
> RTE_HASH_COMPARE_NUM
> };
I am surprised the ABI check does not complain about this change.
RTE_HASH_COMPARE_NUM is not used, and knowing the number of compare
function implementations should not be of interest to an application.
But it still seems like an ABI breakage to me.
RTE_HASH_COMPARE_NUM can be removed in v24.11.
And ideally, sig_cmp_fn should be made opaque (or moved to an opaque
struct out of the rte_hash public struct).
--
David Marchand
^ permalink raw reply [relevance 4%]
* [PATCH v2 1/2] power: introduce PM QoS API on CPU wide
2024-06-13 11:20 4% ` [PATCH v2 0/2] power: " Huisong Li
@ 2024-06-13 11:20 5% ` Huisong Li
0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-06-13 11:20 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, liuyonglong, lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as the interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control a CPU's idle state selection
and restrict it to the shallowest idle state, reducing the delay after
sleep, by setting a strict resume latency (zero value).
Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
doc/guides/prog_guide/power_man.rst | 22 +++++
doc/guides/rel_notes/release_24_07.rst | 4 +
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 116 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 70 +++++++++++++++
lib/power/version.map | 2 +
6 files changed, 216 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..3ff46f06c1 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,28 @@ Get Num Pkgs
Get Num Dies
Get the number of die's on a given package.
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as the interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used by userspace to set and get the resume latency limit
+on cpuX. Each cpuidle governor in Linux selects which idle state to
+enter based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can affect the
+CPU's idle state selection; setting it to zero (strict resume latency)
+only allows this CPU to enter the shallowest idle state.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function obtains the
+resume latency of the specified CPU.
+
References
----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..7c0d36e389 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
* Added SSE/NEON vector datapath.
+* **Introduced PM QoS interface.**
+
+ * Introduced a PM QoS interface to reduce the delay after sleep.
+
Removed Items
-------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
'rte_power.c',
'rte_power_uncore.c',
'rte_power_pmd_mgmt.c',
+ 'rte_power_qos.c',
)
headers = files(
'rte_power.h',
'rte_power_guest_channel.h',
'rte_power_pmd_mgmt.h',
'rte_power_uncore.h',
+ 'rte_power_qos.h',
)
if cc.has_argument('-Wno-cast-qual')
cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..706f8432ee
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
+ "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+ char buf[BUFSIZ] = {0};
+ FILE *f;
+ int ret;
+
+ if (lcore_id >= RTE_MAX_LCORE) {
+ POWER_LOG(ERR, "Lcore id %u cannot exceed %u",
+ lcore_id, RTE_MAX_LCORE - 1U);
+ return -EINVAL;
+ }
+
+ if (latency < 0) {
+ POWER_LOG(ERR, "latency should be greater than or equal to 0");
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ /*
+ * Based on the kernel sysfs interface pm_qos_resume_latency_us
+ * (see @PM_QOS_SYSFILE_RESUME_LATENCY_US), the meaning of the
+ * different input strings is as follows:
+ * 1> writing "n/a" sets the resume latency to 0 (strict).
+ * 2> writing "0" means no resume latency constraint.
+ * 3> any other value is the actual latency limit to be set.
+ */
+ if (latency == 0)
+ sprintf(buf, "%s", "n/a");
+ else if (latency == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+ sprintf(buf, "%u", 0);
+ else
+ sprintf(buf, "%u", latency);
+
+ ret = write_core_sysfs_s(f, buf);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+ char buf[BUFSIZ];
+ int latency = -1;
+ FILE *f;
+ int ret;
+
+ if (lcore_id >= RTE_MAX_LCORE) {
+ POWER_LOG(ERR, "Lcore id %u cannot exceed %u",
+ lcore_id, RTE_MAX_LCORE - 1U);
+ return -EINVAL;
+ }
+
+ ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ return ret;
+ }
+
+ ret = read_core_sysfs_s(f, buf, sizeof(buf));
+ if (ret != 0) {
+ POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+ goto out;
+ }
+
+ /*
+ * Based on the kernel sysfs interface pm_qos_resume_latency_us
+ * (see @PM_QOS_SYSFILE_RESUME_LATENCY_US), the meaning of the
+ * different output strings is as follows:
+ * 1> "n/a" means the resume latency is 0 (strict).
+ * 2> "0" means there is no resume latency constraint.
+ * 3> any other string is the actual latency limit in use.
+ */
+ if (strcmp(buf, "n/a") == 0)
+ latency = 0;
+ else {
+ latency = strtoul(buf, NULL, 10);
+ latency = latency == 0 ? PM_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+ }
+
+out:
+ if (f != NULL)
+ fclose(f);
+
+ return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..1ba9568d1b
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect
+ * a low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this
+ * CPU's idle state selection and restrict it to the shallowest idle
+ * state, reducing the delay after sleep, by setting a strict resume
+ * latency (zero value).
+
+#define PM_QOS_STRICT_LATENCY_VALUE 0
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ * target logical core id
+ *
+ * @param latency
+ * The latency, in microseconds, should be greater than or equal to zero.
+ *
+ * @return
+ * 0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see PM_QOS_RESUME_LATENCY_NO_CONSTRAINT if it has not been set.
+ *
+ * @return
+ * Negative value on failure.
+ * >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
rte_power_set_uncore_env;
rte_power_uncore_freqs;
rte_power_unset_uncore_env;
+ rte_power_qos_set_cpu_resume_latency;
+ rte_power_qos_get_cpu_resume_latency;
};
--
2.22.0
^ permalink raw reply [relevance 5%]
* [PATCH v2 0/2] power: introduce PM QoS interface
@ 2024-06-13 11:20 4% ` Huisong Li
2024-06-13 11:20 5% ` [PATCH v2 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19 6:31 4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
` (5 subsequent siblings)
6 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-06-13 11:20 UTC (permalink / raw)
To: dev
Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
sivaprasad.tummala, liuyonglong, lihuisong
The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low resume
time, like the interrupt packet receiving mode.
The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Please see the description in the kernel document [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.
The per-CPU PM QoS API can be used to control this CPU's idle state
selection and limit it to entering only the shallowest idle state, lowering
the delay after sleep, by setting a strict resume latency (zero value).
[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
Huisong Li (2):
power: introduce PM QoS API on CPU wide
examples/l3fwd-power: add PM QoS configuration
doc/guides/prog_guide/power_man.rst | 22 +++++
doc/guides/rel_notes/release_24_07.rst | 4 +
examples/l3fwd-power/main.c | 29 +++++++
lib/power/meson.build | 2 +
lib/power/rte_power_qos.c | 116 +++++++++++++++++++++++++
lib/power/rte_power_qos.h | 70 +++++++++++++++
lib/power/version.map | 2 +
7 files changed, 245 insertions(+)
create mode 100644 lib/power/rte_power_qos.c
create mode 100644 lib/power/rte_power_qos.h
--
2.22.0
^ permalink raw reply [relevance 4%]
* RE: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
2024-06-11 14:52 3% ` Ferruh Yigit
@ 2024-06-12 1:25 0% ` rongwei liu
2024-06-25 14:46 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: rongwei liu @ 2024-06-12 1:25 UTC (permalink / raw)
To: Ferruh Yigit, dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
Suanming Mou, NBU-Contact-Thomas Monjalon (EXTERNAL),
Andrew Rybchenko
Cc: Dariusz Sosnowski, Aman Singh, Yuying Zhang
BR
Rongwei
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, June 11, 2024 22:53
> To: rongwei liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>
> Cc: Dariusz Sosnowski <dsosnowski@nvidia.com>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
>
> External email: Use caution opening links or attachments
>
>
> On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> > diff --git a/lib/net/rte_vxlan.h b/lib/net/rte_vxlan.h index
> > 997fc784fc..57300fb442 100644
> > --- a/lib/net/rte_vxlan.h
> > +++ b/lib/net/rte_vxlan.h
> > @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
> > uint8_t flags; /**< Should be 8 (I flag). */
> > uint8_t rsvd0[3]; /**< Reserved. */
> > uint8_t vni[3]; /**< VXLAN identifier. */
> > - uint8_t rsvd1; /**< Reserved. */
> > + union {
> > + uint8_t rsvd1; /**< Reserved. */
> > + uint8_t last_rsvd; /**< Reserved. */
> > + };
> >
>
> Is there a plan to remove 'rsvd1' in next ABI break release?
> We can keep both, but I guess it is not logically necessary to keep it, to prevent
> bloat by time, we can remove the old one.
> If decided to remove, sending a 'deprecation.rst' update helps us to remember
> doing it.
>
I think it should. @NBU-Contact-Thomas Monjalon (EXTERNAL) @Andrew Rybchenko @Ori Kam what do you think?
^ permalink raw reply [relevance 0%]
* Re: [RFC 0/2] ethdev: update GENEVE option item structure
@ 2024-06-11 17:07 4% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-06-11 17:07 UTC (permalink / raw)
To: Michael Baum, dev; +Cc: Dariusz Sosnowski, Thomas Monjalon, Ori Kam
On 4/17/2024 8:23 AM, Michael Baum wrote:
> The "rte_flow_item_geneve_opt" structure describes the GENEVE TLV option
> header according to RFC 8926 [1]:
>
> struct rte_flow_item_geneve_opt {
> rte_be16_t option_class;
> uint8_t option_type;
> uint8_t option_len;
> uint32_t *data;
> };
>
> The "option_len" field is used for two different purposes:
> 1. item field for matching with value/mask.
> 2. descriptor for data array size.
>
For the long run solution, we may consider adding geneve option header
to net/rte_geneve.h and make "struct rte_flow_item_geneve_opt" + data size ?
> Those two different purposes might limit each other. For example, when
> matching on length with full mask (0x1f), the data array in the mask
> structure might be taken as size 31 and read invalid memory.
>
> This problem appears in conversion API. In current implementation, the
> "rte_flow_conv" API copies the "rte_flow_item_geneve_opt" structure
> without taking care about data deep-copy. The attempt to solve this
> revealed the problem in determining the size of the mask data array. To
> resolve this issue, two solutions are suggested.
>
Are we having this problem only with geneve options because data size is
not fixed / defined for the header?
> Immediate Workaround:
> The data array size in the "mask" structure is determined by
> "option_len" field in the "spec" structure. This workaround can be
> integrated soon to avoid deep-copy missing.
>
This requires a geneve specific pointer in the item spec, which is not
really nice, although it is temporary solution. Perhaps we can skip
this, can you please check below comment.
> Long Run Solution:
> Add a new field into "rte_flow_item_geneve_opt" structure regardless to
> "option_len" field. This solution should wait to "24.11" version since
> it contains API change.
>
I was expecting the same, but CI seems to have passed the ABI test case [1];
it may be because the new field is appended at the end of the struct.
Can you please double check, if ABI is not broken, we can go with this
solution directly?
[1]
https://mails.dpdk.org/archives/test-report/2024-April/643570.html
> When the API is changed, I'll take the opportunity to add documentation
> for this item in "rte_flow.rst" file and update the data type to
> "rte_be32_t".
>
If we can go with updating struct in this release, adding protocol
option struct in net library can wait v24.11 release.
So "rte_be32_t" type change in this struct won't be a thing.
> [1] https://datatracker.ietf.org/doc/html/rfc8926
>
> Michael Baum (2):
> ethdev: fix GENEVE option item conversion
> ethdev: add data size field to GENEVE option item
>
>
@Ori, Can you please help reviewing this patch?
At the least, it would be good to address the fix in this release.
^ permalink raw reply [relevance 4%]
* Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
@ 2024-06-11 14:52 3% ` Ferruh Yigit
2024-06-12 1:25 0% ` rongwei liu
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-06-11 14:52 UTC (permalink / raw)
To: Rongwei Liu, dev, matan, viacheslavo, orika, suanmingm, thomas
Cc: Dariusz Sosnowski, Aman Singh, Yuying Zhang, Andrew Rybchenko
On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> diff --git a/lib/net/rte_vxlan.h b/lib/net/rte_vxlan.h
> index 997fc784fc..57300fb442 100644
> --- a/lib/net/rte_vxlan.h
> +++ b/lib/net/rte_vxlan.h
> @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
> uint8_t flags; /**< Should be 8 (I flag). */
> uint8_t rsvd0[3]; /**< Reserved. */
> uint8_t vni[3]; /**< VXLAN identifier. */
> - uint8_t rsvd1; /**< Reserved. */
> + union {
> + uint8_t rsvd1; /**< Reserved. */
> + uint8_t last_rsvd; /**< Reserved. */
> + };
>
Is there a plan to remove 'rsvd1' in next ABI break release?
We can keep both, but I guess it is not logically necessary to keep it,
to prevent bloat by time, we can remove the old one.
If decided to remove, sending a 'deprecation.rst' update helps us to
remember doing it.
^ permalink raw reply [relevance 3%]
* Re: [PATCH v5 1/2] eventdev/dma: reorganize event DMA ops
2024-06-07 10:36 3% ` [PATCH v5 " pbhagavatula
@ 2024-06-08 6:16 9% ` Jerin Jacob
0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2024-06-08 6:16 UTC (permalink / raw)
To: pbhagavatula; +Cc: jerinj, Amit Prakash Shukla, Vamsi Attunuru, dev
On Fri, Jun 7, 2024 at 11:53 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Re-organize event DMA ops structure to allow holding
> source and destination pointers without the need for
> additional memory, the mempool allocating memory for
> rte_event_dma_adapter_ops can size the structure to
> accommodate all the needed source and destination
> pointers.
>
> Add multiple words for holding user metadata, adapter
> implementation specific metadata and event metadata.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> ---
> v5 Changes:
> - Update release notes with Experimental API changes.
> v4 Changes:
> - Reduce unreleated driver changes and move to 2/2.
> v3 Changes:
> - Fix stdatomic compilation.
> v2 Changes:
> - Fix 32bit compilation
>
> .
> diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
> index a69f24cf99..7800cb4c31 100644
> --- a/doc/guides/rel_notes/release_24_07.rst
> +++ b/doc/guides/rel_notes/release_24_07.rst
> @@ -84,6 +84,9 @@ API Changes
It is not an API change. Applied the following diff and applied the series to
dpdk-next-eventdev/for-main. Thanks
[for-main][dpdk-next-eventdev] $ git diff
diff --git a/doc/guides/rel_notes/release_24_07.rst
b/doc/guides/rel_notes/release_24_07.rst
index 09e58dddf2..14bd5d37b1 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -91,9 +91,6 @@ API Changes
Also, make sure to start the actual text at the margin.
=======================================================
-* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
- to optimize the memory layout and improve performance.
-
ABI Changes
-----------
@@ -112,6 +109,9 @@ ABI Changes
* No ABI change that would break compatibility with 23.11.
+* eventdev/dma: Reorganize the experimental fastpath structure
``rte_event_dma_adapter_op``
+ to optimize the memory layout and improve performance.
+
> Also, make sure to start the actual text at the margin.
> =======================================================
>
> +* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
> + to optimize the memory layout and improve performance.
> +
>
> ABI Changes
^ permalink raw reply [relevance 9%]
* [PATCH v5 1/2] eventdev/dma: reorganize event DMA ops
@ 2024-06-07 10:36 3% ` pbhagavatula
2024-06-08 6:16 9% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: pbhagavatula @ 2024-06-07 10:36 UTC (permalink / raw)
To: jerinj, Amit Prakash Shukla, Vamsi Attunuru; +Cc: dev, Pavan Nikhilesh
From: Pavan Nikhilesh <pbhagavatula@marvell.com>
Re-organize event DMA ops structure to allow holding
source and destination pointers without the need for
additional memory, the mempool allocating memory for
rte_event_dma_adapter_ops can size the structure to
accommodate all the needed source and destination
pointers.
Add multiple words for holding user metadata, adapter
implementation specific metadata and event metadata.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v5 Changes:
- Update release notes with Experimental API changes.
v4 Changes:
- Reduce unreleated driver changes and move to 2/2.
v3 Changes:
- Fix stdatomic compilation.
v2 Changes:
- Fix 32bit compilation
app/test-eventdev/test_perf_common.c | 26 ++++--------
app/test/test_event_dma_adapter.c | 20 +++------
doc/guides/prog_guide/event_dma_adapter.rst | 2 +-
doc/guides/rel_notes/release_24_07.rst | 3 ++
drivers/dma/cnxk/cnxk_dmadev_fp.c | 20 ++++-----
lib/eventdev/rte_event_dma_adapter.c | 27 ++++--------
lib/eventdev/rte_event_dma_adapter.h | 46 +++++++++++++++------
7 files changed, 69 insertions(+), 75 deletions(-)
diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c
index 93e6132de8..db0f9c1f3b 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -1503,7 +1503,6 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
prod = 0;
for (; port < perf_nb_event_ports(opt); port++) {
struct prod_data *p = &t->prod[port];
- struct rte_event *response_info;
uint32_t flow_id;
p->dev_id = opt->dev_id;
@@ -1523,13 +1522,10 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
for (flow_id = 0; flow_id < t->nb_flows; flow_id++) {
rte_mempool_get(t->da_op_pool, (void **)&op);
- op->src_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
- op->dst_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-
- op->src_seg->addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
- op->dst_seg->addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
- op->src_seg->length = 1024;
- op->dst_seg->length = 1024;
+ op->src_dst_seg[0].addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
+ op->src_dst_seg[1].addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
+ op->src_dst_seg[0].length = 1024;
+ op->src_dst_seg[1].length = 1024;
op->nb_src = 1;
op->nb_dst = 1;
op->flags = RTE_DMA_OP_FLAG_SUBMIT;
@@ -1537,12 +1533,6 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
op->dma_dev_id = dma_dev_id;
op->vchan = vchan_id;
- response_info = (struct rte_event *)((uint8_t *)op +
- sizeof(struct rte_event_dma_adapter_op));
- response_info->queue_id = p->queue_id;
- response_info->sched_type = RTE_SCHED_TYPE_ATOMIC;
- response_info->flow_id = flow_id;
-
p->da.dma_op[flow_id] = op;
}
@@ -2036,7 +2026,7 @@ perf_dmadev_setup(struct evt_test *test, struct evt_options *opt)
return -ENODEV;
}
- elt_size = sizeof(struct rte_event_dma_adapter_op) + sizeof(struct rte_event);
+ elt_size = sizeof(struct rte_event_dma_adapter_op) + (sizeof(struct rte_dma_sge) * 2);
t->da_op_pool = rte_mempool_create("dma_op_pool", opt->pool_sz, elt_size, 256,
0, NULL, NULL, NULL, NULL, rte_socket_id(), 0);
if (t->da_op_pool == NULL) {
@@ -2085,10 +2075,8 @@ perf_dmadev_destroy(struct evt_test *test, struct evt_options *opt)
for (flow_id = 0; flow_id < t->nb_flows; flow_id++) {
op = p->da.dma_op[flow_id];
- rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_seg->addr);
- rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->dst_seg->addr);
- rte_free(op->src_seg);
- rte_free(op->dst_seg);
+ rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_dst_seg[0].addr);
+ rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_dst_seg[1].addr);
rte_mempool_put(op->op_mp, op);
}
diff --git a/app/test/test_event_dma_adapter.c b/app/test/test_event_dma_adapter.c
index 35b417b69f..d9dff4ff7d 100644
--- a/app/test/test_event_dma_adapter.c
+++ b/app/test/test_event_dma_adapter.c
@@ -235,7 +235,6 @@ test_op_forward_mode(void)
struct rte_mbuf *dst_mbuf[TEST_MAX_OP];
struct rte_event_dma_adapter_op *op;
struct rte_event ev[TEST_MAX_OP];
- struct rte_event response_info;
int ret, i;
ret = rte_pktmbuf_alloc_bulk(params.src_mbuf_pool, src_mbuf, TEST_MAX_OP);
@@ -253,14 +252,11 @@ test_op_forward_mode(void)
rte_mempool_get(params.op_mpool, (void **)&op);
TEST_ASSERT_NOT_NULL(op, "Failed to allocate dma operation struct\n");
- op->src_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
- op->dst_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-
/* Update Op */
- op->src_seg->addr = rte_pktmbuf_iova(src_mbuf[i]);
- op->dst_seg->addr = rte_pktmbuf_iova(dst_mbuf[i]);
- op->src_seg->length = PACKET_LENGTH;
- op->dst_seg->length = PACKET_LENGTH;
+ op->src_dst_seg[0].addr = rte_pktmbuf_iova(src_mbuf[i]);
+ op->src_dst_seg[1].addr = rte_pktmbuf_iova(dst_mbuf[i]);
+ op->src_dst_seg[0].length = PACKET_LENGTH;
+ op->src_dst_seg[1].length = PACKET_LENGTH;
op->nb_src = 1;
op->nb_dst = 1;
op->flags = RTE_DMA_OP_FLAG_SUBMIT;
@@ -268,10 +264,6 @@ test_op_forward_mode(void)
op->dma_dev_id = TEST_DMA_DEV_ID;
op->vchan = TEST_DMA_VCHAN_ID;
- response_info.event = dma_response_info.event;
- rte_memcpy((uint8_t *)op + sizeof(struct rte_event_dma_adapter_op), &response_info,
- sizeof(struct rte_event));
-
/* Fill in event info and update event_ptr with rte_event_dma_adapter_op */
memset(&ev[i], 0, sizeof(struct rte_event));
ev[i].event = 0;
@@ -294,8 +286,6 @@ test_op_forward_mode(void)
TEST_ASSERT_EQUAL(ret, 0, "Data mismatch for dma adapter\n");
- rte_free(op->src_seg);
- rte_free(op->dst_seg);
rte_mempool_put(op->op_mp, op);
}
@@ -400,7 +390,7 @@ configure_dmadev(void)
rte_socket_id());
RTE_TEST_ASSERT_NOT_NULL(params.dst_mbuf_pool, "Can't create DMA_DST_MBUFPOOL\n");
- elt_size = sizeof(struct rte_event_dma_adapter_op) + sizeof(struct rte_event);
+ elt_size = sizeof(struct rte_event_dma_adapter_op) + (sizeof(struct rte_dma_sge) * 2);
params.op_mpool = rte_mempool_create("EVENT_DMA_OP_POOL", DMA_OP_POOL_SIZE, elt_size, 0,
0, NULL, NULL, NULL, NULL, rte_socket_id(), 0);
RTE_TEST_ASSERT_NOT_NULL(params.op_mpool, "Can't create DMA_OP_POOL\n");
diff --git a/doc/guides/prog_guide/event_dma_adapter.rst b/doc/guides/prog_guide/event_dma_adapter.rst
index 3443b6a803..1fb9b0a07b 100644
--- a/doc/guides/prog_guide/event_dma_adapter.rst
+++ b/doc/guides/prog_guide/event_dma_adapter.rst
@@ -144,7 +144,7 @@ on which it enqueues events towards the DMA adapter using ``rte_event_enqueue_bu
uint32_t cap;
int ret;
- /* Fill in event info and update event_ptr with rte_dma_op */
+ /* Fill in event info and update event_ptr with rte_event_dma_adapter_op */
memset(&ev, 0, sizeof(ev));
.
.
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..7800cb4c31 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -84,6 +84,9 @@ API Changes
Also, make sure to start the actual text at the margin.
=======================================================
+* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
+ to optimize the memory layout and improve performance.
+
ABI Changes
-----------
diff --git a/drivers/dma/cnxk/cnxk_dmadev_fp.c b/drivers/dma/cnxk/cnxk_dmadev_fp.c
index f6562b603e..8a3c0c1008 100644
--- a/drivers/dma/cnxk/cnxk_dmadev_fp.c
+++ b/drivers/dma/cnxk/cnxk_dmadev_fp.c
@@ -490,8 +490,8 @@ cn10k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events)
hdr[1] = ((uint64_t)comp_ptr);
hdr[2] = cnxk_dma_adapter_format_event(rsp_info->event);
- src = &op->src_seg[0];
- dst = &op->dst_seg[0];
+ src = &op->src_dst_seg[0];
+ dst = &op->src_dst_seg[op->nb_src];
if (CNXK_TAG_IS_HEAD(work->gw_rdata) ||
((CNXK_TT_FROM_TAG(work->gw_rdata) == SSO_TT_ORDERED) &&
@@ -566,12 +566,12 @@ cn9k_dma_adapter_dual_enqueue(void *ws, struct rte_event ev[], uint16_t nb_event
* For all other cases, src pointers are first pointers.
*/
if (((dpi_conf->cmd.u >> 48) & DPI_HDR_XTYPE_MASK) == DPI_XTYPE_INBOUND) {
- fptr = &op->dst_seg[0];
- lptr = &op->src_seg[0];
+ fptr = &op->src_dst_seg[nb_src];
+ lptr = &op->src_dst_seg[0];
RTE_SWAP(nb_src, nb_dst);
} else {
- fptr = &op->src_seg[0];
- lptr = &op->dst_seg[0];
+ fptr = &op->src_dst_seg[0];
+ lptr = &op->src_dst_seg[nb_src];
}
hdr[0] = ((uint64_t)nb_dst << 54) | (uint64_t)nb_src << 48;
@@ -647,12 +647,12 @@ cn9k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events)
* For all other cases, src pointers are first pointers.
*/
if (((dpi_conf->cmd.u >> 48) & DPI_HDR_XTYPE_MASK) == DPI_XTYPE_INBOUND) {
- fptr = &op->dst_seg[0];
- lptr = &op->src_seg[0];
+ fptr = &op->src_dst_seg[nb_src];
+ lptr = &op->src_dst_seg[0];
RTE_SWAP(nb_src, nb_dst);
} else {
- fptr = &op->src_seg[0];
- lptr = &op->dst_seg[0];
+ fptr = &op->src_dst_seg[0];
+ lptr = &op->src_dst_seg[nb_src];
}
hdr[0] = ((uint64_t)nb_dst << 54) | (uint64_t)nb_src << 48;
diff --git a/lib/eventdev/rte_event_dma_adapter.c b/lib/eventdev/rte_event_dma_adapter.c
index 24dff556db..e52ef46a1b 100644
--- a/lib/eventdev/rte_event_dma_adapter.c
+++ b/lib/eventdev/rte_event_dma_adapter.c
@@ -236,9 +236,9 @@ edma_circular_buffer_flush_to_dma_dev(struct event_dma_adapter *adapter,
uint16_t vchan, uint16_t *nb_ops_flushed)
{
struct rte_event_dma_adapter_op *op;
- struct dma_vchan_info *tq;
uint16_t *head = &bufp->head;
uint16_t *tail = &bufp->tail;
+ struct dma_vchan_info *tq;
uint16_t n;
uint16_t i;
int ret;
@@ -257,11 +257,13 @@ edma_circular_buffer_flush_to_dma_dev(struct event_dma_adapter *adapter,
for (i = 0; i < n; i++) {
op = bufp->op_buffer[*head];
if (op->nb_src == 1 && op->nb_dst == 1)
- ret = rte_dma_copy(dma_dev_id, vchan, op->src_seg->addr, op->dst_seg->addr,
- op->src_seg->length, op->flags);
+ ret = rte_dma_copy(dma_dev_id, vchan, op->src_dst_seg[0].addr,
+ op->src_dst_seg[1].addr, op->src_dst_seg[0].length,
+ op->flags);
else
- ret = rte_dma_copy_sg(dma_dev_id, vchan, op->src_seg, op->dst_seg,
- op->nb_src, op->nb_dst, op->flags);
+ ret = rte_dma_copy_sg(dma_dev_id, vchan, &op->src_dst_seg[0],
+ &op->src_dst_seg[op->nb_src], op->nb_src, op->nb_dst,
+ op->flags);
if (ret < 0)
break;
@@ -511,8 +513,7 @@ edma_enq_to_dma_dev(struct event_dma_adapter *adapter, struct rte_event *ev, uns
if (dma_op == NULL)
continue;
- /* Expected to have response info appended to dma_op. */
-
+ dma_op->impl_opaque[0] = ev[i].event;
dma_dev_id = dma_op->dma_dev_id;
vchan = dma_op->vchan;
vchan_qinfo = &adapter->dma_devs[dma_dev_id].vchanq[vchan];
@@ -647,7 +648,6 @@ edma_ops_enqueue_burst(struct event_dma_adapter *adapter, struct rte_event_dma_a
uint8_t event_port_id = adapter->event_port_id;
uint8_t event_dev_id = adapter->eventdev_id;
struct rte_event events[DMA_BATCH_SIZE];
- struct rte_event *response_info;
uint16_t nb_enqueued, nb_ev;
uint8_t retry;
uint8_t i;
@@ -659,16 +659,7 @@ edma_ops_enqueue_burst(struct event_dma_adapter *adapter, struct rte_event_dma_a
for (i = 0; i < num; i++) {
struct rte_event *ev = &events[nb_ev++];
- /* Expected to have response info appended to dma_op. */
- response_info = (struct rte_event *)((uint8_t *)ops[i] +
- sizeof(struct rte_event_dma_adapter_op));
- if (unlikely(response_info == NULL)) {
- if (ops[i] != NULL && ops[i]->op_mp != NULL)
- rte_mempool_put(ops[i]->op_mp, ops[i]);
- continue;
- }
-
- rte_memcpy(ev, response_info, sizeof(struct rte_event));
+ ev->event = ops[i]->impl_opaque[0];
ev->event_ptr = ops[i];
ev->event_type = RTE_EVENT_TYPE_DMADEV;
if (adapter->implicit_release_disabled)
diff --git a/lib/eventdev/rte_event_dma_adapter.h b/lib/eventdev/rte_event_dma_adapter.h
index e924ab673d..048ddba3f3 100644
--- a/lib/eventdev/rte_event_dma_adapter.h
+++ b/lib/eventdev/rte_event_dma_adapter.h
@@ -157,24 +157,46 @@ extern "C" {
* instance.
*/
struct rte_event_dma_adapter_op {
- struct rte_dma_sge *src_seg;
- /**< Source segments. */
- struct rte_dma_sge *dst_seg;
- /**< Destination segments. */
- uint16_t nb_src;
- /**< Number of source segments. */
- uint16_t nb_dst;
- /**< Number of destination segments. */
uint64_t flags;
/**< Flags related to the operation.
* @see RTE_DMA_OP_FLAG_*
*/
- int16_t dma_dev_id;
- /**< DMA device ID to be used */
- uint16_t vchan;
- /**< DMA vchan ID to be used */
struct rte_mempool *op_mp;
/**< Mempool from which op is allocated. */
+ enum rte_dma_status_code status;
+ /**< Status code for this operation. */
+ uint32_t rsvd;
+ /**< Reserved for future use. */
+ uint64_t impl_opaque[2];
+ /**< Implementation-specific opaque data.
+ * A dma device implementation uses this field to hold
+ * implementation specific values to share between dequeue and enqueue
+ * operations.
+ * The application should not modify this field.
+ */
+ uint64_t user_meta;
+ /**< Memory to store user specific metadata.
+ * The dma device implementation should not modify this area.
+ */
+ uint64_t event_meta;
+ /**< Event metadata that defines event attributes when used in OP_NEW mode.
+ * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_NEW
+ * @see struct rte_event::event
+ */
+ int16_t dma_dev_id;
+ /**< DMA device ID to be used with OP_FORWARD mode.
+ * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_FORWARD
+ */
+ uint16_t vchan;
+ /**< DMA vchan ID to be used with OP_FORWARD mode
+ * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_FORWARD
+ */
+ uint16_t nb_src;
+ /**< Number of source segments. */
+ uint16_t nb_dst;
+ /**< Number of destination segments. */
+ struct rte_dma_sge src_dst_seg[0];
+ /**< Source and destination segments. */
};
/**
--
2.25.1
^ permalink raw reply [relevance 3%]
* [PATCH 1/1] net/ena: restructure the llq policy user setting
@ 2024-06-06 13:33 3% ` shaibran
2024-07-05 17:32 4% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: shaibran @ 2024-06-06 13:33 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, Shai Brandes
From: Shai Brandes <shaibran@amazon.com>
Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
devargs with a new shared devarg named `llq_policy` that
implements the same logic and accepts the following values:
0 - Disable LLQ.
Use with extreme caution as it leads to a huge performance
degradation on AWS instances from 6th generation onwards.
1 - Accept device recommended LLQ policy (Default).
Device can recommend normal or large LLQ policy.
2 - Enforce normal LLQ policy.
3 - Enforce large LLQ policy.
Required for packets with headers that exceed 96 bytes on
AWS instances prior to 5th generation.
Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
---
doc/guides/nics/ena.rst | 25 ++----
doc/guides/rel_notes/release_24_07.rst | 8 ++
drivers/net/ena/ena_ethdev.c | 104 +++++++++----------------
drivers/net/ena/ena_ethdev.h | 3 -
4 files changed, 51 insertions(+), 89 deletions(-)
diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 2b105834a0..8f693ac3c9 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -107,15 +107,15 @@ Configuration
Runtime Configuration
^^^^^^^^^^^^^^^^^^^^^
- * **large_llq_hdr** (default 0)
+ * **llq_policy** (default 1)
- Enables or disables usage of large LLQ headers. This option will have
- effect only if the device also supports large LLQ headers. Otherwise, the
- default value will be used.
-
- * **normal_llq_hdr** (default 0)
-
- Enforce normal LLQ policy.
+ Controls whether to use the device recommended header policy or override it.
+ 0 - Disable LLQ.
+ **Use with extreme caution as it leads to a huge performance
+ degradation on AWS instances from 6th generation onwards.**
+ 1 - Accept device recommended LLQ policy (Default).
+ 2 - Enforce normal LLQ policy.
+ 3 - Enforce large LLQ policy.
* **miss_txc_to** (default 5)
@@ -126,15 +126,6 @@ Runtime Configuration
timer service. Setting this parameter to 0 disables this feature. Maximum
allowed value is 60 seconds.
- * **enable_llq** (default 1)
-
- Determines whenever the driver should use the LLQ (if it's available) or
- not.
-
- **NOTE: On the 6th generation AWS instances disabling LLQ may lead to a
- huge performance degradation. In general disabling LLQ is highly not
- recommended!**
-
* **control_poll_interval** (default 0)
Enable polling-based functionality of the admin queues,
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..1fa678864d 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -73,6 +73,12 @@ New Features
``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
container.
+* **Updated Amazon ena (Elastic Network Adapter) net driver.**
+
+ * Modified the PMD API that controls the LLQ header policy.
+ Replaced ``enable_llq``, ``normal_llq_hdr`` and ``large_llq_hdr`` devargs
+ with a new shared devarg ``llq_policy`` that keeps the same logic.
+
* **Update Tap PMD driver.**
* Updated to support up to 8 queues when used by secondary process.
@@ -117,6 +123,8 @@ API Changes
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
=======================================================
+* drivers/net/ena: Removed ``enable_llq``, ``normal_llq_hdr`` and ``large_llq_hdr`` devargs
+ and replaced them with a new shared devarg ``llq_policy`` that keeps the same logic.
ABI Changes
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 66fc287faf..e3c2696ae1 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -81,18 +81,27 @@ struct ena_stats {
ENA_STAT_ENTRY(stat, srd)
/* Device arguments */
-#define ENA_DEVARG_LARGE_LLQ_HDR "large_llq_hdr"
-#define ENA_DEVARG_NORMAL_LLQ_HDR "normal_llq_hdr"
+
+/* Controls whether to disable LLQ, use device recommended header policy
+ * or override the device recommendation.
+ * 0 - Disable LLQ.
+ * Use with extreme caution as it leads to a huge performance
+ * degradation on AWS instances from 6th generation onwards.
+ * 1 - Accept device recommended LLQ policy (Default).
+ * Device can recommend normal or large LLQ policy.
+ * 2 - Enforce normal LLQ policy.
+ * 3 - Enforce large LLQ policy.
+ * Required for packets with header that exceed 96 bytes on
+ * AWS instances prior to 5th generation.
+ */
+#define ENA_DEVARG_LLQ_POLICY "llq_policy"
+
+
/* Timeout in seconds after which a single uncompleted Tx packet should be
* considered as a missing.
*/
#define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
-/*
- * Controls whether LLQ should be used (if available). Enabled by default.
- * NOTE: It's highly not recommended to disable the LLQ, as it may lead to a
- * huge performance degradation on 6th generation AWS instances.
- */
-#define ENA_DEVARG_ENABLE_LLQ "enable_llq"
+
/*
* Controls the period of time (in milliseconds) between two consecutive inspections of
* the control queues when the driver is in poll mode and not using interrupts.
@@ -296,9 +305,9 @@ static int ena_xstats_get_by_id(struct rte_eth_dev *dev,
const uint64_t *ids,
uint64_t *values,
unsigned int n);
-static int ena_process_bool_devarg(const char *key,
- const char *value,
- void *opaque);
+static int ena_process_llq_policy_devarg(const char *key,
+ const char *value,
+ void *opaque);
static int ena_parse_devargs(struct ena_adapter *adapter,
struct rte_devargs *devargs);
static void ena_copy_customer_metrics(struct ena_adapter *adapter,
@@ -314,7 +323,6 @@ static int ena_rx_queue_intr_disable(struct rte_eth_dev *dev,
static int ena_configure_aenq(struct ena_adapter *adapter);
static int ena_mp_primary_handle(const struct rte_mp_msg *mp_msg,
const void *peer);
-static ena_llq_policy ena_define_llq_hdr_policy(struct ena_adapter *adapter);
static bool ena_use_large_llq_hdr(struct ena_adapter *adapter, uint8_t recommended_entry_size);
static const struct eth_dev_ops ena_dev_ops = {
@@ -2292,9 +2300,6 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
/* Assign default devargs values */
adapter->missing_tx_completion_to = ENA_TX_TIMEOUT;
- adapter->enable_llq = true;
- adapter->use_large_llq_hdr = false;
- adapter->use_normal_llq_hdr = false;
/* Get user bypass */
rc = ena_parse_devargs(adapter, pci_dev->device.devargs);
@@ -2302,7 +2307,6 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
PMD_INIT_LOG(CRIT, "Failed to parse devargs\n");
goto err;
}
- adapter->llq_header_policy = ena_define_llq_hdr_policy(adapter);
rc = ena_com_allocate_customer_metrics_buffer(ena_dev);
if (rc != 0) {
@@ -3736,44 +3740,29 @@ static int ena_process_uint_devarg(const char *key,
return 0;
}
-static int ena_process_bool_devarg(const char *key,
- const char *value,
- void *opaque)
+static int ena_process_llq_policy_devarg(const char *key, const char *value, void *opaque)
{
struct ena_adapter *adapter = opaque;
- bool bool_value;
+ uint32_t policy;
- /* Parse the value. */
- if (strcmp(value, "1") == 0) {
- bool_value = true;
- } else if (strcmp(value, "0") == 0) {
- bool_value = false;
+ policy = strtoul(value, NULL, DECIMAL_BASE);
+ if (policy < ENA_LLQ_POLICY_LAST) {
+ adapter->llq_header_policy = policy;
} else {
- PMD_INIT_LOG(ERR,
- "Invalid value: '%s' for key '%s'. Accepted: '0' or '1'\n",
- value, key);
+ PMD_INIT_LOG(ERR, "Invalid value: '%s' for key '%s'. valid [0-3]\n", value, key);
return -EINVAL;
}
-
- /* Now, assign it to the proper adapter field. */
- if (strcmp(key, ENA_DEVARG_LARGE_LLQ_HDR) == 0)
- adapter->use_large_llq_hdr = bool_value;
- else if (strcmp(key, ENA_DEVARG_NORMAL_LLQ_HDR) == 0)
- adapter->use_normal_llq_hdr = bool_value;
- else if (strcmp(key, ENA_DEVARG_ENABLE_LLQ) == 0)
- adapter->enable_llq = bool_value;
-
+ PMD_DRV_LOG(INFO,
+ "LLQ policy is %u [0 - disabled, 1 - device recommended, 2 - normal, 3 - large]\n",
+ adapter->llq_header_policy);
return 0;
}
-static int ena_parse_devargs(struct ena_adapter *adapter,
- struct rte_devargs *devargs)
+static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *devargs)
{
static const char * const allowed_args[] = {
- ENA_DEVARG_LARGE_LLQ_HDR,
- ENA_DEVARG_NORMAL_LLQ_HDR,
+ ENA_DEVARG_LLQ_POLICY,
ENA_DEVARG_MISS_TXC_TO,
- ENA_DEVARG_ENABLE_LLQ,
ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
NULL,
};
@@ -3785,27 +3774,17 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
kvlist = rte_kvargs_parse(devargs->args, allowed_args);
if (kvlist == NULL) {
- PMD_INIT_LOG(ERR, "Invalid device arguments: %s\n",
- devargs->args);
+ PMD_INIT_LOG(ERR, "Invalid device arguments: %s\n", devargs->args);
return -EINVAL;
}
-
- rc = rte_kvargs_process(kvlist, ENA_DEVARG_LARGE_LLQ_HDR,
- ena_process_bool_devarg, adapter);
- if (rc != 0)
- goto exit;
- rc = rte_kvargs_process(kvlist, ENA_DEVARG_NORMAL_LLQ_HDR,
- ena_process_bool_devarg, adapter);
+ rc = rte_kvargs_process(kvlist, ENA_DEVARG_LLQ_POLICY,
+ ena_process_llq_policy_devarg, adapter);
if (rc != 0)
goto exit;
rc = rte_kvargs_process(kvlist, ENA_DEVARG_MISS_TXC_TO,
ena_process_uint_devarg, adapter);
if (rc != 0)
goto exit;
- rc = rte_kvargs_process(kvlist, ENA_DEVARG_ENABLE_LLQ,
- ena_process_bool_devarg, adapter);
- if (rc != 0)
- goto exit;
rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
ena_process_uint_devarg, adapter);
if (rc != 0)
@@ -4029,9 +4008,7 @@ RTE_PMD_REGISTER_PCI(net_ena, rte_ena_pmd);
RTE_PMD_REGISTER_PCI_TABLE(net_ena, pci_id_ena_map);
RTE_PMD_REGISTER_KMOD_DEP(net_ena, "* igb_uio | uio_pci_generic | vfio-pci");
RTE_PMD_REGISTER_PARAM_STRING(net_ena,
- ENA_DEVARG_LARGE_LLQ_HDR "=<0|1> "
- ENA_DEVARG_NORMAL_LLQ_HDR "=<0|1> "
- ENA_DEVARG_ENABLE_LLQ "=<0|1> "
+ ENA_DEVARG_LLQ_POLICY "=<0|1|2|3> "
ENA_DEVARG_MISS_TXC_TO "=<uint>"
ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
@@ -4219,17 +4196,6 @@ ena_mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
return rte_mp_reply(&mp_rsp, peer);
}
-static ena_llq_policy ena_define_llq_hdr_policy(struct ena_adapter *adapter)
-{
- if (!adapter->enable_llq)
- return ENA_LLQ_POLICY_DISABLED;
- if (adapter->use_large_llq_hdr)
- return ENA_LLQ_POLICY_LARGE;
- if (adapter->use_normal_llq_hdr)
- return ENA_LLQ_POLICY_NORMAL;
- return ENA_LLQ_POLICY_RECOMMENDED;
-}
-
static bool ena_use_large_llq_hdr(struct ena_adapter *adapter, uint8_t recommended_entry_size)
{
if (adapter->llq_header_policy == ENA_LLQ_POLICY_LARGE) {
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 7d82d222ce..fe7d4a2d65 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -337,9 +337,6 @@ struct ena_adapter {
uint32_t active_aenq_groups;
bool trigger_reset;
- bool enable_llq;
- bool use_large_llq_hdr;
- bool use_normal_llq_hdr;
ena_llq_policy llq_header_policy;
uint32_t last_tx_comp_qid;
--
2.17.1
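One nit worth noting in the new `ena_process_llq_policy_devarg()` above: `strtoul(value, NULL, DECIMAL_BASE)` silently accepts trailing junk and cannot report overflow. A minimal sketch of a stricter parser (a hypothetical helper, not the driver's actual code; `LLQ_POLICY_LAST` here is an assumed mirror of `ENA_LLQ_POLICY_LAST`):

```c
#include <errno.h>
#include <stdlib.h>

#define LLQ_POLICY_LAST 4 /* assumed mirror of ENA_LLQ_POLICY_LAST: valid values are 0..3 */

/* Parse a devarg value as an LLQ policy; returns the policy on success
 * or -1 on error. Unlike a bare strtoul(value, NULL, 10), this rejects
 * empty strings, trailing junk such as "2x", and out-of-range numbers. */
static int parse_llq_policy(const char *value)
{
	char *end;
	unsigned long policy;

	errno = 0;
	policy = strtoul(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0')
		return -1; /* not a clean decimal number */
	if (policy >= LLQ_POLICY_LAST)
		return -1; /* outside the documented [0-3] range */
	return (int)policy;
}
```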
^ permalink raw reply [relevance 3%]
* Re: [PATCH 2/2] eal: add Arm WFET in power management intrinsics
@ 2024-06-04 15:41 3% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-06-04 15:41 UTC (permalink / raw)
To: Wathsala Vithanage
Cc: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang, dev, nd,
Dhruv Tripathi, Honnappa Nagarahalli, Jack Bond-Preston,
Nick Connolly, Vinod Krishna
On Tue, 4 Jun 2024 04:44:01 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -35,6 +35,7 @@ enum rte_cpu_flag_t {
> RTE_CPUFLAG_SVEF32MM,
> RTE_CPUFLAG_SVEF64MM,
> RTE_CPUFLAG_SVEBF16,
> + RTE_CPUFLAG_WFXT,
> RTE_CPUFLAG_AARCH64,
> };
Adding a new entry in the middle of the enum will cause the ABI to change.
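The breakage can be illustrated with a minimal sketch (a hypothetical two-version enum, not the real rte_cpu_flag_t list): every enumerator after the insertion point shifts by one, so a binary built against the old enum passes values that now name a different flag.

```c
/* v1: layout of a hypothetical flags enum in the old release */
enum flags_v1 { F1_SVEBF16, F1_AARCH64 };

/* v2: a new entry inserted before the final enumerator */
enum flags_v2 { F2_SVEBF16, F2_WFXT, F2_AARCH64 };
```

Appending the new flag after the last existing entry instead would leave all previously published values unchanged.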
^ permalink raw reply [relevance 3%]
* RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
2024-05-29 10:43 0% ` Anoob Joseph
@ 2024-05-30 9:19 0% ` Akhil Goyal
0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-05-30 9:19 UTC (permalink / raw)
To: Anoob Joseph, dev
Cc: thomas, david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, g.singh, fanzhang.oss,
jianjay.zhou, asomalap, ruifeng.wang, konstantin.v.ananyev,
radu.nicolau, ajit.khaparde, Nagadheeraj Rottela, ciara.power
> Subject: RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
>
> >
> > Added a new fast path API to get the number of used crypto device queue pair
> > depth at any given point.
> >
> > An implementation in cnxk crypto driver is also added along with a test case in
> > test app.
> >
> > The addition of new API causes an ABI warning.
> > This is suppressed as the updated struct rte_crypto_fp_ops is an internal
> > structure and not to be used by application directly.
> >
>
> Series Acked-by: Anoob Joseph <anoobj@marvell.com>
>
Applied to dpdk-next-crypto
^ permalink raw reply [relevance 0%]
* [PATCH v10 01/20] mbuf: replace term sanity check
@ 2024-05-29 23:33 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-29 23:33 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Andrew Rybchenko, Morten Brørup
Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
to match the similar macro RTE_VERIFY() in rte_debug.h
The term sanity check is on the Tier 2 list of words
that should be replaced.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
app/test/test_mbuf.c | 28 +++++------
doc/guides/prog_guide/mbuf_lib.rst | 4 +-
doc/guides/rel_notes/deprecation.rst | 3 ++
drivers/net/avp/avp_ethdev.c | 18 +++----
drivers/net/sfc/sfc_ef100_rx.c | 6 +--
drivers/net/sfc/sfc_ef10_essb_rx.c | 4 +-
drivers/net/sfc/sfc_ef10_rx.c | 4 +-
drivers/net/sfc/sfc_rx.c | 2 +-
examples/ipv4_multicast/main.c | 2 +-
lib/mbuf/rte_mbuf.c | 23 +++++----
lib/mbuf/rte_mbuf.h | 71 +++++++++++++++-------------
lib/mbuf/version.map | 1 +
12 files changed, 90 insertions(+), 76 deletions(-)
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 17be977f31..3fbb5dea8b 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -262,8 +262,8 @@ test_one_pktmbuf(struct rte_mempool *pktmbuf_pool)
GOTO_FAIL("Buffer should be continuous");
memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);
- rte_mbuf_sanity_check(m, 1);
- rte_mbuf_sanity_check(m, 0);
+ rte_mbuf_verify(m, 1);
+ rte_mbuf_verify(m, 0);
rte_pktmbuf_dump(stdout, m, 0);
/* this prepend should fail */
@@ -1162,7 +1162,7 @@ test_refcnt_mbuf(void)
#ifdef RTE_EXEC_ENV_WINDOWS
static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
{
RTE_SET_USED(pktmbuf_pool);
return TEST_SKIPPED;
@@ -1181,12 +1181,12 @@ mbuf_check_pass(struct rte_mbuf *buf)
}
static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
{
struct rte_mbuf *buf;
struct rte_mbuf badbuf;
- printf("Checking rte_mbuf_sanity_check for failure conditions\n");
+ printf("Checking rte_mbuf_verify for failure conditions\n");
/* get a good mbuf to use to make copies */
buf = rte_pktmbuf_alloc(pktmbuf_pool);
@@ -1708,7 +1708,7 @@ test_mbuf_validate_tx_offload(const char *test_name,
GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("%s: Bad packet length\n", __func__);
- rte_mbuf_sanity_check(m, 0);
+ rte_mbuf_verify(m, 0);
m->ol_flags = ol_flags;
m->tso_segsz = segsize;
ret = rte_validate_tx_offload(m);
@@ -1915,7 +1915,7 @@ test_pktmbuf_read(struct rte_mempool *pktmbuf_pool)
GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("%s: Bad packet length\n", __func__);
- rte_mbuf_sanity_check(m, 0);
+ rte_mbuf_verify(m, 0);
data = rte_pktmbuf_append(m, MBUF_TEST_DATA_LEN2);
if (data == NULL)
@@ -1964,7 +1964,7 @@ test_pktmbuf_read_from_offset(struct rte_mempool *pktmbuf_pool)
if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("%s: Bad packet length\n", __func__);
- rte_mbuf_sanity_check(m, 0);
+ rte_mbuf_verify(m, 0);
/* prepend an ethernet header */
hdr = (struct ether_hdr *)rte_pktmbuf_prepend(m, hdr_len);
@@ -2109,7 +2109,7 @@ create_packet(struct rte_mempool *pktmbuf_pool,
GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
if (rte_pktmbuf_pkt_len(pkt_seg) != 0)
GOTO_FAIL("%s: Bad packet length\n", __func__);
- rte_mbuf_sanity_check(pkt_seg, 0);
+ rte_mbuf_verify(pkt_seg, 0);
/* Add header only for the first segment */
if (test_data->flags == MBUF_HEADER && seg == 0) {
hdr_len = sizeof(struct rte_ether_hdr);
@@ -2321,7 +2321,7 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool *pktmbuf_pool)
GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("%s: Bad packet length\n", __func__);
- rte_mbuf_sanity_check(m, 0);
+ rte_mbuf_verify(m, 0);
ext_buf_addr = rte_malloc("External buffer", buf_len,
RTE_CACHE_LINE_SIZE);
@@ -2482,8 +2482,8 @@ test_pktmbuf_ext_pinned_buffer(struct rte_mempool *std_pool)
GOTO_FAIL("%s: test_pktmbuf_copy(pinned) failed\n",
__func__);
- if (test_failing_mbuf_sanity_check(pinned_pool) < 0)
- GOTO_FAIL("%s: test_failing_mbuf_sanity_check(pinned)"
+ if (test_failing_mbuf_verify(pinned_pool) < 0)
+ GOTO_FAIL("%s: test_failing_mbuf_verify(pinned)"
" failed\n", __func__);
if (test_mbuf_linearize_check(pinned_pool) < 0)
@@ -2857,8 +2857,8 @@ test_mbuf(void)
goto err;
}
- if (test_failing_mbuf_sanity_check(pktmbuf_pool) < 0) {
- printf("test_failing_mbuf_sanity_check() failed\n");
+ if (test_failing_mbuf_verify(pktmbuf_pool) < 0) {
+ printf("test_failing_mbuf_verify() failed\n");
goto err;
}
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 049357c755..0accb51a98 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -266,8 +266,8 @@ can be found in several of the sample applications, for example, the IPv4 Multic
Debug
-----
-In debug mode, the functions of the mbuf library perform sanity checks before any operation (such as, buffer corruption,
-bad type, and so on).
+In debug mode, the functions of the mbuf library perform consistency checks
+before any operation (such as, buffer corruption, bad type, and so on).
Use Cases
---------
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..6b4a3102ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,6 @@ Deprecation Notices
will be deprecated and subsequently removed in DPDK 24.11 release.
Before this, the new port library API (functions rte_swx_port_*)
will gradually transition from experimental to stable status.
+
+* mbuf: The function ``rte_mbuf_sanity_check`` is deprecated.
+ Use the new function ``rte_mbuf_verify`` instead.
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index 6733462c86..bafc08fd60 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -1231,7 +1231,7 @@ _avp_mac_filter(struct avp_dev *avp, struct rte_mbuf *m)
#ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
static inline void
-__avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
+__avp_dev_buffer_check(struct avp_dev *avp, struct rte_avp_desc *buf)
{
struct rte_avp_desc *first_buf;
struct rte_avp_desc *pkt_buf;
@@ -1272,12 +1272,12 @@ __avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
first_buf->pkt_len, pkt_len);
}
-#define avp_dev_buffer_sanity_check(a, b) \
- __avp_dev_buffer_sanity_check((a), (b))
+#define avp_dev_buffer_check(a, b) \
+ __avp_dev_buffer_check((a), (b))
#else /* RTE_LIBRTE_AVP_DEBUG_BUFFERS */
-#define avp_dev_buffer_sanity_check(a, b) do {} while (0)
+#define avp_dev_buffer_check(a, b) do {} while (0)
#endif
@@ -1302,7 +1302,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
void *pkt_data;
unsigned int i;
- avp_dev_buffer_sanity_check(avp, buf);
+ avp_dev_buffer_check(avp, buf);
/* setup the first source buffer */
pkt_buf = avp_dev_translate_buffer(avp, buf);
@@ -1370,7 +1370,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
rte_pktmbuf_pkt_len(m) = total_length;
m->vlan_tci = vlan_tci;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
return m;
}
@@ -1614,7 +1614,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
char *pkt_data;
unsigned int i;
- __rte_mbuf_sanity_check(mbuf, 1);
+ __rte_mbuf_verify(mbuf, 1);
m = mbuf;
src_offset = 0;
@@ -1680,7 +1680,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
first_buf->vlan_tci = mbuf->vlan_tci;
}
- avp_dev_buffer_sanity_check(avp, buffers[0]);
+ avp_dev_buffer_check(avp, buffers[0]);
return total_length;
}
@@ -1798,7 +1798,7 @@ avp_xmit_scattered_pkts(void *tx_queue,
#ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
for (i = 0; i < nb_pkts; i++)
- avp_dev_buffer_sanity_check(avp, tx_bufs[i]);
+ avp_dev_buffer_check(avp, tx_bufs[i]);
#endif
/* send the packets */
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index e283879e6b..5ebfba4dcf 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -179,7 +179,7 @@ sfc_ef100_rx_qrefill(struct sfc_ef100_rxq *rxq)
struct sfc_ef100_rx_sw_desc *rxd;
rte_iova_t dma_addr;
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
dma_addr = rte_mbuf_data_iova_default(m);
if (rxq->flags & SFC_EF100_RXQ_NIC_DMA_MAP) {
@@ -551,7 +551,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
rxq->ready_pkts--;
pkt = sfc_ef100_rx_next_mbuf(rxq);
- __rte_mbuf_raw_sanity_check(pkt);
+ __rte_mbuf_raw_verify(pkt);
RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data[0]) !=
sizeof(rxq->rearm_data));
@@ -575,7 +575,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
struct rte_mbuf *seg;
seg = sfc_ef100_rx_next_mbuf(rxq);
- __rte_mbuf_raw_sanity_check(seg);
+ __rte_mbuf_raw_verify(seg);
seg->data_off = RTE_PKTMBUF_HEADROOM;
diff --git a/drivers/net/sfc/sfc_ef10_essb_rx.c b/drivers/net/sfc/sfc_ef10_essb_rx.c
index 78bd430363..74647e2792 100644
--- a/drivers/net/sfc/sfc_ef10_essb_rx.c
+++ b/drivers/net/sfc/sfc_ef10_essb_rx.c
@@ -125,7 +125,7 @@ sfc_ef10_essb_next_mbuf(const struct sfc_ef10_essb_rxq *rxq,
struct rte_mbuf *m;
m = (struct rte_mbuf *)((uintptr_t)mbuf + rxq->buf_stride);
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
return m;
}
@@ -136,7 +136,7 @@ sfc_ef10_essb_mbuf_by_index(const struct sfc_ef10_essb_rxq *rxq,
struct rte_mbuf *m;
m = (struct rte_mbuf *)((uintptr_t)mbuf + idx * rxq->buf_stride);
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
return m;
}
diff --git a/drivers/net/sfc/sfc_ef10_rx.c b/drivers/net/sfc/sfc_ef10_rx.c
index 60442930b3..f4fc815570 100644
--- a/drivers/net/sfc/sfc_ef10_rx.c
+++ b/drivers/net/sfc/sfc_ef10_rx.c
@@ -148,7 +148,7 @@ sfc_ef10_rx_qrefill(struct sfc_ef10_rxq *rxq)
struct sfc_ef10_rx_sw_desc *rxd;
rte_iova_t phys_addr;
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
SFC_ASSERT((id & ~ptr_mask) == 0);
rxd = &rxq->sw_ring[id];
@@ -297,7 +297,7 @@ sfc_ef10_rx_process_event(struct sfc_ef10_rxq *rxq, efx_qword_t rx_ev,
rxd = &rxq->sw_ring[pending++ & ptr_mask];
m = rxd->mbuf;
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
m->data_off = RTE_PKTMBUF_HEADROOM;
rte_pktmbuf_data_len(m) = seg_len;
diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index a193229265..c885ce2b05 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -120,7 +120,7 @@ sfc_efx_rx_qrefill(struct sfc_efx_rxq *rxq)
++i, id = (id + 1) & rxq->ptr_mask) {
m = objs[i];
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
rxd = &rxq->sw_desc[id];
rxd->mbuf = m;
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 1eed645d02..3bfab37012 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -258,7 +258,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
hdr->pkt_len = (uint16_t)(hdr->data_len + pkt->pkt_len);
hdr->nb_segs = pkt->nb_segs + 1;
- __rte_mbuf_sanity_check(hdr, 1);
+ __rte_mbuf_verify(hdr, 1);
return hdr;
}
/* >8 End of mcast_out_kt. */
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 559d5ad8a7..fc5d4ba29d 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -367,9 +367,9 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
return mp;
}
-/* do some sanity checks on a mbuf: panic if it fails */
+/* do some checks on a mbuf: panic if it fails */
void
-rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header)
{
const char *reason;
@@ -377,6 +377,13 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
rte_panic("%s\n", reason);
}
+/* For ABI compatibility, to be removed in next release */
+void
+rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+{
+ rte_mbuf_verify(m, is_header);
+}
+
int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
const char **reason)
{
@@ -496,7 +503,7 @@ void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs, unsigned int count)
if (unlikely(m == NULL))
continue;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
do {
m_next = m->next;
@@ -546,7 +553,7 @@ rte_pktmbuf_clone(struct rte_mbuf *md, struct rte_mempool *mp)
return NULL;
}
- __rte_mbuf_sanity_check(mc, 1);
+ __rte_mbuf_verify(mc, 1);
return mc;
}
@@ -596,7 +603,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
struct rte_mbuf *mc, *m_last, **prev;
/* garbage in check */
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
/* check for request to copy at offset past end of mbuf */
if (unlikely(off >= m->pkt_len))
@@ -660,7 +667,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
}
/* garbage out check */
- __rte_mbuf_sanity_check(mc, 1);
+ __rte_mbuf_verify(mc, 1);
return mc;
}
@@ -671,7 +678,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
unsigned int len;
unsigned int nb_segs;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
fprintf(f, "dump mbuf at %p, iova=%#" PRIx64 ", buf_len=%u\n", m, rte_mbuf_iova_get(m),
m->buf_len);
@@ -689,7 +696,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
nb_segs = m->nb_segs;
while (m && nb_segs != 0) {
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(m, 0);
fprintf(f, " segment at %p, data=%p, len=%u, off=%u, refcnt=%u\n",
m, rte_pktmbuf_mtod(m, void *),
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b788..380663a089 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -339,13 +339,13 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
#ifdef RTE_LIBRTE_MBUF_DEBUG
-/** check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
+/** do mbuf type in debug mode */
+#define __rte_mbuf_verify(m, is_h) rte_mbuf_verify(m, is_h)
#else /* RTE_LIBRTE_MBUF_DEBUG */
-/** check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
+/** ignore mbuf checks if not in debug mode */
+#define __rte_mbuf_verify(m, is_h) do { } while (0)
#endif /* RTE_LIBRTE_MBUF_DEBUG */
@@ -514,10 +514,9 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
/**
- * Sanity checks on an mbuf.
+ * Check that the mbuf is valid and panic if corrupted.
*
- * Check the consistency of the given mbuf. The function will cause a
- * panic if corruption is detected.
+ * Acts assertion that mbuf is consistent. If not it calls rte_panic().
*
* @param m
* The mbuf to be checked.
@@ -526,13 +525,17 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
* of a packet (in this case, some fields like nb_segs are not checked)
*/
void
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header);
+
+/* Older deprecated name for rte_mbuf_verify() */
+void __rte_deprecated
rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
/**
- * Sanity checks on a mbuf.
+ * Do consistency checks on a mbuf.
*
- * Almost like rte_mbuf_sanity_check(), but this function gives the reason
- * if corruption is detected rather than panic.
+ * Check the consistency of the given mbuf and if not valid
+ * return the reason.
*
* @param m
* The mbuf to be checked.
@@ -551,7 +554,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
const char **reason);
/**
- * Sanity checks on a reinitialized mbuf in debug mode.
+ * Do checks on a reinitialized mbuf in debug mode.
*
* Check the consistency of the given reinitialized mbuf.
* The function will cause a panic if corruption is detected.
@@ -563,16 +566,16 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
* The mbuf to be checked.
*/
static __rte_always_inline void
-__rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
+__rte_mbuf_raw_verify(__rte_unused const struct rte_mbuf *m)
{
RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
RTE_ASSERT(m->next == NULL);
RTE_ASSERT(m->nb_segs == 1);
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(m, 0);
}
/** For backwards compatibility. */
-#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check(m)
+#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_verify(m)
/**
* Allocate an uninitialized mbuf from mempool *mp*.
@@ -599,7 +602,7 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
if (rte_mempool_get(mp, (void **)&m) < 0)
return NULL;
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
return m;
}
@@ -622,7 +625,7 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
{
RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
(!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_verify(m);
rte_mempool_put(m->pool, m);
}
@@ -885,7 +888,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
rte_pktmbuf_reset_headroom(m);
m->data_len = 0;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
}
/**
@@ -941,22 +944,22 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
switch (count % 4) {
case 0:
while (idx != count) {
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_verify(mbufs[idx]);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_verify(mbufs[idx]);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_verify(mbufs[idx]);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_verify(mbufs[idx]);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
@@ -1184,8 +1187,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
mi->pkt_len = mi->data_len;
mi->nb_segs = 1;
- __rte_mbuf_sanity_check(mi, 1);
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(mi, 1);
+ __rte_mbuf_verify(m, 0);
}
/**
@@ -1340,7 +1343,7 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
static __rte_always_inline struct rte_mbuf *
rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(m, 0);
if (likely(rte_mbuf_refcnt_read(m) == 1)) {
@@ -1411,7 +1414,7 @@ static inline void rte_pktmbuf_free(struct rte_mbuf *m)
struct rte_mbuf *m_next;
if (m != NULL)
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
while (m != NULL) {
m_next = m->next;
@@ -1492,7 +1495,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
*/
static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
{
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
do {
rte_mbuf_refcnt_update(m, v);
@@ -1509,7 +1512,7 @@ static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
*/
static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
{
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(m, 0);
return m->data_off;
}
@@ -1523,7 +1526,7 @@ static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
*/
static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
{
- __rte_mbuf_sanity_check(m, 0);
+ __rte_mbuf_verify(m, 0);
return (uint16_t)(m->buf_len - rte_pktmbuf_headroom(m) -
m->data_len);
}
@@ -1538,7 +1541,7 @@ static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
*/
static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
{
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
while (m->next != NULL)
m = m->next;
return m;
@@ -1582,7 +1585,7 @@ static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
static inline char *rte_pktmbuf_prepend(struct rte_mbuf *m,
uint16_t len)
{
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
if (unlikely(len > rte_pktmbuf_headroom(m)))
return NULL;
@@ -1617,7 +1620,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
void *tail;
struct rte_mbuf *m_last;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
m_last = rte_pktmbuf_lastseg(m);
if (unlikely(len > rte_pktmbuf_tailroom(m_last)))
@@ -1645,7 +1648,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
*/
static inline char *rte_pktmbuf_adj(struct rte_mbuf *m, uint16_t len)
{
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
if (unlikely(len > m->data_len))
return NULL;
@@ -1677,7 +1680,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
{
struct rte_mbuf *m_last;
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
m_last = rte_pktmbuf_lastseg(m);
if (unlikely(len > m_last->data_len))
@@ -1699,7 +1702,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
*/
static inline int rte_pktmbuf_is_contiguous(const struct rte_mbuf *m)
{
- __rte_mbuf_sanity_check(m, 1);
+ __rte_mbuf_verify(m, 1);
return m->nb_segs == 1;
}
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index daa65e2bbd..c85370e430 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -31,6 +31,7 @@ DPDK_24 {
rte_mbuf_set_platform_mempool_ops;
rte_mbuf_set_user_mempool_ops;
rte_mbuf_user_mempool_ops;
+ rte_mbuf_verify;
rte_pktmbuf_clone;
rte_pktmbuf_copy;
rte_pktmbuf_dump;
--
2.43.0
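The compatibility shim added in rte_mbuf.c/rte_mbuf.h above follows a common pattern: keep the old symbol as a thin wrapper around the new one and mark its declaration deprecated. A self-contained sketch of that pattern with hypothetical names (`widget_*`, not the mbuf API):

```c
/* New canonical API: returns 1 when the value looks consistent. */
static int widget_verify(int refcnt)
{
	return refcnt >= 1;
}

/* Old name kept so existing callers keep working; the deprecated
 * attribute steers newly compiled code toward widget_verify(). */
__attribute__((deprecated("use widget_verify instead")))
static int widget_sanity_check(int refcnt);

static int widget_sanity_check(int refcnt)
{
	return widget_verify(refcnt);
}
```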
^ permalink raw reply [relevance 2%]
* Re: [PATCH v5] graph: expose node context as pointers
2024-03-27 9:14 4% ` [PATCH v5] " Robin Jarry
@ 2024-05-29 17:54 0% ` Nithin Dabilpuram
2024-06-18 12:33 4% ` David Marchand
1 sibling, 0 replies; 200+ results
From: Nithin Dabilpuram @ 2024-05-29 17:54 UTC (permalink / raw)
To: Robin Jarry
Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
Tyler Retzlaff
Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
On Wed, Mar 27, 2024 at 2:47 PM Robin Jarry <rjarry@redhat.com> wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. And without
> intermediate structures, casting to opaque pointers is hard without
> violating strict aliasing rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
> preserve existing alignment on architectures where cache lines are 128
> bytes.
>
> Add a static assert to ensure that the unnamed union is not larger than
> the context array (RTE_NODE_CTX_SZ).
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
>
> Notes:
> v5:
>
> * Helper functions to hide casting proved to be harder than expected.
> Naive casting may even be impossible without breaking strict aliasing
> rules. The only other option would be to use explicit memcpy calls.
> * Unnamed union tentative again. As suggested by Tyler (thank you!),
> using an intermediate unnamed struct to carry the alignment produces
> consistent ABI in C and C++.
> * Also, Tyler (thank you!) suggested that the fast path area alignment
> size may be incorrect for architectures where the cache line is not 64
> bytes. There will be a 64 bytes hole in the structure at the end of
> the unnamed struct before the zero length next nodes array. Use
> __rte_cache_min_aligned to preserve existing alignment.
>
> v4:
>
> * Replaced the unnamed union with helper inline functions.
>
> v3:
>
> * Added __extension__ to the unnamed struct inside the union.
> * Fixed C++ header checks.
> * Replaced alignas() with an explicit static_assert.
>
> lib/graph/rte_graph_worker_common.h | 27 ++++++++++++++++++++-------
> 1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..84d4997bbbf6 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -12,7 +12,9 @@
> * process, enqueue and move streams of objects to the next nodes.
> */
>
> +#include <assert.h>
> #include <stdalign.h>
> +#include <stddef.h>
>
> #include <rte_common.h>
> #include <rte_cycles.h>
> @@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
> } dispatch;
> };
> /* Fast path area */
> + __extension__ struct __rte_cache_min_aligned {
> #define RTE_NODE_CTX_SZ 16
> - alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
> - uint16_t size; /**< Total number of objects available. */
> - uint16_t idx; /**< Number of objects used. */
> - rte_graph_off_t off; /**< Offset of node in the graph reel. */
> - uint64_t total_cycles; /**< Cycles spent in this node. */
> - uint64_t total_calls; /**< Calls done to this node. */
> - uint64_t total_objs; /**< Objects processed by this node. */
> + union {
> + uint8_t ctx[RTE_NODE_CTX_SZ];
> + __extension__ struct {
> + void *ctx_ptr;
> + void *ctx_ptr2;
> + };
> + }; /**< Node Context. */
> + uint16_t size; /**< Total number of objects available. */
> + uint16_t idx; /**< Number of objects used. */
> + rte_graph_off_t off; /**< Offset of node in the graph reel. */
> + uint64_t total_cycles; /**< Cycles spent in this node. */
> + uint64_t total_calls; /**< Calls done to this node. */
> + uint64_t total_objs; /**< Objects processed by this node. */
> union {
> void **objs; /**< Array of object pointers. */
> uint64_t objs_u64;
> @@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
> rte_node_process_t process; /**< Process function. */
> uint64_t process_u64;
> };
> + };
> alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
> };
>
> +static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, ctx) == RTE_NODE_CTX_SZ,
> + "rte_node context must be RTE_NODE_CTX_SZ bytes exactly");
> +
> /**
> * @internal
> *
> --
> 2.44.0
>
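A reduced model of the layout under discussion (a stand-in struct, not the real rte_node) shows how the unnamed union lets the same 16 bytes be read either as raw context or as two opaque pointers, with the following field still landing at the expected offset:

```c
#include <stddef.h>
#include <stdint.h>

#define CTX_SZ 16 /* stands in for RTE_NODE_CTX_SZ */

/* Stand-in for the fast-path area: the 16-byte context array and the
 * two opaque pointers share storage, so node code can use either view
 * without casting through intermediate structures. */
struct node_ctx {
	union {
		uint8_t ctx[CTX_SZ];
		struct {
			void *ctx_ptr;
			void *ctx_ptr2;
		};
	};
	uint16_t size; /* first field after the context area */
};

_Static_assert(2 * sizeof(void *) <= CTX_SZ,
	       "two pointers must fit in the context array");
```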
^ permalink raw reply [relevance 0%]
* RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
2024-04-12 11:57 3% ` [PATCH v2 " Akhil Goyal
@ 2024-05-29 10:43 0% ` Anoob Joseph
2024-05-30 9:19 0% ` Akhil Goyal
0 siblings, 1 reply; 200+ results
From: Anoob Joseph @ 2024-05-29 10:43 UTC (permalink / raw)
To: Akhil Goyal, dev
Cc: thomas, david.marchand, hemant.agrawal, pablo.de.lara.guarch,
fiona.trahe, declan.doherty, matan, g.singh, fanzhang.oss,
jianjay.zhou, asomalap, ruifeng.wang, konstantin.v.ananyev,
radu.nicolau, ajit.khaparde, Nagadheeraj Rottela, ciara.power,
Akhil Goyal
>
> Added a new fast path API to get the used depth of a crypto device queue
> pair at any given point.
>
> An implementation in cnxk crypto driver is also added along with a test case in
> test app.
>
> The addition of the new API causes an ABI warning.
> This is suppressed as the updated struct rte_crypto_fp_ops is an internal
> structure and not meant to be used by applications directly.
>
Series Acked-by: Anoob Joseph <anoobj@marvell.com>
^ permalink raw reply [relevance 0%]
* [PATCH v15 06/11] net/tap: rewrite the RSS BPF program
2024-05-21 20:12 2% ` [PATCH v15 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21 20:12 2% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 20:12 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in the skb by using the skb mark,
rather than requiring a second pass through the classifier step.
Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 267 +++++++++++++++++++++++++
9 files changed, 397 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action, the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+ as a target architecture and bpftool to convert object to headers.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+ from tap_rss.bpf.o. This header contains wrapper functions for
+ managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+ the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return in 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is well distributed (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ const __u32 *key;
+ __be16 proto;
+ __u32 mark;
+ __u32 hash;
+ __u16 queue;
+
+ __builtin_preserve_access_index(({
+ mark = skb->mark;
+ proto = skb->protocol;
+ }));
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ key = (const __u32 *)rsskey->key;
+
+ if (proto == bpf_htons(ETH_P_IP))
+ hash = parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (proto == bpf_htons(ETH_P_IPV6))
+ hash = parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ hash = 0;
+
+ if (hash == 0)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+ __builtin_preserve_access_index(({
+ skb->queue_mapping = queue;
+ }));
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v15 02/11] net/tap: do not duplicate fd's
@ 2024-05-21 20:12 2% ` Stephen Hemminger
2024-05-21 20:12 2% ` [PATCH v15 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 20:12 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
The TAP device can use the same file descriptor for both rx and tx queues,
which reduces the number of fd's required.
MP process support passes file descriptors from the primary
to the secondary process; but because of the limit on the number
of fd's passed, RTE_MP_MAX_FD_NUM (8), the TAP device was restricted
to only 4 queues when used with a secondary process.
This change allows up to 8 queues (versus 4).
The limit on max fd's should be raised in EAL in the
future, but that would break ABI compatibility.
The maximum Linux itself supports is SCM_MAX_FD (253).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
doc/guides/rel_notes/release_24_07.rst | 4 +
drivers/net/tap/rte_eth_tap.c | 192 ++++++++++---------------
drivers/net/tap/rte_eth_tap.h | 3 +-
drivers/net/tap/tap_flow.c | 3 +-
drivers/net/tap/tap_intr.c | 7 +-
5 files changed, 89 insertions(+), 120 deletions(-)
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..a6295359b1 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Update Tap PMD driver.**
+
+ * Updated to support up to 8 queues when used by secondary process.
+
Removed Items
-------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
- int rxq_count;
- int txq_count;
+ int q_count;
/*
* The file descriptors are in the dedicated part
* of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t data_off = rte_pktmbuf_headroom(mbuf);
int len;
- len = readv(process_private->rxq_fds[rxq->queue_id],
+ len = readv(process_private->fds[rxq->queue_id],
*rxq->iovecs,
1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
}
/* copy the tx frame data */
- n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+ n = writev(process_private->fds[txq->queue_id], iovecs, k);
if (n <= 0)
return -1;
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
struct rte_mp_msg msg;
struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
int err;
- int fd_iterator = 0;
struct pmd_process_private *process_private = dev->process_private;
int i;
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
- for (i = 0; i < dev->data->nb_tx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->txq_fds[i];
- msg.num_fds++;
- request_param->txq_count++;
- }
- for (i = 0; i < dev->data->nb_rx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->rxq_fds[i];
- msg.num_fds++;
- request_param->rxq_count++;
- }
+
+ /* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+ for (i = 0; i < dev->data->nb_rx_queues; i++)
+ msg.fds[i] = process_private->fds[i];
+
+ request_param->q_count = dev->data->nb_rx_queues;
+ msg.num_fds = dev->data->nb_rx_queues;
err = rte_mp_sendmsg(&msg);
if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
struct rte_eth_dev *dev;
const struct ipc_queues *request_param =
(const struct ipc_queues *)request->param;
- int fd_iterator;
- int queue;
struct pmd_process_private *process_private;
dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
request_param->port_name);
return -1;
}
+
process_private = dev->process_private;
- fd_iterator = 0;
- TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
- request_param->txq_count);
- for (queue = 0; queue < request_param->txq_count; queue++)
- process_private->txq_fds[queue] = request->fds[fd_iterator++];
- for (queue = 0; queue < request_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+ TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+ for (int q = 0; q < request_param->q_count; q++)
+ process_private->fds[q] = request->fds[q];
+
return 0;
}
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
return 0;
}
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+ if (process_private->fds[qid] != -1) {
+ close(process_private->fds[qid]);
+ process_private->fds[qid] = -1;
+ }
+}
+
static int
tap_dev_close(struct rte_eth_dev *dev)
{
int i;
struct pmd_internals *internals = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
- struct rx_queue *rxq;
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
}
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- if (process_private->rxq_fds[i] != -1) {
- rxq = &internals->rxq[i];
- close(process_private->rxq_fds[i]);
- process_private->rxq_fds[i] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
- if (process_private->txq_fds[i] != -1) {
- close(process_private->txq_fds[i]);
- process_private->txq_fds[i] = -1;
- }
+ struct rx_queue *rxq = &internals->rxq[i];
+
+ tap_queue_close(process_private, i);
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
}
if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!rxq)
return;
+
process_private = rte_eth_devices[rxq->in_port].process_private;
- if (process_private->rxq_fds[rxq->queue_id] != -1) {
- close(process_private->rxq_fds[rxq->queue_id]);
- process_private->rxq_fds[rxq->queue_id] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
+
+ if (dev->data->tx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!txq)
return;
- process_private = rte_eth_devices[txq->out_port].process_private;
- if (process_private->txq_fds[txq->queue_id] != -1) {
- close(process_private->txq_fds[txq->queue_id]);
- process_private->txq_fds[txq->queue_id] = -1;
- }
+ process_private = rte_eth_devices[txq->out_port].process_private;
+ if (dev->data->rx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
uint16_t qid,
int is_rx)
{
- int ret;
- int *fd;
- int *other_fd;
- const char *dir;
+ int fd, ret;
struct pmd_internals *pmd = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
struct rx_queue *rx = &internals->rxq[qid];
struct tx_queue *tx = &internals->txq[qid];
- struct rte_gso_ctx *gso_ctx;
+ struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+ const char *dir = is_rx ? "rx" : "tx";
- if (is_rx) {
- fd = &process_private->rxq_fds[qid];
- other_fd = &process_private->txq_fds[qid];
- dir = "rx";
- gso_ctx = NULL;
- } else {
- fd = &process_private->txq_fds[qid];
- other_fd = &process_private->rxq_fds[qid];
- dir = "tx";
- gso_ctx = &tx->gso_ctx;
- }
- if (*fd != -1) {
+ fd = process_private->fds[qid];
+ if (fd != -1) {
/* fd for this queue already exists */
TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
- pmd->name, *fd, dir, qid);
+ pmd->name, fd, dir, qid);
gso_ctx = NULL;
- } else if (*other_fd != -1) {
- /* Only other_fd exists. dup it */
- *fd = dup(*other_fd);
- if (*fd < 0) {
- *fd = -1;
- TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
- return -1;
- }
- TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
- pmd->name, *other_fd, dir, qid, *fd);
} else {
- /* Both RX and TX fds do not exist (equal -1). Create fd */
- *fd = tun_alloc(pmd, 0, 0);
- if (*fd < 0) {
- *fd = -1; /* restore original value */
+ fd = tun_alloc(pmd, 0, 0);
+ if (fd < 0) {
TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
return -1;
}
+
TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
- pmd->name, dir, qid, *fd);
+ pmd->name, dir, qid, fd);
+
+ process_private->fds[qid] = fd;
}
tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
tx->type = pmd->type;
- return *fd;
+ return fd;
}
static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG, " RX TUNTAP device name %s, qid %d on fd %d",
internals->name, rx_queue_id,
- process_private->rxq_fds[rx_queue_id]);
+ process_private->fds[rx_queue_id]);
return 0;
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG,
" TX TUNTAP device name %s, qid %d on fd %d csum %s",
internals->name, tx_queue_id,
- process_private->txq_fds[tx_queue_id],
+ process_private->fds[tx_queue_id],
txq->csum ? "on" : "off");
return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
dev->intr_handle = pmd->intr_handle;
/* Presetup the fds to -1 as being not valid */
- for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- process_private->rxq_fds[i] = -1;
- process_private->txq_fds[i] = -1;
- }
+ for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+ process_private->fds[i] = -1;
+
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
struct ipc_queues *request_param = (struct ipc_queues *)request.param;
struct ipc_queues *reply_param;
struct pmd_process_private *process_private = dev->process_private;
- int queue, fd_iterator;
/* Prepare the request */
memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
/* Attach the queues from received file descriptors */
- if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+ if (reply_param->q_count != reply->num_fds) {
TAP_LOG(ERR, "Unexpected number of fds received");
return -1;
}
- dev->data->nb_rx_queues = reply_param->rxq_count;
- dev->data->nb_tx_queues = reply_param->txq_count;
- fd_iterator = 0;
- for (queue = 0; queue < reply_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
- for (queue = 0; queue < reply_param->txq_count; queue++)
- process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+ dev->data->nb_rx_queues = reply_param->q_count;
+ dev->data->nb_tx_queues = reply_param->q_count;
+
+ for (int q = 0; q < reply_param->q_count; q++)
+ process_private->fds[q] = reply->fds[q];
+
free(reply);
return 0;
}
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
/* Fill file descriptors for all queues */
reply.num_fds = 0;
- reply_param->rxq_count = 0;
- if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
- RTE_MP_MAX_FD_NUM){
- TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+ reply_param->q_count = 0;
+
+ RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+ if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+ TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+ dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
return -1;
}
for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
- reply_param->rxq_count++;
- }
- RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
- reply_param->txq_count = 0;
- for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
- reply_param->txq_count++;
+ reply.fds[reply.num_fds++] = process_private->fds[queue];
+ reply_param->q_count++;
}
- RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
/* Send reply */
strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
};
struct pmd_process_private {
- int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
- int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+ int fds[RTE_PMD_TAP_MAX_QUEUES];
};
/* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
* If netdevice is there, setup appropriate flow rules immediately.
* Otherwise it will be set when bringing up the netdevice (tun_alloc).
*/
- if (process_private->rxq_fds[0] == -1)
+ if (process_private->fds[0] == -1)
return 0;
+
if (set) {
struct rte_flow *remote_flow;
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
}
for (i = 0; i < n; i++) {
struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+ int fd = process_private->fds[i];
/* Skip queues that cannot request interrupts. */
- if (!rxq || process_private->rxq_fds[i] == -1) {
+ if (!rxq || fd == -1) {
+ /* Use invalid intr_vec[] index to disable entry. */
/* Use invalid intr_vec[] index to disable entry. */
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + count))
return -rte_errno;
- if (rte_intr_efds_index_set(intr_handle, count,
- process_private->rxq_fds[i]))
+ if (rte_intr_efds_index_set(intr_handle, count, fd))
return -rte_errno;
count++;
}
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v14 06/11] net/tap: rewrite the RSS BPF program
2024-05-21 17:06 2% ` [PATCH v14 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21 17:06 2% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 17:06 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in skb by using skb mark
than requiring a second pass through classifier step.
Note: This version only works with later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instruction are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 267 +++++++++++++++++++++++++
9 files changed, 397 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline, which then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for an RSS queue
+action the second stage is the kernel skbedit action, which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program, which then maps the flow id and packet headers
+into a queue id.
+
+This version is built using the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+ as a target architecture and bpftool to convert object to headers.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+ from tap_rss.bpf.o. This header contains wrapper functions for
+ managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+ the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using include directoriy from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return in 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * return in 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ const __u32 *key;
+ __be16 proto;
+ __u32 mark;
+ __u32 hash;
+ __u16 queue;
+
+ __builtin_preserve_access_index(({
+ mark = skb->mark;
+ proto = skb->protocol;
+ }));
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ key = (const __u32 *)rsskey->key;
+
+ if (proto == bpf_htons(ETH_P_IP))
+ hash = parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (proto == bpf_htons(ETH_P_IPV6))
+ hash = parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ hash = 0;
+
+ if (hash == 0)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+ __builtin_preserve_access_index(({
+ skb->queue_mapping = queue;
+ }));
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v14 02/11] net/tap: do not duplicate fd's
@ 2024-05-21 17:06 2% ` Stephen Hemminger
2024-05-21 17:06 2% ` [PATCH v14 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 17:06 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
The TAP device can use the same file descriptor for both rx and tx queues
which reduces the number of fd's required.
MP process support passes file descriptors from the primary
to the secondary process; but because of the cap on the number
of fd's that can be passed, RTE_MP_MAX_FD_NUM (8), the TAP device
was restricted to only 4 queues when used with a secondary process.
This allows up to 8 queues (versus 4).
The restriction on max fd's should be raised in EAL in the
future, but that would break ABI compatibility.
The maximum Linux supports is SCM_MAX_FD (253).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
doc/guides/rel_notes/release_24_07.rst | 4 +
drivers/net/tap/rte_eth_tap.c | 192 ++++++++++---------------
drivers/net/tap/rte_eth_tap.h | 3 +-
drivers/net/tap/tap_flow.c | 3 +-
drivers/net/tap/tap_intr.c | 7 +-
5 files changed, 89 insertions(+), 120 deletions(-)
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..fa9692924b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Update Tap PMD driver.**
+
+ * Updated to support up to 8 queues when used by secondary process.
+
Removed Items
-------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
- int rxq_count;
- int txq_count;
+ int q_count;
/*
* The file descriptors are in the dedicated part
* of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t data_off = rte_pktmbuf_headroom(mbuf);
int len;
- len = readv(process_private->rxq_fds[rxq->queue_id],
+ len = readv(process_private->fds[rxq->queue_id],
*rxq->iovecs,
1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
}
/* copy the tx frame data */
- n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+ n = writev(process_private->fds[txq->queue_id], iovecs, k);
if (n <= 0)
return -1;
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
struct rte_mp_msg msg;
struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
int err;
- int fd_iterator = 0;
struct pmd_process_private *process_private = dev->process_private;
int i;
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
- for (i = 0; i < dev->data->nb_tx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->txq_fds[i];
- msg.num_fds++;
- request_param->txq_count++;
- }
- for (i = 0; i < dev->data->nb_rx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->rxq_fds[i];
- msg.num_fds++;
- request_param->rxq_count++;
- }
+
+ /* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+ for (i = 0; i < dev->data->nb_rx_queues; i++)
+ msg.fds[i] = process_private->fds[i];
+
+ request_param->q_count = dev->data->nb_rx_queues;
+ msg.num_fds = dev->data->nb_rx_queues;
err = rte_mp_sendmsg(&msg);
if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
struct rte_eth_dev *dev;
const struct ipc_queues *request_param =
(const struct ipc_queues *)request->param;
- int fd_iterator;
- int queue;
struct pmd_process_private *process_private;
dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
request_param->port_name);
return -1;
}
+
process_private = dev->process_private;
- fd_iterator = 0;
- TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
- request_param->txq_count);
- for (queue = 0; queue < request_param->txq_count; queue++)
- process_private->txq_fds[queue] = request->fds[fd_iterator++];
- for (queue = 0; queue < request_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+ TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+ for (int q = 0; q < request_param->q_count; q++)
+ process_private->fds[q] = request->fds[q];
+
return 0;
}
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
return 0;
}
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+ if (process_private->fds[qid] != -1) {
+ close(process_private->fds[qid]);
+ process_private->fds[qid] = -1;
+ }
+}
+
static int
tap_dev_close(struct rte_eth_dev *dev)
{
int i;
struct pmd_internals *internals = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
- struct rx_queue *rxq;
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
}
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- if (process_private->rxq_fds[i] != -1) {
- rxq = &internals->rxq[i];
- close(process_private->rxq_fds[i]);
- process_private->rxq_fds[i] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
- if (process_private->txq_fds[i] != -1) {
- close(process_private->txq_fds[i]);
- process_private->txq_fds[i] = -1;
- }
+ struct rx_queue *rxq = &internals->rxq[i];
+
+ tap_queue_close(process_private, i);
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
}
if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!rxq)
return;
+
process_private = rte_eth_devices[rxq->in_port].process_private;
- if (process_private->rxq_fds[rxq->queue_id] != -1) {
- close(process_private->rxq_fds[rxq->queue_id]);
- process_private->rxq_fds[rxq->queue_id] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
+
+ if (dev->data->tx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!txq)
return;
- process_private = rte_eth_devices[txq->out_port].process_private;
- if (process_private->txq_fds[txq->queue_id] != -1) {
- close(process_private->txq_fds[txq->queue_id]);
- process_private->txq_fds[txq->queue_id] = -1;
- }
+ process_private = rte_eth_devices[txq->out_port].process_private;
+ if (dev->data->rx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
uint16_t qid,
int is_rx)
{
- int ret;
- int *fd;
- int *other_fd;
- const char *dir;
+ int fd, ret;
struct pmd_internals *pmd = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
struct rx_queue *rx = &internals->rxq[qid];
struct tx_queue *tx = &internals->txq[qid];
- struct rte_gso_ctx *gso_ctx;
+ struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+ const char *dir = is_rx ? "rx" : "tx";
- if (is_rx) {
- fd = &process_private->rxq_fds[qid];
- other_fd = &process_private->txq_fds[qid];
- dir = "rx";
- gso_ctx = NULL;
- } else {
- fd = &process_private->txq_fds[qid];
- other_fd = &process_private->rxq_fds[qid];
- dir = "tx";
- gso_ctx = &tx->gso_ctx;
- }
- if (*fd != -1) {
+ fd = process_private->fds[qid];
+ if (fd != -1) {
/* fd for this queue already exists */
TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
- pmd->name, *fd, dir, qid);
+ pmd->name, fd, dir, qid);
gso_ctx = NULL;
- } else if (*other_fd != -1) {
- /* Only other_fd exists. dup it */
- *fd = dup(*other_fd);
- if (*fd < 0) {
- *fd = -1;
- TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
- return -1;
- }
- TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
- pmd->name, *other_fd, dir, qid, *fd);
} else {
- /* Both RX and TX fds do not exist (equal -1). Create fd */
- *fd = tun_alloc(pmd, 0, 0);
- if (*fd < 0) {
- *fd = -1; /* restore original value */
+ fd = tun_alloc(pmd, 0, 0);
+ if (fd < 0) {
TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
return -1;
}
+
TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
- pmd->name, dir, qid, *fd);
+ pmd->name, dir, qid, fd);
+
+ process_private->fds[qid] = fd;
}
tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
tx->type = pmd->type;
- return *fd;
+ return fd;
}
static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG, " RX TUNTAP device name %s, qid %d on fd %d",
internals->name, rx_queue_id,
- process_private->rxq_fds[rx_queue_id]);
+ process_private->fds[rx_queue_id]);
return 0;
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG,
" TX TUNTAP device name %s, qid %d on fd %d csum %s",
internals->name, tx_queue_id,
- process_private->txq_fds[tx_queue_id],
+ process_private->fds[tx_queue_id],
txq->csum ? "on" : "off");
return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
dev->intr_handle = pmd->intr_handle;
/* Presetup the fds to -1 as being not valid */
- for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- process_private->rxq_fds[i] = -1;
- process_private->txq_fds[i] = -1;
- }
+ for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+ process_private->fds[i] = -1;
+
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
struct ipc_queues *request_param = (struct ipc_queues *)request.param;
struct ipc_queues *reply_param;
struct pmd_process_private *process_private = dev->process_private;
- int queue, fd_iterator;
/* Prepare the request */
memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
/* Attach the queues from received file descriptors */
- if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+ if (reply_param->q_count != reply->num_fds) {
TAP_LOG(ERR, "Unexpected number of fds received");
return -1;
}
- dev->data->nb_rx_queues = reply_param->rxq_count;
- dev->data->nb_tx_queues = reply_param->txq_count;
- fd_iterator = 0;
- for (queue = 0; queue < reply_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
- for (queue = 0; queue < reply_param->txq_count; queue++)
- process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+ dev->data->nb_rx_queues = reply_param->q_count;
+ dev->data->nb_tx_queues = reply_param->q_count;
+
+ for (int q = 0; q < reply_param->q_count; q++)
+ process_private->fds[q] = reply->fds[q];
+
free(reply);
return 0;
}
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
/* Fill file descriptors for all queues */
reply.num_fds = 0;
- reply_param->rxq_count = 0;
- if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
- RTE_MP_MAX_FD_NUM){
- TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+ reply_param->q_count = 0;
+
+ RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+ if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+ TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+ dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
return -1;
}
for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
- reply_param->rxq_count++;
- }
- RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
- reply_param->txq_count = 0;
- for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
- reply_param->txq_count++;
+ reply.fds[reply.num_fds++] = process_private->fds[queue];
+ reply_param->q_count++;
}
- RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
/* Send reply */
strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
};
struct pmd_process_private {
- int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
- int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+ int fds[RTE_PMD_TAP_MAX_QUEUES];
};
/* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
* If netdevice is there, setup appropriate flow rules immediately.
* Otherwise it will be set when bringing up the netdevice (tun_alloc).
*/
- if (process_private->rxq_fds[0] == -1)
+ if (process_private->fds[0] == -1)
return 0;
+
if (set) {
struct rte_flow *remote_flow;
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
}
for (i = 0; i < n; i++) {
struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+ int fd = process_private->fds[i];
/* Skip queues that cannot request interrupts. */
- if (!rxq || process_private->rxq_fds[i] == -1) {
+ if (!rxq || fd == -1) {
+ /* Use invalid intr_vec[] index to disable entry. */
/* Use invalid intr_vec[] index to disable entry. */
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + count))
return -rte_errno;
- if (rte_intr_efds_index_set(intr_handle, count,
- process_private->rxq_fds[i]))
+ if (rte_intr_efds_index_set(intr_handle, count, fd))
return -rte_errno;
count++;
}
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v13 06/11] net/tap: rewrite the RSS BPF program
2024-05-21 2:47 2% ` [PATCH v13 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21 2:47 2% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 2:47 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in the skb by using the skb mark,
rather than requiring a second pass through the classifier step.
Note: This version only works with the later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
 drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c | 267 +++++++++++++++++++++++++
9 files changed, 397 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+ as a target architecture and bpftool to convert object to headers.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+ from tap_rss.bpf.o. This header contains wrapper functions for
+ managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+ the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return in 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * return in 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Scale value into the range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ const __u32 *key;
+ __be16 proto;
+ __u32 mark;
+ __u32 hash;
+ __u16 queue;
+
+ __builtin_preserve_access_index(({
+ mark = skb->mark;
+ proto = skb->protocol;
+ }));
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ key = (const __u32 *)rsskey->key;
+
+ if (proto == bpf_htons(ETH_P_IP))
+ hash = parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (proto == bpf_htons(ETH_P_IPV6))
+ hash = parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ hash = 0;
+
+ if (hash == 0)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+ __builtin_preserve_access_index(({
+ skb->queue_mapping = queue;
+ }));
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v13 02/11] net/tap: do not duplicate fd's
@ 2024-05-21 2:47 2% ` Stephen Hemminger
2024-05-21 2:47 2% ` [PATCH v13 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 2:47 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
The TAP device can use same file descriptor for both rx and tx queues
which reduces the number of fd's required.
MP process support passes file descriptors from primary
to secondary process; but because of the restriction on
max fd's passed RTE_MP_MAX_FD_NUM (8) the TAP device was restricted
to only 4 queues if using secondary.
This allows up to 8 queues (versus 4).
The restriction on max fd's should be changed in EAL in the
future, but it will break ABI compatibility.
The max Linux supports is SCM_MAX_FD (253).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
doc/guides/rel_notes/release_24_07.rst | 4 +
drivers/net/tap/rte_eth_tap.c | 192 ++++++++++---------------
drivers/net/tap/rte_eth_tap.h | 3 +-
drivers/net/tap/tap_flow.c | 3 +-
drivers/net/tap/tap_intr.c | 7 +-
5 files changed, 89 insertions(+), 120 deletions(-)
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..fa9692924b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Update Tap PMD driver.**
+
+ * Updated to support up to 8 queues when used by secondary process.
+
Removed Items
-------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
- int rxq_count;
- int txq_count;
+ int q_count;
/*
* The file descriptors are in the dedicated part
* of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t data_off = rte_pktmbuf_headroom(mbuf);
int len;
- len = readv(process_private->rxq_fds[rxq->queue_id],
+ len = readv(process_private->fds[rxq->queue_id],
*rxq->iovecs,
1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
}
/* copy the tx frame data */
- n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+ n = writev(process_private->fds[txq->queue_id], iovecs, k);
if (n <= 0)
return -1;
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
struct rte_mp_msg msg;
struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
int err;
- int fd_iterator = 0;
struct pmd_process_private *process_private = dev->process_private;
int i;
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
- for (i = 0; i < dev->data->nb_tx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->txq_fds[i];
- msg.num_fds++;
- request_param->txq_count++;
- }
- for (i = 0; i < dev->data->nb_rx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->rxq_fds[i];
- msg.num_fds++;
- request_param->rxq_count++;
- }
+
+ /* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+ for (i = 0; i < dev->data->nb_rx_queues; i++)
+ msg.fds[i] = process_private->fds[i];
+
+ request_param->q_count = dev->data->nb_rx_queues;
+ msg.num_fds = dev->data->nb_rx_queues;
err = rte_mp_sendmsg(&msg);
if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
struct rte_eth_dev *dev;
const struct ipc_queues *request_param =
(const struct ipc_queues *)request->param;
- int fd_iterator;
- int queue;
struct pmd_process_private *process_private;
dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
request_param->port_name);
return -1;
}
+
process_private = dev->process_private;
- fd_iterator = 0;
- TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
- request_param->txq_count);
- for (queue = 0; queue < request_param->txq_count; queue++)
- process_private->txq_fds[queue] = request->fds[fd_iterator++];
- for (queue = 0; queue < request_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+ TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+ for (int q = 0; q < request_param->q_count; q++)
+ process_private->fds[q] = request->fds[q];
+
return 0;
}
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
return 0;
}
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+ if (process_private->fds[qid] != -1) {
+ close(process_private->fds[qid]);
+ process_private->fds[qid] = -1;
+ }
+}
+
static int
tap_dev_close(struct rte_eth_dev *dev)
{
int i;
struct pmd_internals *internals = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
- struct rx_queue *rxq;
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
}
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- if (process_private->rxq_fds[i] != -1) {
- rxq = &internals->rxq[i];
- close(process_private->rxq_fds[i]);
- process_private->rxq_fds[i] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
- if (process_private->txq_fds[i] != -1) {
- close(process_private->txq_fds[i]);
- process_private->txq_fds[i] = -1;
- }
+ struct rx_queue *rxq = &internals->rxq[i];
+
+ tap_queue_close(process_private, i);
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
}
if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!rxq)
return;
+
process_private = rte_eth_devices[rxq->in_port].process_private;
- if (process_private->rxq_fds[rxq->queue_id] != -1) {
- close(process_private->rxq_fds[rxq->queue_id]);
- process_private->rxq_fds[rxq->queue_id] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
+
+ if (dev->data->tx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!txq)
return;
- process_private = rte_eth_devices[txq->out_port].process_private;
- if (process_private->txq_fds[txq->queue_id] != -1) {
- close(process_private->txq_fds[txq->queue_id]);
- process_private->txq_fds[txq->queue_id] = -1;
- }
+ process_private = rte_eth_devices[txq->out_port].process_private;
+ if (dev->data->rx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
uint16_t qid,
int is_rx)
{
- int ret;
- int *fd;
- int *other_fd;
- const char *dir;
+ int fd, ret;
struct pmd_internals *pmd = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
struct rx_queue *rx = &internals->rxq[qid];
struct tx_queue *tx = &internals->txq[qid];
- struct rte_gso_ctx *gso_ctx;
+ struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+ const char *dir = is_rx ? "rx" : "tx";
- if (is_rx) {
- fd = &process_private->rxq_fds[qid];
- other_fd = &process_private->txq_fds[qid];
- dir = "rx";
- gso_ctx = NULL;
- } else {
- fd = &process_private->txq_fds[qid];
- other_fd = &process_private->rxq_fds[qid];
- dir = "tx";
- gso_ctx = &tx->gso_ctx;
- }
- if (*fd != -1) {
+ fd = process_private->fds[qid];
+ if (fd != -1) {
/* fd for this queue already exists */
TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
- pmd->name, *fd, dir, qid);
+ pmd->name, fd, dir, qid);
gso_ctx = NULL;
- } else if (*other_fd != -1) {
- /* Only other_fd exists. dup it */
- *fd = dup(*other_fd);
- if (*fd < 0) {
- *fd = -1;
- TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
- return -1;
- }
- TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
- pmd->name, *other_fd, dir, qid, *fd);
} else {
- /* Both RX and TX fds do not exist (equal -1). Create fd */
- *fd = tun_alloc(pmd, 0, 0);
- if (*fd < 0) {
- *fd = -1; /* restore original value */
+ fd = tun_alloc(pmd, 0, 0);
+ if (fd < 0) {
TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
return -1;
}
+
TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
- pmd->name, dir, qid, *fd);
+ pmd->name, dir, qid, fd);
+
+ process_private->fds[qid] = fd;
}
tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
tx->type = pmd->type;
- return *fd;
+ return fd;
}
static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG, " RX TUNTAP device name %s, qid %d on fd %d",
internals->name, rx_queue_id,
- process_private->rxq_fds[rx_queue_id]);
+ process_private->fds[rx_queue_id]);
return 0;
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG,
" TX TUNTAP device name %s, qid %d on fd %d csum %s",
internals->name, tx_queue_id,
- process_private->txq_fds[tx_queue_id],
+ process_private->fds[tx_queue_id],
txq->csum ? "on" : "off");
return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
dev->intr_handle = pmd->intr_handle;
/* Presetup the fds to -1 as being not valid */
- for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- process_private->rxq_fds[i] = -1;
- process_private->txq_fds[i] = -1;
- }
+ for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+ process_private->fds[i] = -1;
+
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
struct ipc_queues *request_param = (struct ipc_queues *)request.param;
struct ipc_queues *reply_param;
struct pmd_process_private *process_private = dev->process_private;
- int queue, fd_iterator;
/* Prepare the request */
memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
/* Attach the queues from received file descriptors */
- if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+ if (reply_param->q_count != reply->num_fds) {
TAP_LOG(ERR, "Unexpected number of fds received");
return -1;
}
- dev->data->nb_rx_queues = reply_param->rxq_count;
- dev->data->nb_tx_queues = reply_param->txq_count;
- fd_iterator = 0;
- for (queue = 0; queue < reply_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
- for (queue = 0; queue < reply_param->txq_count; queue++)
- process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+ dev->data->nb_rx_queues = reply_param->q_count;
+ dev->data->nb_tx_queues = reply_param->q_count;
+
+ for (int q = 0; q < reply_param->q_count; q++)
+ process_private->fds[q] = reply->fds[q];
+
free(reply);
return 0;
}
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
/* Fill file descriptors for all queues */
reply.num_fds = 0;
- reply_param->rxq_count = 0;
- if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
- RTE_MP_MAX_FD_NUM){
- TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+ reply_param->q_count = 0;
+
+ RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+ if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+ TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+ dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
return -1;
}
for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
- reply_param->rxq_count++;
- }
- RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
- reply_param->txq_count = 0;
- for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
- reply_param->txq_count++;
+ reply.fds[reply.num_fds++] = process_private->fds[queue];
+ reply_param->q_count++;
}
- RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
/* Send reply */
strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
};
struct pmd_process_private {
- int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
- int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+ int fds[RTE_PMD_TAP_MAX_QUEUES];
};
/* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
* If netdevice is there, setup appropriate flow rules immediately.
* Otherwise it will be set when bringing up the netdevice (tun_alloc).
*/
- if (process_private->rxq_fds[0] == -1)
+ if (process_private->fds[0] == -1)
return 0;
+
if (set) {
struct rte_flow *remote_flow;
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
}
for (i = 0; i < n; i++) {
struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+ int fd = process_private->fds[i];
/* Skip queues that cannot request interrupts. */
- if (!rxq || process_private->rxq_fds[i] == -1) {
+ if (!rxq || fd == -1) {
/* Use invalid intr_vec[] index to disable entry. */
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + count))
return -rte_errno;
- if (rte_intr_efds_index_set(intr_handle, count,
- process_private->rxq_fds[i]))
+ if (rte_intr_efds_index_set(intr_handle, count, fd))
return -rte_errno;
count++;
}
--
2.43.0
^ permalink raw reply [relevance 2%]
* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
2024-05-20 22:08 0% ` Ferruh Yigit
2024-05-20 22:25 0% ` Luca Boccassi
@ 2024-05-20 23:20 0% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-20 23:20 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Luca Boccassi, Christian Ehrhardt, Patrick Robb, dpdklab,
Aaron Conole, dev
On Mon, 20 May 2024 23:08:04 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> > It can be done, but it is a _lot_ of work and requires a lot of shims,
> > so for something optional it's not really worth it. Given libbpf 1.0
> > also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> > as it's incompatible, so this pmd will simply be skipped there. I
> > think it's fine. 24.04 has a new one.
> >
>
> Does Ubuntu 24.04 have libbpf >= 1.0 ?
Yes it does.
Tried this on a 24.04 VM, needed to install pkg-config and clang.
But then it builds.
It does have some other fortify warnings (in rte_pcapng.c) but
these are unrelated and exist on main branch as well.
^ permalink raw reply [relevance 0%]
* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
2024-05-20 22:08 0% ` Ferruh Yigit
@ 2024-05-20 22:25 0% ` Luca Boccassi
2024-05-20 23:20 0% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2024-05-20 22:25 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Stephen Hemminger, Christian Ehrhardt, Patrick Robb, dpdklab,
Aaron Conole, dev
On Mon, 20 May 2024 at 23:08, Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 5/20/2024 10:42 PM, Luca Boccassi wrote:
> > On Mon, 20 May 2024 at 19:43, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> >>
> >> On Mon, 20 May 2024 18:49:19 +0100
> >> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >>> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> >>>> There were multiple issues in the RSS queue support in the TAP
> >>>> driver. This required extensive rework of the BPF support.
> >>>>
> >>>> Change the BPF loading to use bpftool to
> >>>> create a skeleton header file, and load with libbpf.
> >>>> The BPF is always compiled from source so less chance that
> >>>> source and instructions diverge. Also resolves issue where
> >>>> libbpf and source get out of sync. The program
> >>>> is only loaded once, so if multiple rules are created
> >>>> only one BPF program is loaded in kernel.
> >>>>
> >>>> The new BPF program only needs a single action.
> >>>> No need for action and re-classification step.
> >>>>
> >>>> It also fixes the missing bits from the original.
> >>>> - supports setting RSS key per flow
> >>>> - level of hash can be L3 or L3/L4.
> >>>>
> >>>> Bugzilla ID: 1329
> >>>>
> >>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>
> >>>
> >>>
> >>> The libbpf version in my Ubuntu box, installed with package manager, is
> >>> 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
> >>> is not compiled for me.
> >>>
> >>>
> >>> @Christian, 'libbpf.so.0.5.0'seems old, it is from 2021, do you know is
> >>> there a reason Ubuntu stick to this version? And can we expect an update
> >>> soon?
> >>>
> >>>
> >>> @Patric, I assume test environment also doesn't have 'libbpf', version:
> >>> '>= 1.0' which we need to test this feature.
> >>> Is it possible to update test environment to justify this dependency?
> >>>
> >>> I think we need to verify at least build (with and without dependency
> >>> met) for the set.
> >>
> >> The BPF API changed a lot, and it is not really possible to support
> >> both.
> >
> > It can be done, but it is a _lot_ of work and requires a lot of shims,
> > so for something optional it's not really worth it. Given libbpf 1.0
> > also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> > as it's incompatible, so this pmd will simply be skipped there. I
> > think it's fine. 24.04 has a new one.
> >
>
> Does Ubuntu 24.04 have libbpf >= 1.0 ?
Yes:
https://packages.ubuntu.com/search?keywords=libbpf-dev&searchon=names&suite=all&section=all
^ permalink raw reply [relevance 0%]
* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
2024-05-20 21:42 3% ` Luca Boccassi
@ 2024-05-20 22:08 0% ` Ferruh Yigit
2024-05-20 22:25 0% ` Luca Boccassi
2024-05-20 23:20 0% ` Stephen Hemminger
0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2024-05-20 22:08 UTC (permalink / raw)
To: Luca Boccassi, Stephen Hemminger
Cc: Christian Ehrhardt, Patrick Robb, dpdklab, Aaron Conole, dev
On 5/20/2024 10:42 PM, Luca Boccassi wrote:
> On Mon, 20 May 2024 at 19:43, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>>
>> On Mon, 20 May 2024 18:49:19 +0100
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
>>>> There were multiple issues in the RSS queue support in the TAP
>>>> driver. This required extensive rework of the BPF support.
>>>>
>>>> Change the BPF loading to use bpftool to
>>>> create a skeleton header file, and load with libbpf.
>>>> The BPF is always compiled from source so less chance that
>>>> source and instructions diverge. Also resolves issue where
>>>> libbpf and source get out of sync. The program
>>>> is only loaded once, so if multiple rules are created
>>>> only one BPF program is loaded in kernel.
>>>>
>>>> The new BPF program only needs a single action.
>>>> No need for action and re-classification step.
>>>>
>>>> It also fixes the missing bits from the original.
>>>> - supports setting RSS key per flow
>>>> - level of hash can be L3 or L3/L4.
>>>>
>>>> Bugzilla ID: 1329
>>>>
>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>
>>>
>>>
>>> The libbpf version in my Ubuntu box, installed with package manager, is
>>> 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
>>> is not compiled for me.
>>>
>>>
>>> @Christian, 'libbpf.so.0.5.0'seems old, it is from 2021, do you know is
>>> there a reason Ubuntu stick to this version? And can we expect an update
>>> soon?
>>>
>>>
>>> @Patric, I assume test environment also doesn't have 'libbpf', version:
>>> '>= 1.0' which we need to test this feature.
>>> Is it possible to update test environment to justify this dependency?
>>>
>>> I think we need to verify at least build (with and without dependency
>>> met) for the set.
>>
>> The BPF API changed a lot, and it is not really possible to support
>> both.
>
> It can be done, but it is a _lot_ of work and requires a lot of shims,
> so for something optional it's not really worth it. Given libbpf 1.0
> also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> as it's incompatible, so this pmd will simply be skipped there. I
> think it's fine. 24.04 has a new one.
>
Does Ubuntu 24.04 have libbpf >= 1.0 ?
^ permalink raw reply [relevance 0%]
* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
@ 2024-05-20 21:42 3% ` Luca Boccassi
2024-05-20 22:08 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2024-05-20 21:42 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Ferruh Yigit, Christian Ehrhardt, Patrick Robb, dpdklab,
Aaron Conole, dev
On Mon, 20 May 2024 at 19:43, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 20 May 2024 18:49:19 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> > On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> > > There were multiple issues in the RSS queue support in the TAP
> > > driver. This required extensive rework of the BPF support.
> > >
> > > Change the BPF loading to use bpftool to
> > > create a skeleton header file, and load with libbpf.
> > > The BPF is always compiled from source so less chance that
> > > source and instructions diverge. Also resolves issue where
> > > libbpf and source get out of sync. The program
> > > is only loaded once, so if multiple rules are created
> > > only one BPF program is loaded in kernel.
> > >
> > > The new BPF program only needs a single action.
> > > No need for action and re-classification step.
> > >
> > > It also fixes the missing bits from the original.
> > > - supports setting RSS key per flow
> > > - level of hash can be L3 or L3/L4.
> > >
> > > Bugzilla ID: 1329
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > >
> >
> >
> > The libbpf version in my Ubuntu box, installed with package manager, is
> > 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
> > is not compiled for me.
> >
> >
> > @Christian, 'libbpf.so.0.5.0'seems old, it is from 2021, do you know is
> > there a reason Ubuntu stick to this version? And can we expect an update
> > soon?
> >
> >
> > @Patric, I assume test environment also doesn't have 'libbpf', version:
> > '>= 1.0' which we need to test this feature.
> > Is it possible to update test environment to justify this dependency?
> >
> > I think we need to verify at least build (with and without dependency
> > met) for the set.
>
> The BPF API changed a lot, and it is not really possible to support
> both.
It can be done, but it is a _lot_ of work and requires a lot of shims,
so for something optional it's not really worth it. Given libbpf 1.0
also broke ABI, Ubuntu 22.04 and older cannot really get a new version
as it's incompatible, so this pmd will simply be skipped there. I
think it's fine. 24.04 has a new one.
^ permalink raw reply [relevance 3%]
* Re: [PATCH v12 02/12] net/tap: do not duplicate fd's
@ 2024-05-20 18:16 3% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-20 18:16 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Mon, 20 May 2024 18:46:30 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> > The TAP device can use same file descriptor for both rx and tx queues.
> > This allows up to 8 queues (versus 4) to be used with secondary process.
> >
>
> It would be nice to briefly update where this limit comes from, as
> removing this limitation can be longer term solution for this issue.
Sure, the limit comes from the too-low value of RTE_MP_MAX_FD_NUM (8).
It should have been set to the max Linux supports, which is SCM_MAX_FD
(253).
But fixing it there breaks the ABI, and needs to wait for 24.11.
Impacts AF_XDP as well.
By not duplicating we can make TAP work with 8 queues and still
get MP support. Good idea anyway not to waste fd's.
>
> > Bugzilla ID: 1381
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
>
> Can you please move the relevant release notes update to this patch?
> So we can distribute the release notes update to patches instead of
> dedicated update for it.
>
> Except from above change,
> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
>
^ permalink raw reply [relevance 3%]
* [PATCH 1/9] doc: reword design section in contributors guidelines
@ 2024-05-13 15:59 6% ` Nandini Persad
1 sibling, 0 replies; 200+ results
From: Nandini Persad @ 2024-05-13 15:59 UTC (permalink / raw)
To: dev; +Cc: Thomas Monjalon
minor editing for grammar and syntax of design section
Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
---
.mailmap | 1 +
doc/guides/contributing/design.rst | 79 ++++++++++++++----------------
doc/guides/linux_gsg/sys_reqs.rst | 2 +-
3 files changed, 38 insertions(+), 44 deletions(-)
diff --git a/.mailmap b/.mailmap
index 66ebc20666..7d4929c5d1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1002,6 +1002,7 @@ Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
Nalla Pradeep <pnalla@marvell.com>
Na Na <nana.nn@alibaba-inc.com>
Nan Chen <whutchennan@gmail.com>
+Nandini Persad <nandinipersad361@gmail.com>
Nannan Lu <nannan.lu@intel.com>
Nan Zhou <zhounan14@huawei.com>
Narcisa Vasile <navasile@linux.microsoft.com> <navasile@microsoft.com> <narcisa.vasile@microsoft.com>
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index b724177ba1..921578aec5 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -8,22 +8,26 @@ Design
Environment or Architecture-specific Sources
--------------------------------------------
-In DPDK and DPDK applications, some code is specific to an architecture (i686, x86_64) or to an executive environment (freebsd or linux) and so on.
-As far as is possible, all such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+In DPDK and DPDK applications, some code is architecture-specific (i686, x86_64) or environment-specific (FreeBsd or Linux, etc.).
+When feasible, such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
-By convention, a file is common if it is not located in a directory indicating that it is specific.
-For instance, a file located in a subdir of "x86_64" directory is specific to this architecture.
+By convention, a file is specific if the directory is indicated. Otherwise, it is common.
+
+For example:
+
+A file located in a subdir of "x86_64" directory is specific to this architecture.
A file located in a subdir of "linux" is specific to this execution environment.
.. note::
Code in DPDK libraries and applications should be generic.
- The correct location for architecture or executive environment specific code is in the EAL.
+ The correct location for architecture or executive environment-specific code is in the EAL.
+
+When necessary, there are several ways to handle specific code:
-When absolutely necessary, there are several ways to handle specific code:
-* Use a ``#ifdef`` with a build definition macro in the C code.
- This can be done when the differences are small and they can be embedded in the same C file:
+* When the differences are small and they can be embedded in the same C file, use a ``#ifdef`` with a build definition macro in the C code.
+
.. code-block:: c
@@ -33,9 +37,9 @@ When absolutely necessary, there are several ways to handle specific code:
titi();
#endif
-* Use build definition macros and conditions in the Meson build file. This is done when the differences are more significant.
- In this case, the code is split into two separate files that are architecture or environment specific.
- This should only apply inside the EAL library.
+* When the differences are more significant, use build definition macros and conditions in the Meson build file.
+In this case, the code is split into two separate files that are architecture or environment specific.
+This should only apply inside the EAL library.
Per Architecture Sources
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +47,7 @@ Per Architecture Sources
The following macro options can be used:
* ``RTE_ARCH`` is a string that contains the name of the architecture.
-* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined only if we are building for those architectures.
+* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined when building for these architectures.
Per Execution Environment Sources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -51,30 +55,21 @@ Per Execution Environment Sources
The following macro options can be used:
* ``RTE_EXEC_ENV`` is a string that contains the name of the executive environment.
-* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only if we are building for this execution environment.
+* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only when building for this execution environment.
Mbuf features
-------------
-The ``rte_mbuf`` structure must be kept small (128 bytes).
-
-In order to add new features without wasting buffer space for unused features,
-some fields and flags can be registered dynamically in a shared area.
-The "dynamic" mbuf area is the default choice for the new features.
-
-The "dynamic" area is eating the remaining space in mbuf,
-and some existing "static" fields may need to become "dynamic".
+A designated area in mbuf stores "dynamically" registered fields and flags. It is the default choice for accommodating new features. The "dynamic" area consumes the remaining space in the mbuf, indicating that it's being efficiently utilized. However, the ``rte_mbuf`` structure must be kept small (128 bytes).
-Adding a new static field or flag must be an exception matching many criteria
-like (non exhaustive): wide usage, performance, size.
+As more features are added, the space for existing "static" fields (fields that are allocated statically) may need to be reconsidered and possibly converted to "dynamic" allocation. Adding a new static field or flag should be an exception. It must meet specific criteria including widespread usage, performance impact, and size considerations. Before adding a new static feature, it must be justified by its necessity and its impact on the system's efficiency.
Runtime Information - Logging, Tracing and Telemetry
----------------------------------------------------
-It is often desirable to provide information to the end-user
-as to what is happening to the application at runtime.
-DPDK provides a number of built-in mechanisms to provide this introspection:
+The end user may inquire as to what is happening to the application at runtime.
+DPDK provides several built-in mechanisms to provide these insights:
* :ref:`Logging <dynamic_logging>`
* :doc:`Tracing <../prog_guide/trace_lib>`
@@ -82,11 +77,11 @@ DPDK provides a number of built-in mechanisms to provide this introspection:
Each of these has its own strengths and suitabilities for use within DPDK components.
-Below are some guidelines for when each should be used:
+Here are guidelines for when each mechanism should be used:
* For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
- Depending on the severity of the issue, the appropriate log level, for example,
- ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+ Depending on the severity of the issue, the appropriate log level,
+ such as ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
.. note::
@@ -96,22 +91,21 @@ Below are some guidelines for when each should be used:
* For component initialization, or other cases where a path through the code
is only likely to be taken once,
- either *logging* at ``DEBUG`` level or *tracing* may be used, or potentially both.
+ either *logging* at ``DEBUG`` level or *tracing* may be used, or both.
In the latter case, tracing can provide basic information as to the code path taken,
with debug-level logging providing additional details on internal state,
- not possible to emit via tracing.
+ which is not possible to emit via tracing.
* For a component's data-path, where a path is to be taken multiple times within a short timeframe,
*tracing* should be used.
Since DPDK tracing uses `Common Trace Format <https://diamon.org/ctf/>`_ for its tracing logs,
post-analysis can be done using a range of external tools.
-* For numerical or statistical data generated by a component, for example, per-packet statistics,
+* For numerical or statistical data generated by a component, such as per-packet statistics,
*telemetry* should be used.
-* For any data where the data may need to be gathered at any point in the execution
- to help assess the state of the application component,
- for example, core configuration, device information, *telemetry* should be used.
+* For any data that may need to be gathered at any point during the execution
+ to help assess the state of the application component
+ (for example, core configuration, device information), *telemetry* should be used.
Telemetry callbacks should not modify any program state, but be "read-only".
Many libraries also include a ``rte_<libname>_dump()`` function as part of their API,
@@ -135,13 +129,12 @@ requirements for preventing ABI changes when implementing statistics.
Mechanism to allow the application to turn library statistics on and off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Having runtime support for enabling/disabling library statistics is recommended,
-as build-time options should be avoided. However, if build-time options are used,
-for example as in the table library, the options can be set using c_args.
-When this flag is set, all the counters supported by current library are
+Having runtime support for enabling/disabling library statistics is recommended,
+as build-time options should be avoided. However, if build-time options are used,
+as in the table library, the options can be set using c_args.
+When this flag is set, all the counters supported by the current library are
collected for all the instances of every object type provided by the library.
When this flag is cleared, none of the counters supported by the current library
-are collected for any instance of any object type provided by the library:
+are collected for any instance of any object type provided by the library.
Prevention of ABI changes due to library statistics support
@@ -165,8 +158,8 @@ Motivation to allow the application to turn library statistics on and off
It is highly recommended that each library provides statistics counters to allow
an application to monitor the library-level run-time events. Typical counters
-are: number of packets received/dropped/transmitted, number of buffers
-allocated/freed, number of occurrences for specific events, etc.
+are: the number of packets received/dropped/transmitted, the number of buffers
+allocated/freed, the number of occurrences for specific events, etc.
However, the resources consumed for library-level statistics counter collection
have to be spent out of the application budget and the counters collected by
@@ -229,5 +222,5 @@ Developers should work with the Linux Kernel community to get the required
functionality upstream. PF functionality should only be added to DPDK for
testing and prototyping purposes while the kernel work is ongoing. It should
also be marked with an "EXPERIMENTAL" tag. If the functionality isn't
-upstreamable then a case can be made to maintain the PF functionality in DPDK
+upstreamable, then a case can be made to maintain the PF functionality in DPDK
without the EXPERIMENTAL tag.
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 13be715933..0569c5cae6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -99,7 +99,7 @@ e.g. :doc:`../nics/index`
Running DPDK Applications
-------------------------
-To run a DPDK application, some customization may be required on the target machine.
+To run a DPDK application, customization may be required on the target machine.
System Software
~~~~~~~~~~~~~~~
--
2.34.1
^ permalink raw reply [relevance 6%]
* Re: [RFC v2] net/af_packet: make stats reset reliable
2024-05-07 16:00 3% ` Morten Brørup
2024-05-07 16:54 0% ` Ferruh Yigit
@ 2024-05-08 7:48 0% ` Mattias Rönnblom
1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-05-08 7:48 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger, Ferruh Yigit
Cc: John W. Linville, Thomas Monjalon, dev, Mattias Rönnblom
On 2024-05-07 18:00, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
>
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
>
> I agree.
>
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
>
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
>
>>
>> The most complete version of SW stats now is in the virtio driver.
>
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
>
> Considering a DPDK application running as a virtual machine (guest) on a host server...
>
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
>
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
>
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
>
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.
Relaxed atomic loads from and stores to naturally aligned addresses are
for free on ARM and x86_64 up to at least 64 bits.
"For free" is not entirely true, since both C11 relaxed stores and
stores through volatile may prevent vectorization in GCC. I don't see
why, but in practice that seems to be the case. That is very much a
corner case.
Also, as mentioned before, C11 atomic store effectively has volatile
semantics, which in turn may prevent some compiler optimizations.
On 32-bit x86, 64-bit atomic stores use xmm registers, but those are
going to be used anyway, since you'll have a 64-bit add.
> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
>
_Atomic is not slower than atomics without _Atomic, when you actually
need atomic operations.
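[Editor's illustration: the reset-reliability point discussed above can be sketched in plain C. This is a minimal userspace sketch, not DPDK code; the `sw_counter` names are hypothetical. The datapath thread is the only writer of the counter; the control thread makes reset reliable by recording an offset instead of zeroing the hot counter, and relaxed ordering suffices for torn-free 64-bit accesses with a single writer.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-queue SW counter. 'total' is written only by the
 * datapath thread; 'offset' is written only by the control thread.
 * Because the control thread never writes 'total', an in-flight
 * datapath increment cannot be lost by a concurrent reset. */
struct sw_counter {
	_Atomic uint64_t total;		/* datapath thread only */
	uint64_t offset;		/* control thread only */
};

static inline void
sw_counter_add(struct sw_counter *c, uint64_t n)
{
	/* single writer: a relaxed load/modify/store is sufficient */
	uint64_t t = atomic_load_explicit(&c->total, memory_order_relaxed);

	atomic_store_explicit(&c->total, t + n, memory_order_relaxed);
}

static inline uint64_t
sw_counter_read(struct sw_counter *c)
{
	return atomic_load_explicit(&c->total, memory_order_relaxed) - c->offset;
}

static inline void
sw_counter_reset(struct sw_counter *c)
{
	/* record a baseline rather than zeroing the datapath counter */
	c->offset = atomic_load_explicit(&c->total, memory_order_relaxed);
}
```

A reset that races with an increment either includes or excludes that packet, but never corrupts the counter, which is the reliability property debated in this thread.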
^ permalink raw reply [relevance 0%]
* Re: [RFC v2] net/af_packet: make stats reset reliable
2024-05-07 16:54 0% ` Ferruh Yigit
@ 2024-05-07 18:47 0% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-07 18:47 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Morten Brørup, Mattias Rönnblom, John W. Linville,
Thomas Monjalon, dev, Mattias Rönnblom
On Tue, 7 May 2024 17:54:18 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 5/7/2024 5:00 PM, Morten Brørup wrote:
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Tuesday, 7 May 2024 16.51
> >
> >> I would prefer that the SW statistics be handled generically by ethdev
> >> layers and used by all such drivers.
> >
> > I agree.
> >
> > Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> >
> > Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> >
>
> I am against the ethdev layer being aware of SW drivers and behaving
> differently for them.
> This is dev_ops and can be managed per driver. We can add helper
> functions for drivers if there is a common pattern.
It is more about having a set of helper routines for SW only drivers.
I have something in progress for this.
^ permalink raw reply [relevance 0%]
* Re: [RFC v2] net/af_packet: make stats reset reliable
2024-05-07 16:00 3% ` Morten Brørup
@ 2024-05-07 16:54 0% ` Ferruh Yigit
2024-05-07 18:47 0% ` Stephen Hemminger
2024-05-08 7:48 0% ` Mattias Rönnblom
1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-05-07 16:54 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger
Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
Mattias Rönnblom
On 5/7/2024 5:00 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
>
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
>
> I agree.
>
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
>
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
>
I am against the ethdev layer being aware of SW drivers and behaving
differently for them.
This is dev_ops and can be managed per driver. We can add helper
functions for drivers if there is a common pattern.
>>
>> The most complete version of SW stats now is in the virtio driver.
>
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
>
> Considering a DPDK application running as a virtual machine (guest) on a host server...
>
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
>
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
>
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
>
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.
> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
>
^ permalink raw reply [relevance 0%]
* RE: [RFC v2] net/af_packet: make stats reset reliable
@ 2024-05-07 16:00 3% ` Morten Brørup
2024-05-07 16:54 0% ` Ferruh Yigit
2024-05-08 7:48 0% ` Mattias Rönnblom
0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2024-05-07 16:00 UTC (permalink / raw)
To: Stephen Hemminger, Ferruh Yigit
Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
Mattias Rönnblom
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 7 May 2024 16.51
> I would prefer that the SW statistics be handled generically by ethdev
> layers and used by all such drivers.
I agree.
Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
>
> The most complete version of SW stats now is in the virtio driver.
It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
Considering a DPDK application running as a virtual machine (guest) on a host server...
If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
> If reset needs to be reliable (debatable), then it needs to be done without
> atomics.
Let's modify that slightly: Without performance degradation in the fast path.
I'm not sure that all atomic operations are slow.
But you are right that it needs to be done without _Atomic counters; they seem to be slow.
^ permalink raw reply [relevance 3%]
* Re: [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD
@ 2024-05-03 13:24 3% ` Bruce Richardson
0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-05-03 13:24 UTC (permalink / raw)
To: Tom Jones; +Cc: dev
On Fri, May 03, 2024 at 02:12:58PM +0100, Tom Jones wrote:
> Hi Bruce,
>
> thanks for letting me know
>
> I'm not tied to anything particularly. This change isn't compatible with the previous API, but I'm not against making it so if that is really the best thing to do. As is, the dpdk changes and the contigmem changes need to come together because the API changes for getting the physical addresses.
>
I don't think it's a major problem if the new kernel code doesn't work with the
older DPDK userspace code, we can apply both together in one patch.
However, it would count as an API/ABI change so would need to be deferred
for merge to 24.11 release, I think.
> It is just the sysctl paths that differ. I'm not sure what the compatibility needs to be for DPDK, for all of my usage I have built the kernel module with the package - making API changes easy.
>
> I'm happy to follow which ever path you think is best.
>
I'll maybe give more thoughts on this once I try the patch out. Hopefully
I'll get to test it out this afternoon. Don't bother trying to rework
anything until then! :-)
> Sorry for the patch confusion, I'll try to keep the sequence obvious going forward.
>
No problem. Thanks for the contribution here. FreeBSD support has sadly
been lacking a number of features for some time now, so all changes to
close the feature gap vs linux are very welcome!
/Bruce
^ permalink raw reply [relevance 3%]
* [PATCH v12 06/12] net/tap: rewrite the RSS BPF program
@ 2024-05-02 21:31 2% ` Stephen Hemminger
2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02 21:31 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in the skb by using the skb mark
  rather than requiring a second pass through the classifier step.
Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
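[Editor's note on the Toeplitz limitation mentioned in the README below: a bit-serial userspace reference of the standard Toeplitz hash can be sketched as follows. This is illustrative only, not the in-kernel BPF code; for each set bit of the input it XORs in the 32-bit window of the key starting at that bit position.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference Toeplitz hash: keylen must be >= 4 (RSS uses a 40-byte key).
 * 'window' holds the 32 key bits aligned with the current input bit and
 * is shifted left by one for every input bit processed. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
	      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	size_t kbyte = 4;	/* position of the next key bit to shift in */
	int kbit = 7;

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			window <<= 1;
			if (kbyte < keylen && (key[kbyte] & (1u << kbit)))
				window |= 1;
			if (--kbit < 0) {
				kbit = 7;
				kbyte++;
			}
		}
	}
	return hash;
}
```

A bit-serial version like this is far too slow for a datapath; it is only useful as a correctness oracle when validating an optimized implementation.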
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
 drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c | 267 +++++++++++++++++++++++++
9 files changed, 397 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+ as a target architecture and bpftool to convert object to headers.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+ from tap_rss.bpf.o. This header contains wrapper functions for
+ managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+ the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash table is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /* IP header More Fragments flag */
+#define IP_OFFSET 0x1FFF /* IP header fragment offset mask */
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loops need to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
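For reference, the Toeplitz algorithm that softrss_be() implements can be sketched in plain Python over a byte string. The key and expected hashes below are the widely published Microsoft RSS verification-suite vectors, not values taken from this patch:

```python
import socket
import struct

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Standard Toeplitz hash: for every set bit of the input (MSB first),
    XOR in the 32-bit window of the key starting at that bit position."""
    key_bits = int.from_bytes(key, "big")
    key_len = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                off = i * 8 + b
                # 32-bit key window starting at bit `off` from the left
                result ^= (key_bits >> (key_len - 32 - off)) & 0xFFFFFFFF
    return result

# Microsoft RSS verification-suite key (40 bytes)
MS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

# L3-only input: src addr, dst addr; L3+L4 appends sport, dport (all big-endian)
l3 = socket.inet_aton("66.9.149.187") + socket.inet_aton("161.142.100.80")
l3l4 = l3 + struct.pack(">HH", 2794, 1766)
print(hex(toeplitz_hash(MS_KEY, l3)))    # 0x323e8fc2 per the verification suite
print(hex(toeplitz_hash(MS_KEY, l3l4)))  # 0x51ccc178 per the verification suite
```

This mirrors the bit-window XOR of the unrolled loops above; the BPF version works on 32-bit words instead of bytes to keep the verifier happy.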
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if a hash could not be computed.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
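The fragment test above ANDs the frag_off field against both masks at once; a minimal sketch of the same bit logic (field values are illustrative):

```python
IP_MF = 0x2000      # More Fragments flag
IP_OFFSET = 0x1FFF  # fragment offset mask

def is_fragment(frag_off: int) -> bool:
    """True if the IPv4 frag_off field marks any fragment:
    MF set (first/middle fragment) or nonzero offset (later fragment)."""
    return (frag_off & (IP_MF | IP_OFFSET)) != 0

print(is_fragment(0x0000))  # False: not fragmented
print(is_fragment(0x4000))  # False: only DF (Don't Fragment) is set
print(is_fragment(0x2000))  # True: first fragment (MF set, offset 0)
print(is_fragment(0x0001))  # True: later fragment (offset != 0)
```

Only unfragmented packets carry a TCP/UDP header, which is why the L4 hash is skipped whenever this test fires.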
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next protocol.
+ * Returns the next protocol on success, -1 on a malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers give up */
+ return -1;
+}
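A userspace sketch of the same walking logic (plain Python, hypothetical helper name), handy for checking the offset arithmetic against hand-built header chains:

```python
IPPROTO_HOPOPTS, IPPROTO_TCP = 0, 6
IPPROTO_ROUTING, IPPROTO_FRAGMENT, IPPROTO_DSTOPTS = 43, 44, 60
MAX_EXT_HDRS = 5

def skip_ip6_ext(proto: int, pkt: bytes, off: int):
    """Walk IPv6 extension headers starting at `off`.
    Returns (next_proto, new_off, is_fragment) or None on a malformed
    or over-long chain."""
    for _ in range(MAX_EXT_HDRS):
        if proto in (IPPROTO_HOPOPTS, IPPROTO_ROUTING, IPPROTO_DSTOPTS):
            if off + 2 > len(pkt):
                return None
            next_hdr, hdr_len = pkt[off], pkt[off + 1]
            off += (hdr_len + 1) * 8  # length counts 8-byte units past the first
            proto = next_hdr
        elif proto == IPPROTO_FRAGMENT:
            if off + 2 > len(pkt):
                return None
            # fragment header is a fixed 8 bytes and treated as the last ext header
            return pkt[off], off + 8, True
        else:
            return proto, off, False
    return None  # too many extension headers, give up

# hop-by-hop options header (8 bytes, next header = TCP) followed by payload
ext = bytes([IPPROTO_TCP, 0]) + bytes(6)
print(skip_ip6_ext(IPPROTO_HOPOPTS, ext, 0))  # (6, 8, False)
```

The fixed MAX_EXT_HDRS bound mirrors the `#pragma unroll` loop in the BPF version, which cannot loop an unbounded number of times.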
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if a hash could not be computed.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Scale a value into the range [0, n).
+ * Assumes val is uniformly distributed (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
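The same fold in plain Python: it maps the full 32-bit hash range uniformly onto [0, n) with a multiply and a shift instead of a modulo:

```python
def reciprocal_scale(val: int, n: int) -> int:
    """Map a 32-bit value uniformly onto [0, n) without a division."""
    return ((val & 0xFFFFFFFF) * n) >> 32

print(reciprocal_scale(0x00000000, 4))  # 0
print(reciprocal_scale(0x80000000, 4))  # 2: the halfway point lands in queue 2
print(reciprocal_scale(0xFFFFFFFF, 4))  # 3
```

Avoiding the modulo also sidesteps a 64-bit division, which BPF programs cannot always use.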
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ const __u32 *key;
+ __be16 proto;
+ __u32 mark;
+ __u32 hash;
+ __u16 queue;
+
+ __builtin_preserve_access_index(({
+ mark = skb->mark;
+ proto = skb->protocol;
+ }));
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ key = (const __u32 *)rsskey->key;
+
+ if (proto == bpf_htons(ETH_P_IP))
+ hash = parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (proto == bpf_htons(ETH_P_IPV6))
+ hash = parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ hash = 0;
+
+ if (hash == 0)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+ __builtin_preserve_access_index(({
+ skb->queue_mapping = queue;
+ }));
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
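Putting the pieces together, the per-packet decision rss_flow_action() makes can be modeled in a few lines (the mark, hash and queue values here are illustrative; the real map lookup and header parsing happen in the BPF above):

```python
TC_ACT_OK, TC_ACT_PIPE = 0, 3  # verdict values from linux/pkt_cls.h

def rss_flow_action(skb_mark, hash_value, rss_map):
    """Model of the action: look up the RSS config by skb mark, fold the
    hash onto the configured queues, and return the tc verdict."""
    conf = rss_map.get(skb_mark)
    if conf is None or hash_value == 0:
        return TC_ACT_OK, None          # leave queue_mapping untouched
    queue = (hash_value * conf["nb_queues"]) >> 32  # reciprocal_scale
    return TC_ACT_PIPE, queue

rss_map = {0x2A: {"nb_queues": 4}}
print(rss_flow_action(0x2A, 0xDEADBEEF, rss_map))  # (3, 3): PIPE, queue 3
print(rss_flow_action(0x99, 0xDEADBEEF, rss_map))  # (0, None): no RSS config
```

Packets without a matching mark entry, or whose protocol yields no hash, fall through untouched with TC_ACT_OK.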
--
2.43.0
^ permalink raw reply [relevance 2%]
* Re: [PATCH] net/af_packet: fix statistics
2024-05-01 18:18 0% ` Morten Brørup
@ 2024-05-02 13:47 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-05-02 13:47 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger
Cc: dev, John W. Linville, Mattias Rönnblom
On 5/1/2024 7:18 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Wednesday, 1 May 2024 18.45
>>
>> On Wed, 1 May 2024 17:25:59 +0100
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>>> - Remove the tx_error counter since it was not correct.
>>>> When transmit ring is full it is not an error and
>>>> the driver correctly returns only the number sent.
>>>>
>>>
>>> nack
>>> Transmit full is not only return case here.
>>> There are actual errors continue to process relying this error
>> calculation.
>>> Also there are error cases like interface down.
>>> Those error cases should be handled individually if we remove this.
>>> I suggest split this change to separate patch.
>>
>> I see multiple drivers have copy/pasted same code and consider
>> transmit full as an error. It is not.
>
> +1
> Transmit full is certainly not an error!
>
I am not referring to the transmit-full case; there are error cases in
the driver:
- oversized packets
- vlan inserting failure
In the above cases the Tx loop continues, relying on these packets being
counted as errors at the end of the loop. We can't just remove the error
counter; the above cases need to be handled first.
- poll on fd fails
- poll on fd returns POLLERR (if down)
In the above cases the driver Tx loop breaks and all remaining packets are
counted as errors.
- sendto() fails
All packets copied to the af_packet frames are counted as errors.
As you can see there are real error cases which are handled in the driver.
That is why, instead of just removing the error counter, I suggest handling
it more properly in a separate patch.
>>
>> There should be a new statistic at ethdev layer that does record
>> transmit full, and make it across all drivers, but that would have
>> to wait for ABI change.
>
> What happens to these non-transmittable packets depend on the application.
> Our application discards them and count them in a (per-port, per-queue) application level counter tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev layer tx_nodescr counter might be helpful.
> Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.
>
> In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.
>
^ permalink raw reply [relevance 0%]
* [PATCH v11 5/9] net/tap: rewrite the RSS BPF program
@ 2024-05-02 2:49 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02 2:49 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in the skb by using the skb mark,
rather than requiring a second pass through the classifier step.
Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 267 +++++++++++++++++++++++++
9 files changed, 397 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled
+by the ingress queue discipline, which then runs a series of classifier
+filter rules. The first stage is the flow-based classifier (flower); for
+an RSS queue action, the second stage is the kernel skbedit action, which
+sets the skb mark to a key based on the flow id; the final stage is this
+BPF program, which then maps the flow id and packet headers into
+a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+ as a target architecture and bpftool to convert object to headers.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+ from tap_rss.bpf.o. This header contains wrapper functions for
+ managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+ the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute the RSS hash for an IPv4 packet.
+ * Returns 0 if no RSS hash could be computed.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+	/* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute the RSS hash for an IPv6 packet.
+ * Returns 0 if no RSS hash could be computed.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is well distributed (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ const __u32 *key;
+ __be16 proto;
+ __u32 mark;
+ __u32 hash;
+ __u16 queue;
+
+ __builtin_preserve_access_index(({
+ mark = skb->mark;
+ proto = skb->protocol;
+ }));
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ key = (const __u32 *)rsskey->key;
+
+ if (proto == bpf_htons(ETH_P_IP))
+ hash = parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (proto == bpf_htons(ETH_P_IPV6))
+ hash = parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ hash = 0;
+
+ if (hash == 0)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+ __builtin_preserve_access_index(({
+ skb->queue_mapping = queue;
+ }));
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
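For readers who want to experiment outside the kernel, the Toeplitz loop in softrss_be() and the queue folding in reciprocal_scale() from the new tap_rss.c above can be sketched in plain userspace C. This is only an illustrative sketch, not the in-tree code: the 64-bit window is my own rewrite to avoid the undefined 32-bit shift by 32 that the i == 0 case would otherwise hit in standard C (the BPF verifier/ISA tolerates it), and it produces the same bit pattern "key[j] << i | key[j + 1] >> (32 - i)" as the original.

```c
#include <assert.h>
#include <stdint.h>

/* Toeplitz hash over 'len' 32-bit words, mirroring softrss_be() above.
 * Note: key must contain at least len + 1 words.
 * A 64-bit sliding window stands in for "key[j] << i | key[j + 1] >> (32 - i)"
 * so the i == 0 case never shifts a 32-bit value by 32 (undefined in C). */
static uint32_t
softrss_be(const uint32_t *tuple, unsigned int len, const uint32_t *key)
{
	uint32_t hash = 0;
	unsigned int i, j;

	for (j = 0; j < len; j++) {
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (i = 0; i < 32; i++) {
			if (tuple[j] & (1u << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}

/* Fold a well-distributed hash into [0, n), as in the BPF program. */
static uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

A quick sanity check of the structure: a tuple with only bit 31 of word 0 set hashes to exactly key[0], and an all-zero tuple hashes to 0, since the key is only XORed in under set tuple bits.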
--
2.43.0
^ permalink raw reply [relevance 2%]
* RE: [PATCH] net/af_packet: fix statistics
2024-05-01 16:44 3% ` Stephen Hemminger
@ 2024-05-01 18:18 0% ` Morten Brørup
2024-05-02 13:47 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-05-01 18:18 UTC (permalink / raw)
To: Stephen Hemminger, Ferruh Yigit
Cc: dev, John W. Linville, Mattias Rönnblom
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 1 May 2024 18.45
>
> On Wed, 1 May 2024 17:25:59 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> > > - Remove the tx_error counter since it was not correct.
> > > When transmit ring is full it is not an error and
> > > the driver correctly returns only the number sent.
> > >
> >
> > nack
> > Transmit full is not only return case here.
> > There are actual errors continue to process relying this error
> calculation.
> > Also there are error cases like interface down.
> > Those error cases should be handled individually if we remove this.
> > I suggest split this change to separate patch.
>
> I see multiple drivers have copy/pasted same code and consider
> transmit full as an error. It is not.
+1
Transmit full is certainly not an error!
>
> There should be a new statistic at ethdev layer that does record
> transmit full, and make it across all drivers, but that would have
> to wait for ABI change.
What happens to these non-transmittable packets depends on the application.
Our application discards them and count them in a (per-port, per-queue) application level counter tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev layer tx_nodescr counter might be helpful.
Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.
In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.
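To make the accounting concrete, here is a minimal self-contained sketch of the application-side handling described above. stub_tx_burst() is a stand-in for rte_eth_tx_burst() (which returns how many packets the driver accepted), and the names app_tx_stats, tx_ok and tx_nodescr are illustrative, not DPDK API:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for rte_eth_tx_burst(): the driver accepts at most 'ring_space'
 * packets and returns how many it actually queued; a short count is
 * back-pressure, not an error. */
static uint16_t
stub_tx_burst(uint16_t ring_space, uint16_t nb_pkts)
{
	return nb_pkts < ring_space ? nb_pkts : ring_space;
}

struct app_tx_stats {
	uint64_t tx_ok;      /* packets handed to the driver */
	uint64_t tx_nodescr; /* dropped by the app: no TX descriptor */
};

/* The application, not the PMD, decides what happens to the leftovers:
 * count them (e.g. toward IF-MIB::ifOutDiscards) and free them, or retry. */
static void
send_and_account(struct app_tx_stats *st, uint16_t ring_space, uint16_t nb_pkts)
{
	uint16_t sent = stub_tx_burst(ring_space, nb_pkts);

	st->tx_ok += sent;
	st->tx_nodescr += nb_pkts - sent;
	/* a real application would free the mbufs for pkts[sent..nb_pkts) here */
}
```

The key point the sketch shows: the short return value is normal flow control, and the drop decision plus the counter update live entirely in the application.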
^ permalink raw reply [relevance 0%]
* Re: [PATCH] net/af_packet: fix statistics
@ 2024-05-01 16:44 3% ` Stephen Hemminger
2024-05-01 18:18 0% ` Morten Brørup
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:44 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, John W. Linville, Mattias Rönnblom
On Wed, 1 May 2024 17:25:59 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> > - Remove the tx_error counter since it was not correct.
> > When transmit ring is full it is not an error and
> > the driver correctly returns only the number sent.
> >
>
> nack
> Transmit full is not only return case here.
> There are actual errors continue to process relying this error calculation.
> Also there are error cases like interface down.
> Those error cases should be handled individually if we remove this.
> I suggest split this change to separate patch.
I see multiple drivers have copy/pasted same code and consider
transmit full as an error. It is not.
There should be a new statistic at ethdev layer that does record
transmit full, and make it across all drivers, but that would have
to wait for ABI change.
^ permalink raw reply [relevance 3%]
* [PATCH v10 5/9] net/tap: rewrite the RSS BPF program
@ 2024-05-01 16:12 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:12 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite of the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF code.
- the resulting queue is placed in the skb by using the skb mark
rather than requiring a second pass through the classifier step.
Note: This version only works with later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 49 +++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++++++++
9 files changed, 394 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..181f76a134
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline, which then runs a series of classifier filter rules.
+The first stage is the flow-based classifier (flower); for the RSS queue
+action, the second stage is the kernel skbedit action, which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program, which then maps the flow id and packet headers
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert the object file to a header.
+
+ Some older versions of Ubuntu do not have a working bpftool package.
+
+- only the standard Toeplitz hash with the standard 40-byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works, then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF program and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for the multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * return 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v9 5/9] net/tap: rewrite the RSS BPF program
@ 2024-04-26 15:48 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-26 15:48 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF.
- the resulting queue is placed in skb rather
than requiring a second pass through classifier step.
Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 38 ++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
 drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++++++++
9 files changed, 383 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
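[Editorial note, not part of the patch: the two custom_target() rules added in meson.build reduce to roughly the following manual commands. Paths and flags are illustrative; the real invocation also passes the `-idirafter` include directories resolved at configure time.]

```shell
# Rough manual equivalent of the new meson custom_target() steps,
# run from drivers/net/tap/bpf/ (include paths omitted for brevity).
clang -O2 -Wall -Wextra -g -target bpf -c tap_rss.c -o tap_rss.o
bpftool gen skeleton tap_rss.o > tap_rss.skel.h
```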
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+ Some older versions of Ubuntu do not have a working bpftool package.
+ Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks whether
+libbpf, bpftool, and clang are available. If everything is
+there then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool, generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+ This skeleton header is a large byte array which contains the
+ BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
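The hashing core of the new tap_rss.c above can be exercised outside the BPF toolchain. The sketch below restates softrss_be() and reciprocal_scale() as plain userspace C (same names kept for readability; this is an illustration, not the driver code), including the uint64_t cast that rte_softrss_be uses to keep the `(32 - i)` shift well defined at `i == 0`.

```c
#include <stdint.h>

/* Bitwise Toeplitz hash over big-endian 32-bit words, mirroring the
 * softrss_be() loop in tap_rss.c. 'key' must provide at least len + 1
 * 32-bit words, since key[j + 1] is read on the last iteration. */
static uint32_t
softrss_be(const uint32_t *tuple, uint32_t len, const uint32_t *key)
{
	uint32_t i, j, hash = 0;

	for (j = 0; j < len; j++) {
		for (i = 0; i < 32; i++) {
			if (tuple[j] & (1U << (31 - i)))
				hash ^= key[j] << i |
					(uint32_t)((uint64_t)key[j + 1] >> (32 - i));
		}
	}
	return hash;
}

/* Fold a full-range 32-bit hash onto [0, n) without a modulo,
 * as done for skb->queue_mapping at the end of rss_flow_action(). */
static uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

Note that an all-zero tuple hashes to 0, which is consistent with the action program treating `hash == 0` as "no RSS possible" and returning TC_ACT_OK.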
* Re: Minutes of DPDK Technical Board Meeting, 2024-04-03
2024-04-24 17:25 3% ` Morten Brørup
@ 2024-04-24 19:10 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-04-24 19:10 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev, techboard
24/04/2024 19:25, Morten Brørup:
> > Inlining should be avoided in public headers because of ABI
> > compatibility issue
> > and structures being exported because of inline requirement.
>
> This sounds like a techboard decision, which I don't think it was.
> Suggested wording:
>
> A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.
>
>
> Perhaps I'm being paranoid, and the phrase "should be" already suffices.
>
> Whichever wording you prefer,
> ACK
This is the final report sent to dev@dpdk.org :)
Yes I think the word "should" reflects what was said
during the meeting without any formal vote.
^ permalink raw reply [relevance 0%]
* Re: getting rid of type argument to rte_malloc().
2024-04-24 10:29 0% ` Ferruh Yigit
2024-04-24 16:23 0% ` Stephen Hemminger
@ 2024-04-24 19:06 0% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:06 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through. It was a idea in the 1.0 release of DPDK, never implemented and
> > never removed. Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> >
> > Maybe doing ABI version now?
> >
>
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
>
Something like this script, and fix up the result.
From 13ec14dff523f6e896ab55a17a3c66b45bd90bbc Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 24 Apr 2024 09:39:27 -0700
Subject: [PATCH] devtools/cocci: add script to find unnecessary malloc type
The malloc type argument is unused and should be NULL.
This script finds and fixes those places.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/cocci/malloc-type.cocci | 33 ++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
create mode 100644 devtools/cocci/malloc-type.cocci
diff --git a/devtools/cocci/malloc-type.cocci b/devtools/cocci/malloc-type.cocci
new file mode 100644
index 0000000000..cd74797ecb
--- /dev/null
+++ b/devtools/cocci/malloc-type.cocci
@@ -0,0 +1,33 @@
+//
+// The type string field in malloc routines was never
+// implemented and should be NULL
+//
+@@
+expression T != NULL;
+expression num, socket, size, align;
+@@
+(
+- rte_malloc(T, size, align)
++ rte_malloc(NULL, size, align)
+|
+- rte_zmalloc(T, size, align)
++ rte_zmalloc(NULL, size, align)
+|
+- rte_calloc(T, num, size, align)
++ rte_calloc(NULL, num, size, align)
+|
+- rte_realloc(T, size, align)
++ rte_realloc(NULL, size, align)
+|
+- rte_realloc_socket(T, size, align, socket)
++ rte_realloc_socket(NULL, size, align, socket)
+|
+- rte_malloc_socket(T, size, align, socket)
++ rte_malloc_socket(NULL, size, align, socket)
+|
+- rte_zmalloc_socket(T, size, align, socket)
++ rte_zmalloc_socket(NULL, size, align, socket)
+|
+- rte_calloc_socket(T, num, size, align, socket)
++ rte_calloc_socket(NULL, num, size, align, socket)
+)
--
2.43.0
^ permalink raw reply [relevance 0%]
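As a concrete illustration of why the rewrite in the cocci script is safe: nothing ever consumes the type string, so an allocator with the same signature can discard it outright. The snippet below is a hypothetical toy (demo_malloc is not DPDK code) that keeps an rte_malloc()-style prototype while ignoring the first argument, which is exactly what passing NULL acknowledges. It assumes align is zero or a power of two, as rte_malloc does.

```c
#include <stdlib.h>

/* Toy allocator with an rte_malloc()-style prototype. The 'type'
 * argument is accepted but never used -- mirroring the real API,
 * where it has been dead since DPDK 1.0. */
static void *
demo_malloc(const char *type, size_t size, unsigned int align)
{
	void *p = NULL;

	(void)type; /* historically unused; callers may pass NULL */

	/* posix_memalign() needs align to be a power of two and a
	 * multiple of sizeof(void *); treat small values as "no constraint". */
	if (align < sizeof(void *))
		align = sizeof(void *);
	if (posix_memalign(&p, align, size) != 0)
		return NULL;
	return p;
}
```

Both `demo_malloc("ring", 64, 8)` and `demo_malloc(NULL, 64, 8)` behave identically here, which is the observable property the semantic patch relies on.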
* Re: getting rid of type argument to rte_malloc().
2024-04-24 17:09 0% ` Morten Brørup
@ 2024-04-24 19:05 0% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:05 UTC (permalink / raw)
To: Morten Brørup; +Cc: Ferruh Yigit, dev
On Wed, 24 Apr 2024 19:09:24 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 24 April 2024 18.24
> >
> > On Wed, 24 Apr 2024 11:29:51 +0100
> > Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> > > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > > > For the 24.11 release, I want to remove the unused type string
> > argument
> > > > that shows up in rte_malloc() and related functions, then percolates
> > down
> > > > through. It was a idea in the 1.0 release of DPDK, never
> > implemented and
> > > > never removed. Yes it will cause API breakage, a large sweeping
> > change;
> > > > probably easily scripted with coccinelle.
> > > >
> > > > Maybe doing ABI version now?
> > > >
> > >
> > > Won't this impact many applications, is there big enough motivation to
> > > force many DPDK applications to update their code, living with it
> > looks
> > > simpler.
> > >
> >
> > Yeah, probably too big an impact but at least:
> > - change the documentation to say "do not use" should be NULL
> > - add script to remove all usage inside of DPDK
> > - get rid of places where useless arg is passed around inside
> > of the allocator internals.
>
> For the sake of discussion:
> Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.
The name is used by memzone lookup for secondary process etc.
>
> Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?
>
Not really.
^ permalink raw reply [relevance 0%]
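The point about the name can be made with a toy model. The sketch below is not DPDK code (the demo_memzone_* names are made up); it only shows the property the name buys: a later caller, in DPDK typically a secondary process, can recover a region it did not create by looking it up by name, which anonymous rte_malloc() allocations cannot offer.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ZONES 8
#define DEMO_NAME_LEN  32

/* Minimal model of a named-zone registry. */
struct demo_memzone {
	char name[DEMO_NAME_LEN];
	size_t len;
	int used;
};

static struct demo_memzone demo_zones[DEMO_MAX_ZONES];

/* Reserve a zone under a given name (first free slot). */
static struct demo_memzone *
demo_memzone_reserve(const char *name, size_t len)
{
	for (int i = 0; i < DEMO_MAX_ZONES; i++) {
		if (!demo_zones[i].used) {
			demo_zones[i].used = 1;
			demo_zones[i].len = len;
			strncpy(demo_zones[i].name, name, DEMO_NAME_LEN - 1);
			return &demo_zones[i];
		}
	}
	return NULL;
}

/* Find an existing zone by name -- the operation that anonymous
 * malloc-style allocations cannot support. */
static struct demo_memzone *
demo_memzone_lookup(const char *name)
{
	for (int i = 0; i < DEMO_MAX_ZONES; i++)
		if (demo_zones[i].used && strcmp(demo_zones[i].name, name) == 0)
			return &demo_zones[i];
	return NULL;
}
```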
* RE: Minutes of DPDK Technical Board Meeting, 2024-04-03
2024-04-24 15:24 3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
@ 2024-04-24 17:25 3% ` Morten Brørup
2024-04-24 19:10 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:25 UTC (permalink / raw)
To: Thomas Monjalon, dev; +Cc: techboard
> Inlining should be avoided in public headers because of ABI
> compatibility issue
> and structures being exported because of inline requirement.
This sounds like a techboard decision, which I don't think it was.
Suggested wording:
A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.
Perhaps I'm being paranoid, and the phrase "should be" already suffices.
Whichever wording you prefer,
ACK
^ permalink raw reply [relevance 3%]
* RE: getting rid of type argument to rte_malloc().
2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 16:23 0% ` Stephen Hemminger
@ 2024-04-24 17:09 0% ` Morten Brørup
2024-04-24 19:05 0% ` Stephen Hemminger
1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:09 UTC (permalink / raw)
To: Stephen Hemminger, Ferruh Yigit; +Cc: dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 24 April 2024 18.24
>
> On Wed, 24 Apr 2024 11:29:51 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > > For the 24.11 release, I want to remove the unused type string
> argument
> > > that shows up in rte_malloc() and related functions, then percolates
> down
> > > through. It was an idea in the 1.0 release of DPDK, never
> implemented and
> > > never removed. Yes it will cause API breakage, a large sweeping
> change;
> > > probably easily scripted with coccinelle.
> > >
> > > Maybe doing ABI version now?
> > >
> >
> > Won't this impact many applications, is there big enough motivation to
> > force many DPDK applications to update their code, living with it
> looks
> > simpler.
> >
>
> Yeah, probably too big an impact but at least:
> - change the documentation to say "do not use" should be NULL
> - add script to remove all usage inside of DPDK
> - get rid of places where useless arg is passed around inside
> of the allocator internals.
For the sake of discussion:
Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.
Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?
^ permalink raw reply [relevance 0%]
* Re: getting rid of type argument to rte_malloc().
2024-04-24 16:23 0% ` Stephen Hemminger
@ 2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 17:09 0% ` Morten Brørup
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through. It was an idea in the 1.0 release of DPDK, never implemented and
> > never removed. Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> >
> > Maybe doing ABI version now?
> >
>
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
>
Yeah, probably too big an impact but at least:
- change the documentation to say "do not use" should be NULL
- add script to remove all usage inside of DPDK
- get rid of places where useless arg is passed around inside
of the allocator internals.
^ permalink raw reply [relevance 0%]
* Re: getting rid of type argument to rte_malloc().
2024-04-24 10:29 0% ` Ferruh Yigit
@ 2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 17:09 0% ` Morten Brørup
2024-04-24 19:06 0% ` Stephen Hemminger
1 sibling, 2 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through. It was a idea in the 1.0 release of DPDK, never implemented and
> > never removed. Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> >
> > Maybe doing ABI version now?
> >
>
> Won't this impact many applications? Is there big enough motivation to
> force many DPDK applications to update their code? Living with it looks
> simpler.
>
Yeah, probably too big an impact but at least:
- change the documentation to say the argument is unused and should be NULL
- add script to remove all usage inside of DPDK
- get rid of places where useless arg is passed around inside
of the allocator internals.
^ permalink raw reply [relevance 0%]
* Minutes of DPDK Technical Board Meeting, 2024-04-03
@ 2024-04-24 15:24 3% Thomas Monjalon
2024-04-24 17:25 3% ` Morten Brørup
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-04-24 15:24 UTC (permalink / raw)
To: dev; +Cc: techboard
Members Attending: 10/11
- Aaron Conole
- Bruce Richardson
- Hemant Agrawal
- Honnappa Nagarahalli
- Kevin Traynor
- Konstantin Ananyev
- Maxime Coquelin
- Morten Brørup
- Stephen Hemminger
- Thomas Monjalon (Chair)
NOTE: The Technical Board meetings take place every second Wednesday at 3 pm UTC
on https://zoom-lfx.platform.linuxfoundation.org/meeting/96459488340?password=d808f1f6-0a28-4165-929e-5a5bcae7efeb
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes
1/ MSVC
Work to be able to compile DPDK with MSVC is progressing.
Regarding the tooling, UNH CI is testing MSVC in Windows Server 2022 job.
There was an ask for GHA job building with MSVC.
Example:
https://github.com/danielzsh/spark/blob/master/.github/workflows/compile.yml
We should not break MSVC compilation for enabled libraries.
When creating a new library, we should require allowing MSVC where it makes sense.
Some guidelines could be added in doc/guides/contributing/design.rst
2/ function inlining
There are pros and cons for function inlining.
There should not be inlining in control path functions.
Inlining should be avoided in public headers because of ABI compatibility issues
and because inlining forces structures to be exported.
Inlining should be used with care, with benchmarks as a proof of efficiency.
Having too much inlining has a drawback on the instruction cache,
which is why we should justify any new usage of inline.
Note that the same recommendations apply with the use of prefetch and likely/unlikely.
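The ABI concern raised in the minutes can be sketched in a few lines (all names below are made up for illustration): an inline accessor in a public header bakes the struct layout into every caller's binary, while an exported function keeps the layout private to the library.

```c
#include <stddef.h>

/* Sketch contrasting the two styles discussed above (hypothetical names).
 * With the inline accessor, the field offset is compiled into every
 * caller, so changing the struct breaks the ABI; with the exported
 * function, callers only depend on the symbol. */
struct counter {
    int value; /* adding a field before this one would break inline callers */
};

/* Header-inline style: fast, but freezes the layout into the ABI. */
static inline int counter_get_inline(const struct counter *c)
{
    return c->value;
}

/* Exported-function style: the layout stays private to the library,
 * at the cost of a function call. */
int counter_get_exported(const struct counter *c)
{
    return c->value;
}
```

This is why the minutes suggest benchmarks as a proof of efficiency before accepting new inline functions in public headers.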
^ permalink raw reply [relevance 3%]
* Re: getting rid of type argument to rte_malloc().
2024-04-24 4:08 3% getting rid of type argument to rte_malloc() Stephen Hemminger
@ 2024-04-24 10:29 0% ` Ferruh Yigit
2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 19:06 0% ` Stephen Hemminger
0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2024-04-24 10:29 UTC (permalink / raw)
To: Stephen Hemminger, dev
On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> For the 24.11 release, I want to remove the unused type string argument
> that shows up in rte_malloc() and related functions, then percolates down
> through. It was a idea in the 1.0 release of DPDK, never implemented and
> never removed. Yes it will cause API breakage, a large sweeping change;
> probably easily scripted with coccinelle.
>
> Maybe doing ABI version now?
>
Won't this impact many applications? Is there big enough motivation to
force many DPDK applications to update their code? Living with it looks
simpler.
^ permalink raw reply [relevance 0%]
* getting rid of type argument to rte_malloc().
@ 2024-04-24 4:08 3% Stephen Hemminger
2024-04-24 10:29 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-24 4:08 UTC (permalink / raw)
To: dev
For the 24.11 release, I want to remove the unused type string argument
that shows up in rte_malloc() and related functions, then percolates down
through. It was an idea in the 1.0 release of DPDK, never implemented and
never removed. Yes it will cause API breakage, a large sweeping change;
probably easily scripted with coccinelle.
Maybe doing ABI version now?
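The unused argument being discussed can be illustrated with a small stand-in (hypothetical name, not the real DPDK symbol; it merely mirrors the shape of rte_malloc() described above, where the type string is accepted but never used):

```c
#include <stdlib.h>

/* demo_malloc() is a made-up stand-in for rte_malloc(): same shape,
 * and the "type" string is likewise accepted but ignored. */
static void *demo_malloc(const char *type, size_t size, unsigned int align)
{
    (void)type;   /* the argument the thread proposes to remove */
    (void)align;  /* alignment handling elided in this sketch */
    return malloc(size);
}
```

Callers today pass either a descriptive string or NULL; both behave identically, which is the motivation for dropping the parameter.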
^ permalink raw reply [relevance 3%]
* Re: [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port
2024-03-08 10:38 0% ` lihuisong (C)
@ 2024-04-23 11:17 0% ` lihuisong (C)
1 sibling, 0 replies; 200+ results
From: lihuisong (C) @ 2024-04-23 11:17 UTC (permalink / raw)
To: dev, thomas, ferruh.yigit
Cc: andrew.rybchenko, fengchengwen, liudongdong3, liuyonglong
Hi Ferruh and Thomas,
It's been almost two years since this issue was reported.
We have discussed a lot before, and also made some progress and reached consensus.
Can you take a look at it again? Looking forward to your reply.
BR/
Huisong
On 2024/1/30 14:36, Huisong Li wrote:
> This patchset fixes some bugs and supports attaching and detaching ports
> in primary and secondary processes.
>
> ---
> -v7: fix conflicts
> -v6: adjust rte_eth_dev_is_used position based on alphabetical order
> in version.map
> -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
> -v4: fix a misspelling.
> -v3:
> #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
> for other bus type.
> #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
> the problem in patch 2/5.
> -v2: resend due to CI unexplained failure.
>
> Huisong Li (5):
> drivers/bus: restore driver assignment at front of probing
> ethdev: fix skip valid port in probing callback
> app/testpmd: check the validity of the port
> app/testpmd: add attach and detach port for multiple process
> app/testpmd: stop forwarding in new or destroy event
>
> app/test-pmd/testpmd.c | 47 +++++++++++++++---------
> app/test-pmd/testpmd.h | 1 -
> drivers/bus/auxiliary/auxiliary_common.c | 9 ++++-
> drivers/bus/dpaa/dpaa_bus.c | 9 ++++-
> drivers/bus/fslmc/fslmc_bus.c | 8 +++-
> drivers/bus/ifpga/ifpga_bus.c | 12 ++++--
> drivers/bus/pci/pci_common.c | 9 ++++-
> drivers/bus/vdev/vdev.c | 10 ++++-
> drivers/bus/vmbus/vmbus_common.c | 9 ++++-
> drivers/net/bnxt/bnxt_ethdev.c | 3 +-
> drivers/net/bonding/bonding_testpmd.c | 1 -
> drivers/net/mlx5/mlx5.c | 2 +-
> lib/ethdev/ethdev_driver.c | 13 +++++--
> lib/ethdev/ethdev_driver.h | 12 ++++++
> lib/ethdev/ethdev_pci.h | 2 +-
> lib/ethdev/rte_class_eth.c | 2 +-
> lib/ethdev/rte_ethdev.c | 4 +-
> lib/ethdev/rte_ethdev.h | 4 +-
> lib/ethdev/version.map | 1 +
> 19 files changed, 114 insertions(+), 44 deletions(-)
>
^ permalink raw reply [relevance 0%]
* Community CI Meeting Minutes - April 18, 2024
@ 2024-04-18 17:49 3% Patrick Robb
0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-18 17:49 UTC (permalink / raw)
To: ci; +Cc: dev, dts
April 18, 2024
#####################################################################
Attendees
1. Patrick Robb
2. Paul Szczepanek
3. Juraj Linkeš
4. Aaron Conole
5. Ali Alnubani
#####################################################################
Minutes
=====================================================================
General Announcements
* GB is wrapping up voting on the UNH Lab server refresh proposal -
should have more info on this by end of week
* Patrick Robb to share the list of current servers and servers to be
acquired with Paul
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
* UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
* Reminder - Final conclusion on policy is:
* A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
* B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
* In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).
=====================================================================
CI Status
---------------------------------------------------------------------
UNH-IOL Community Lab
* ABI binaries were sent to Dodji Seketeli after some ABI failures came
in, and he confirmed that moving to libabigail 2.4 resolves the issue. Cody
Cheng is working on this now.
* To be submitted to upstream template engine:
https://git.dpdk.org/tools/dpdk-ci/tree/containers/template_engine
* SPDK: Working on these compile jobs
* Currently compile with:
* Ubuntu 22.04
* Debian 11
* Debian 12
* CentOS 8
* CentOS 9
* Fedora 37
* Fedora 38
* Fedora 39
* Opensuse-Leap 15 but with a warning
* Cannot compile with:
* Rhel 8
* Rhel 9
* SPDK docs state rhel is “best effort”
* Questions:
* Should we run with werror enabled?
* What versions of SPDK do we test?
* What versions of DPDK do we test SPDK against?
* Unit tests pass with the distros which are compiling
* UPDATE: We are polling SPDK people on their Slack, but the current
plan is to bring testing online only for distros which currently
compile with werror. So, no RHEL and no openSUSE.
* OvS DPDK testing:
* OvS compile still passing on some distros but failing on others -
Adam is going to circle back on this when he gets time
* Submit tickets for any outstanding issues
* Bring ovs compile testing online
* Plans for performance testing are still pending Aaron & David discussing
* Code coverage for fast tests is now running in CI, 1x per month. You
can download the latest reports here:
https://dpdkdashboard.iol.unh.edu/results/dashboard/code-coverage
* Open out/coveragereport/index.html
* Do we need code coverage reports for the other unit test suites?
(not just fast test)
* UNH to dry run this, share results
* NVIDIA: Gal has offered to send two CX7 NICs to the UNH lab. This
should allow us to install two CX7 NICs on the DUT, and start
forwarding between the two NICs.
* Pcapng_autotest
* UNH has some spurious failures reported to patchwork for Debian
12. Need to reconnect with Stephen to debug this further.
* Updating Coverity binaries at UNH
---------------------------------------------------------------------
Intel Lab
* Patrick pinged John M again about a lab contact
---------------------------------------------------------------------
Github Actions
* No new updates
---------------------------------------------------------------------
Loongarch Lab
* None
=====================================================================
DTS Improvements & Test Development
* API docs generation:
* Reviews are needed for this. Need an ACK from
bruce.richardson@intel.com now that there are some new changes on the
meson side
* Thomas wants to link DTS api docs from doxygen from dpdk docs
* UNH folks should provide a review
* Jeremy is switching back to DTS next week, and will be working more
on the 2nd scatter case for MLNX, which will rely on the capabilities
querying (and testcase skipping) patch. Will provide feedback to Juraj
on that patch soon.
* Hugepages patch is updated based on feedback from Morten, but
essentially the same (in approach) as last week.
=====================================================================
Any other business
* DPDK Summit in Montreal will now be late September. This plan is
still being finalized.
* Next Meeting: May 1, 2024
^ permalink raw reply [relevance 3%]
* [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
2024-04-11 8:22 3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
@ 2024-04-12 11:57 3% ` Akhil Goyal
2024-05-29 10:43 0% ` Anoob Joseph
0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2024-04-12 11:57 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, hemant.agrawal, anoobj,
pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
ciara.power, Akhil Goyal
Added a new fast path API to get the used crypto device
queue pair depth at any given point.
An implementation in the cnxk crypto driver is also added along with
a test case in the test app.
The addition of the new API causes an ABI warning.
This is suppressed as the updated struct rte_crypto_fp_ops is
an internal structure and not to be used by applications directly.
v2: fixed shared and clang build issues.
Akhil Goyal (3):
cryptodev: add API to get used queue pair depth
crypto/cnxk: support queue pair depth API
test/crypto: add QP depth used count case
app/test/test_cryptodev.c | 117 +++++++++++++++++++++++
devtools/libabigail.abignore | 3 +
drivers/crypto/cnxk/cn10k_cryptodev.c | 1 +
drivers/crypto/cnxk/cn9k_cryptodev.c | 2 +
drivers/crypto/cnxk/cnxk_cryptodev_ops.c | 16 ++++
drivers/crypto/cnxk/cnxk_cryptodev_ops.h | 2 +
lib/cryptodev/cryptodev_pmd.c | 1 +
lib/cryptodev/cryptodev_pmd.h | 2 +
lib/cryptodev/cryptodev_trace_points.c | 3 +
lib/cryptodev/rte_cryptodev.h | 45 +++++++++
lib/cryptodev/rte_cryptodev_core.h | 7 +-
lib/cryptodev/rte_cryptodev_trace_fp.h | 7 ++
lib/cryptodev/version.map | 3 +
13 files changed, 208 insertions(+), 1 deletion(-)
--
2.25.1
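The notion of "used queue pair depth" from the cover letter can be modeled as ops enqueued but not yet dequeued (a toy model with made-up names; the real API added by this series reads driver state rather than plain counters):

```c
#include <stdint.h>

/* Hypothetical model of a crypto queue pair: the used depth at any
 * given point is simply enqueued minus dequeued operations. */
struct qp_model {
    uint64_t enq; /* total ops enqueued so far */
    uint64_t deq; /* total ops dequeued so far */
};

static uint32_t qp_depth_used(const struct qp_model *qp)
{
    /* unsigned subtraction stays correct even if the counters wrap */
    return (uint32_t)(qp->enq - qp->deq);
}
```

An application could poll such a value to decide whether a queue pair has room for another burst before enqueueing.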
^ permalink raw reply [relevance 3%]
* [PATCH v4 3/3] dts: add API doc generation
@ 2024-04-12 10:14 2% ` Juraj Linkeš
0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-04-12 10:14 UTC (permalink / raw)
To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
paul.szczepanek, Luca.Vizzarro, npratte
Cc: dev, Juraj Linkeš
The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.
Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to object in external documentations, such as the Python documentation.
There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
code.
* Also the same Python packages as DTS, for the same reason.
[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
buildtools/call-sphinx-build.py | 33 +++++++++++++++++++---------
doc/api/doxy-api-index.md | 3 +++
doc/api/doxy-api.conf.in | 2 ++
doc/api/meson.build | 11 +++++++---
doc/guides/conf.py | 39 ++++++++++++++++++++++++++++-----
doc/guides/meson.build | 1 +
doc/guides/tools/dts.rst | 34 +++++++++++++++++++++++++++-
dts/doc/meson.build | 27 +++++++++++++++++++++++
dts/meson.build | 16 ++++++++++++++
meson.build | 1 +
10 files changed, 148 insertions(+), 19 deletions(-)
create mode 100644 dts/doc/meson.build
create mode 100644 dts/meson.build
diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 39a60d09fa..aea771a64e 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,37 +3,50 @@
# Copyright(c) 2019 Intel Corporation
#
+import argparse
import sys
import os
from os.path import join
from subprocess import run, PIPE, STDOUT
from packaging.version import Version
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
# set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+ os.environ['DTS_ROOT'] = args.dts_root
# for sphinx version >= 1.7 add parallelism using "-j auto"
-ver = run([sphinx, '--version'], stdout=PIPE,
+ver = run([args.sphinx, '--version'], stdout=PIPE,
stderr=STDOUT).stdout.decode().split()[-1]
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
if Version(ver) >= Version('1.7'):
sphinx_cmd += ['-j', 'auto']
# find all the files sphinx will process so we can write them as dependencies
srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
srcfiles.extend([join(root, f) for f in files])
+if not os.path.exists(args.dst):
+ os.makedirs(args.dst)
+
# run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
- process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
- stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+ process = run(
+ sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+ stdout=out
+ )
# create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
d.write('html: ' + ' '.join(srcfiles) + '\n')
sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 8c1eb8fafa..d5f823b7f0 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -243,3 +243,6 @@ The public API headers are grouped by topics:
[experimental APIs](@ref rte_compat.h),
[ABI versioning](@ref rte_function_versioning.h),
[version](@ref rte_version.h)
+
+- **tests**:
+ [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 27afec8b3b..2e08c6a452 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -123,6 +123,8 @@ SEARCHENGINE = YES
SORT_MEMBER_DOCS = NO
SOURCE_BROWSER = YES
+ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
EXAMPLE_PATH = @TOPDIR@/examples
EXAMPLE_PATTERNS = *.c
EXAMPLE_RECURSIVE = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
+doc_api_build_dir = meson.current_build_dir()
doxygen = find_program('doxygen', required: get_option('enable_docs'))
if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
# set up common Doxygen configuration
cdata = configuration_data()
cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
cdata.set('WARN_AS_ERROR', 'NO')
if get_option('werror')
cdata.set('WARN_AS_ERROR', 'YES')
endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
# configure HTML Doxygen run
html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
from sphinx import __version__ as sphinx_version
from os import listdir
from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
import configparser
@@ -24,6 +23,37 @@
file=stderr)
pass
+# Napoleon enables the Google format of Python doscstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+ 'members': True,
+ 'member-order': 'bysource',
+ 'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+ path.append(dts_root)
+ # DTS Sidebar config
+ html_theme_options = {
+ 'collapse_navigation': False,
+ 'navigation_depth': -1,
+ }
+
stop_on_error = ('-W' in argv)
project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
html_show_copyright = False
highlight_language = 'none'
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
master_doc = 'index'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2018 Intel Corporation
+doc_guides_source_dir = meson.current_source_dir()
sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 47b218b2c6..d1c3c2af7a 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -280,7 +280,12 @@ and try not to divert much from it.
The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
when some of the basics are not met.
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
The style must conform to the `Google style
<https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
See an example of the style `here
@@ -415,6 +420,33 @@ the DTS code check and format script.
Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+ poetry install --no-root --with docs
+ poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+ ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+ Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+ the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
Configuration Schema
--------------------
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+ subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+ extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+ output: 'html',
+ command: [sphinx_wrapper, sphinx, meson.project_version(),
+ meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+ build_by_default: false,
+ install: get_option('enable_docs'),
+ install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+ message = 'No docs targets found'
+else
+ message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+ depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
# build docs
subdir('doc')
+subdir('dts')
# build any examples explicitly requested - useful for developers - and
# install any example code into the appropriate install path
--
2.34.1
^ permalink raw reply [relevance 2%]
* [PATCH 0/3] cryptodev: add API to get used queue pair depth
@ 2024-04-11 8:22 3% Akhil Goyal
2024-04-12 11:57 3% ` [PATCH v2 " Akhil Goyal
0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2024-04-11 8:22 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, hemant.agrawal, anoobj,
pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
ciara.power, Akhil Goyal
Added a new fast path API to get the used crypto device
queue pair depth at any given point.
An implementation in the cnxk crypto driver is also added along with
a test case in the test app.
The addition of the new API causes an ABI warning.
This is suppressed as the updated struct rte_crypto_fp_ops is
an internal structure and not to be used by applications directly.
Akhil Goyal (3):
cryptodev: add API to get used queue pair depth
crypto/cnxk: support queue pair depth API
test/crypto: add QP depth used count case
app/test/test_cryptodev.c | 117 +++++++++++++++++++++++
devtools/libabigail.abignore | 3 +
drivers/crypto/cnxk/cn10k_cryptodev.c | 1 +
drivers/crypto/cnxk/cn9k_cryptodev.c | 2 +
drivers/crypto/cnxk/cnxk_cryptodev_ops.c | 15 +++
drivers/crypto/cnxk/cnxk_cryptodev_ops.h | 2 +
lib/cryptodev/cryptodev_pmd.c | 1 +
lib/cryptodev/cryptodev_pmd.h | 2 +
lib/cryptodev/cryptodev_trace_points.c | 3 +
lib/cryptodev/rte_cryptodev.h | 45 +++++++++
lib/cryptodev/rte_cryptodev_core.h | 7 +-
lib/cryptodev/rte_cryptodev_trace_fp.h | 7 ++
12 files changed, 204 insertions(+), 1 deletion(-)
--
2.25.1
^ permalink raw reply [relevance 3%]
* Re: Strict aliasing problem with rte_eth_linkstatus_set()
2024-04-10 19:58 3% ` Tyler Retzlaff
@ 2024-04-11 3:20 0% ` fengchengwen
0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2024-04-11 3:20 UTC (permalink / raw)
To: Tyler Retzlaff, Morten Brørup
Cc: Stephen Hemminger, Ferruh Yigit, dev, Dengdui Huang
Hi All,
On 2024/4/11 3:58, Tyler Retzlaff wrote:
> On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> Sent: Wednesday, 10 April 2024 17.27
>>>
>>> On Wed, 10 Apr 2024 17:33:53 +0800
>>> fengchengwen <fengchengwen@huawei.com> wrote:
>>>
>>>> Last: We think there are two ways to solve this problem.
>>>> 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK
>>> project.
>>>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
>>> see above).
>>>> PS: We prefer first way.
>>>>
>>>
>>> Please send a patch to replace alias with union.
>>
>> +1
>>
>> Fixing this specific bug would be good.
OK for this,
and I already sent a bugfix which uses a union.
Thanks
>>
>> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
>> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.
>
> agree, union is the correct way to get defined behavior. there are
> valuable optimizations that the compiler can make with strict aliasing
> enabled so -Wstrict-aliasing is a good suggestion as opposed to
> disabling it.
>
> also the union won't break the abi if introduced correctly.
> .
>
^ permalink raw reply [relevance 0%]
* Re: Strict aliasing problem with rte_eth_linkstatus_set()
@ 2024-04-10 19:58 3% ` Tyler Retzlaff
2024-04-11 3:20 0% ` fengchengwen
0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-10 19:58 UTC (permalink / raw)
To: Morten Brørup
Cc: Stephen Hemminger, fengchengwen, Ferruh Yigit, dev, Dengdui Huang
On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 10 April 2024 17.27
> >
> > On Wed, 10 Apr 2024 17:33:53 +0800
> > fengchengwen <fengchengwen@huawei.com> wrote:
> >
> > > Last: We think there are two ways to solve this problem.
> > > 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK
> > project.
> > > 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
> > see above).
> > > PS: We prefer first way.
> > >
> >
> > Please send a patch to replace alias with union.
>
> +1
>
> Fixing this specific bug would be good.
>
> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.
agree, union is the correct way to get defined behavior. there are
valuable optimizations that the compiler can make with strict aliasing
enabled so -Wstrict-aliasing is a good suggestion as opposed to
disabling it.
also the union won't break the abi if introduced correctly.
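The union approach agreed on in this thread can be sketched as follows (field names are made up; the real rte_eth_linkstatus_set() and rte_eth_link differ). Reading a struct through a cast uint64_t pointer is undefined behavior under strict aliasing, whereas a union holding both views is well defined in C:

```c
#include <stdint.h>

/* Hypothetical 8-byte link status; the packing matters so that the
 * whole status fits into one word. */
struct link_status {
    uint32_t speed;
    uint16_t duplex;
    uint16_t up;
};

union link_word {
    struct link_status link;
    uint64_t word; /* same 8-byte size: the status can move as one word */
};

/* Publish a new link status as a single word store. A real driver
 * would use an atomic store on dst->word; a plain assignment keeps
 * this sketch portable. */
static void link_set(union link_word *dst, const struct link_status *src)
{
    union link_word tmp = { .link = *src };

    dst->word = tmp.word;
}
```

Because both members live in the same union object, the compiler may not reorder or elide the accesses the way it can with a pointer cast, which is what caused the original miscompilation.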
^ permalink raw reply [relevance 3%]
* Re: Strict aliasing problem with rte_eth_linkstatus_set()
@ 2024-04-10 15:58 3% ` Ferruh Yigit
1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-10 15:58 UTC (permalink / raw)
To: Stephen Hemminger, fengchengwen; +Cc: dev, Dengdui Huang
On 4/10/2024 4:27 PM, Stephen Hemminger wrote:
> On Wed, 10 Apr 2024 17:33:53 +0800
> fengchengwen <fengchengwen@huawei.com> wrote:
>
>> Last: We think there are two ways to solve this problem.
>> 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK project.
>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please see above).
>> PS: We prefer first way.
>>
>
> Please send a patch to replace alias with union.
>
+1
I am not sure about the ABI implications; as the size is not changing, I expect
it won't be an issue, but it may be good to verify with libabigail.
> PS: you can also override aliasing for a few lines of code with either pragma's
> or lots of casting. Both are messy and hard to maintain.
^ permalink raw reply [relevance 3%]
* [PATCH v8 5/8] net/tap: rewrite the RSS BPF program
@ 2024-04-09 3:40 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-09 3:40 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF.
- the resulting queue is placed in the skb rather
than requiring a second pass through the classifier step.
Note: This version only works with later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 38 ++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++++++++
9 files changed, 383 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built using the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+ Some older versions of Ubuntu do not have working bpftool package.
+ Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks whether
+libbpf, bpftool, and clang are available. If everything is
+there, then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+ This skeleton header is a large byte array which contains the
+ BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment targets.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* [PATCH v7 5/8] net/tap: rewrite the RSS BPF program
@ 2024-04-08 21:18 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-08 21:18 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF.
- the resulting queue is placed in skb rather
than requiring a second pass through classifier step.
Note: This version only works with the later patch enabling it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 38 ++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
 drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++++++++
9 files changed, 383 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built using the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+ Some older versions of Ubuntu do not have working bpftool package.
+ Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process, the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+present then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool, generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF program is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return in 0 if RSS not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * return 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value into the range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
* [PATCH v6 6/8] net/tap: rewrite the RSS BPF program
@ 2024-04-05 21:14 2% ` Stephen Hemminger
From: Stephen Hemminger @ 2024-04-05 21:14 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite the BPF program used to do queue based RSS.
Important changes:
- uses newer BPF map format BTF
- accepts key as parameter rather than constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF.
- the resulting queue is placed in skb rather
than requiring a second pass through classifier step.
Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 38 ++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
 drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++++++++
9 files changed, 383 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have a working bpftool package.
+ Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process, the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+there, then BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o object file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not possible.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+	/* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not possible.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is well distributed (i.e. the hash covers the whole u32 range).
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* RE: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
2024-04-05 15:15 3% ` Stephen Hemminger
@ 2024-04-05 18:17 3% ` Chautru, Nicolas
0 siblings, 0 replies; 200+ results
From: Chautru, Nicolas @ 2024-04-05 18:17 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, maxime.coquelin, hemant.agrawal, Marchand, David, Vargas, Hernan
Hi Stephen,
It is not strictly ABI compatible since the size of the structure increases, hence only updating for 24.11.
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, April 5, 2024 8:15 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com; hemant.agrawal@nxp.com;
> Marchand, David <david.marchand@redhat.com>; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
>
> On Thu, 4 Apr 2024 14:04:45 -0700
> Nicolas Chautru <nicolas.chautru@intel.com> wrote:
>
> > Capturing additional queue stats counter for the depth of enqueue
> > batch still available on the given queue. This can help application to
> > monitor that depth at run time.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> > lib/bbdev/rte_bbdev.h | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index
> > 0cbfdd1c95..25514c58ac 100644
> > --- a/lib/bbdev/rte_bbdev.h
> > +++ b/lib/bbdev/rte_bbdev.h
> > @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
> > * bbdev operation
> > */
> > uint64_t acc_offload_cycles;
> > + /** Available number of enqueue batch on that queue. */
> > + uint16_t enqueue_depth_avail;
> > };
> >
> > /**
>
> Doesn't this break the ABI?
^ permalink raw reply [relevance 3%]
* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
2024-04-05 0:46 3% ` Stephen Hemminger
@ 2024-04-05 15:15 3% ` Stephen Hemminger
2024-04-05 18:17 3% ` Chautru, Nicolas
1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-05 15:15 UTC (permalink / raw)
To: Nicolas Chautru
Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas
On Thu, 4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:
> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
>
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
> lib/bbdev/rte_bbdev.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
> index 0cbfdd1c95..25514c58ac 100644
> --- a/lib/bbdev/rte_bbdev.h
> +++ b/lib/bbdev/rte_bbdev.h
> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
> * bbdev operation
> */
> uint64_t acc_offload_cycles;
> + /** Available number of enqueue batch on that queue. */
> + uint16_t enqueue_depth_avail;
> };
>
> /**
Doesn't this break the ABI?
^ permalink raw reply [relevance 3%]
* Re: [PATCH] lib: add get/set link settings interface
2024-04-05 0:55 0% ` Tyler Retzlaff
2024-04-05 0:56 0% ` Tyler Retzlaff
@ 2024-04-05 8:58 0% ` David Marchand
1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-04-05 8:58 UTC (permalink / raw)
To: Tyler Retzlaff, Dodji Seketeli; +Cc: Thomas Monjalon, dev
On Fri, Apr 5, 2024 at 2:55 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > > this breaks the abi. David does libabigail pick this up i wonder?
> >
> > Yes, the CI flagged it.
> >
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
>
> i'm jealous we don't have libabigail on windows, so helpful.
libabigail is written in C++ and relies on the elfutils and libxml2 libraries.
I am unclear about what binary format is used in Windows... so I am
not sure how much work would be required to have it on Windows.
That's more something to discuss with Dodji :-).
--
David Marchand
^ permalink raw reply [relevance 0%]
* Re: [PATCH] lib: add get/set link settings interface
2024-04-05 0:55 0% ` Tyler Retzlaff
@ 2024-04-05 0:56 0% ` Tyler Retzlaff
2024-04-05 8:58 0% ` David Marchand
1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05 0:56 UTC (permalink / raw)
To: David Marchand
Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev
On Thu, Apr 04, 2024 at 05:55:18PM -0700, Tyler Retzlaff wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > Hello Tyler, Marek,
> >
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > > > There are link settings parameters available from PMD drivers level
> > > > which are currently not exposed to the user via consistent interface.
> > > > When interface is available for system level those information can
> > > > be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > > > ETHTOOL_GLINKSETTINGS). There are use cases where
> > > > physical interface is passthrough to dpdk driver and is not available
> > > > from system level. Information provided by ioctl carries information
> > > > useful for link auto negotiation settings among others.
> > > >
> > > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > > ---
> > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > index 147257d6a2..66aad925d0 100644
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > > > __extension__
> > > > struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > > > uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */
> > > > - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > > + uint16_t link_duplex : 2; /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > > > uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > > > uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
> > > > };
> > >
> > > this breaks the abi. David does libabigail pick this up i wonder?
> > >
> >
> > Yes, the CI flagged it.
> >
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
>
> i'm jealous we don't have libabigail on windows, so helpfull.
s/ll/l/ end of day bah.
>
> >
> > 1 function with some indirect sub-type change:
> >
> > [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> > rte_ethdev.c:2972:1 has some indirect sub-type changes:
> > parameter 2 of type 'rte_eth_link*' has sub-type changes:
> > in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> > type size hasn't changed
> > 2 data member changes:
> > 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> > 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
> >
> > Error: ABI issue reported for abidiff --suppr
> > /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> > at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> > reference/usr/local/include --headers-dir2
> > build_install/usr/local/include
> > reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> > build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> > ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> > this as a potential issue).
> >
> >
> > GHA would have caught it too, but the documentation generation failed
> > before reaching the ABI check.
> > http://mails.dpdk.org/archives/test-report/2024-April/631086.html
> >
> >
> > --
> > David Marchand
^ permalink raw reply [relevance 0%]
* Re: [PATCH] lib: add get/set link settings interface
2024-04-04 7:09 4% ` David Marchand
@ 2024-04-05 0:55 0% ` Tyler Retzlaff
2024-04-05 0:56 0% ` Tyler Retzlaff
2024-04-05 8:58 0% ` David Marchand
0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05 0:55 UTC (permalink / raw)
To: David Marchand
Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev
On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> Hello Tyler, Marek,
>
> On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > > There are link settings parameters available from PMD drivers level
> > > which are currently not exposed to the user via consistent interface.
> > > When interface is available for system level those information can
> > > be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > > ETHTOOL_GLINKSETTINGS). There are use cases where
> > > physical interface is passthrough to dpdk driver and is not available
> > > from system level. Information provided by ioctl carries information
> > > useful for link auto negotiation settings among others.
> > >
> > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > ---
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index 147257d6a2..66aad925d0 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > > __extension__
> > > struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > > uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */
> > > - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > + uint16_t link_duplex : 2; /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > > uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > > uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
> > > };
> >
> > this breaks the abi. David does libabigail pick this up i wonder?
> >
>
> Yes, the CI flagged it.
>
> Looking at the UNH report (in patchwork):
> http://mails.dpdk.org/archives/test-report/2024-April/631222.html
i'm jealous we don't have libabigail on windows, so helpfull.
>
> 1 function with some indirect sub-type change:
>
> [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> rte_ethdev.c:2972:1 has some indirect sub-type changes:
> parameter 2 of type 'rte_eth_link*' has sub-type changes:
> in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> type size hasn't changed
> 2 data member changes:
> 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
>
> Error: ABI issue reported for abidiff --suppr
> /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> reference/usr/local/include --headers-dir2
> build_install/usr/local/include
> reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
>
>
> GHA would have caught it too, but the documentation generation failed
> before reaching the ABI check.
> http://mails.dpdk.org/archives/test-report/2024-April/631086.html
>
>
> --
> David Marchand
^ permalink raw reply [relevance 0%]
* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
@ 2024-04-05 0:46 3% ` Stephen Hemminger
2024-04-05 15:15 3% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-05 0:46 UTC (permalink / raw)
To: Nicolas Chautru
Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas
On Thu, 4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:
> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
>
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Adding field is an ABI change and will have to wait until 24.11 release
^ permalink raw reply [relevance 3%]
* [PATCH v11 2/4] mbuf: remove rte marker fields
2024-04-04 17:51 3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51 2% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.
Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays of with types matching the original
markers to maintain API compatibility.
This change breaks the API for cacheline{0,1} fields that have been
removed from rte_mbuf but it does not break the ABI, to address the
false positives of the removed (but 0 size fields) provide the minimum
libabigail.abignore for type = rte_mbuf.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 3 +
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 202 +++++++++++++++++----------------
4 files changed, 116 insertions(+), 99 deletions(-)
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
[suppress_type]
name = rte_eth_fp_ops
has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+ name = rte_mbuf
+ type_kind = struct
+ has_size_change = no
+ has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
Also, make sure to start the actual text at the margin.
=======================================================
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+ have been removed from ``struct rte_mbuf``.
+
API Changes
-----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
static inline void
rte_mbuf_prefetch_part1(struct rte_mbuf *m)
{
- rte_prefetch0(&m->cacheline0);
+ rte_prefetch0(m);
}
/**
@@ -126,7 +126,7 @@
rte_mbuf_prefetch_part2(struct rte_mbuf *m)
{
#if RTE_CACHE_LINE_SIZE == 64
- rte_prefetch0(&m->cacheline1);
+ rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
#else
RTE_SET_USED(m);
#endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..726c2cf 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
* The generic rte_mbuf, containing a packet mbuf.
*/
struct __rte_cache_aligned rte_mbuf {
- RTE_MARKER cacheline0;
-
void *buf_addr; /**< Virtual address of segment buffer. */
#if RTE_IOVA_IN_MBUF
/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
#endif
/* next 8 bytes are initialised on RX descriptor rearm */
- RTE_MARKER64 rearm_data;
- uint16_t data_off;
-
- /**
- * Reference counter. Its size should at least equal to the size
- * of port field (16 bits), to support zero-copy broadcast.
- * It should only be accessed using the following functions:
- * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
- * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
- * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
- */
- RTE_ATOMIC(uint16_t) refcnt;
+ union {
+ uint64_t rearm_data[1];
+ __extension__
+ struct {
+ uint16_t data_off;
+
+ /**
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
+ * It should only be accessed using the following functions:
+ * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+ * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+ * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+ */
+ RTE_ATOMIC(uint16_t) refcnt;
- /**
- * Number of segments. Only valid for the first segment of an mbuf
- * chain.
- */
- uint16_t nb_segs;
+ /**
+ * Number of segments. Only valid for the first segment of an mbuf
+ * chain.
+ */
+ uint16_t nb_segs;
- /** Input port (16 bits to support more than 256 virtual ports).
- * The event eth Tx adapter uses this field to specify the output port.
- */
- uint16_t port;
+ /** Input port (16 bits to support more than 256 virtual ports).
+ * The event eth Tx adapter uses this field to specify the output port.
+ */
+ uint16_t port;
+ };
+ };
uint64_t ol_flags; /**< Offload features. */
- /* remaining bytes are set on RX when pulling packet from descriptor */
- RTE_MARKER rx_descriptor_fields1;
-
- /*
- * The packet type, which is the combination of outer/inner L2, L3, L4
- * and tunnel types. The packet_type is about data really present in the
- * mbuf. Example: if vlan stripping is enabled, a received vlan packet
- * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
- * vlan is stripped from the data.
- */
+ /* remaining 24 bytes are set on RX when pulling packet from descriptor */
union {
- uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+ /* void * type of the array elements is retained for driver compatibility. */
+ void *rx_descriptor_fields1[24 / sizeof(void *)];
__extension__
struct {
- uint8_t l2_type:4; /**< (Outer) L2 type. */
- uint8_t l3_type:4; /**< (Outer) L3 type. */
- uint8_t l4_type:4; /**< (Outer) L4 type. */
- uint8_t tun_type:4; /**< Tunnel type. */
+ /*
+ * The packet type, which is the combination of outer/inner L2, L3, L4
+ * and tunnel types. The packet_type is about data really present in the
+ * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+ * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+ * vlan is stripped from the data.
+ */
union {
- uint8_t inner_esp_next_proto;
- /**< ESP next protocol type, valid if
- * RTE_PTYPE_TUNNEL_ESP tunnel type is set
- * on both Tx and Rx.
- */
+ uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
__extension__
struct {
- uint8_t inner_l2_type:4;
- /**< Inner L2 type. */
- uint8_t inner_l3_type:4;
- /**< Inner L3 type. */
+ uint8_t l2_type:4; /**< (Outer) L2 type. */
+ uint8_t l3_type:4; /**< (Outer) L3 type. */
+ uint8_t l4_type:4; /**< (Outer) L4 type. */
+ uint8_t tun_type:4; /**< Tunnel type. */
+ union {
+ /** ESP next protocol type, valid if
+ * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+ * on both Tx and Rx.
+ */
+ uint8_t inner_esp_next_proto;
+ __extension__
+ struct {
+ /** Inner L2 type. */
+ uint8_t inner_l2_type:4;
+ /** Inner L3 type. */
+ uint8_t inner_l3_type:4;
+ };
+ };
+ uint8_t inner_l4_type:4; /**< Inner L4 type. */
};
};
- uint8_t inner_l4_type:4; /**< Inner L4 type. */
- };
- };
- uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
- uint16_t vlan_tci;
+ uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- union {
- union {
- uint32_t rss; /**< RSS hash result if RSS enabled */
- struct {
+ uint16_t data_len; /**< Amount of data in segment buffer. */
+ /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+ uint16_t vlan_tci;
+
+ union {
union {
+ uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
- uint16_t hash;
- uint16_t id;
- };
- uint32_t lo;
- /**< Second 4 flexible bytes */
- };
- uint32_t hi;
- /**< First 4 flexible bytes or FD ID, dependent
- * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
- */
- } fdir; /**< Filter identifier if FDIR enabled */
- struct rte_mbuf_sched sched;
- /**< Hierarchical scheduler : 8 bytes */
- struct {
- uint32_t reserved1;
- uint16_t reserved2;
- uint16_t txq;
- /**< The event eth Tx adapter uses this field
- * to store Tx queue id.
- * @see rte_event_eth_tx_adapter_txq_set()
- */
- } txadapter; /**< Eventdev ethdev Tx adapter */
- uint32_t usr;
- /**< User defined tags. See rte_distributor_process() */
- } hash; /**< hash information */
- };
+ union {
+ struct {
+ uint16_t hash;
+ uint16_t id;
+ };
+ /** Second 4 flexible bytes */
+ uint32_t lo;
+ };
+ /** First 4 flexible bytes or FD ID, dependent
+ * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+ */
+ uint32_t hi;
+ } fdir; /**< Filter identifier if FDIR enabled */
+ /** Hierarchical scheduler : 8 bytes */
+ struct rte_mbuf_sched sched;
+ struct {
+ uint32_t reserved1;
+ uint16_t reserved2;
+ /** The event eth Tx adapter uses this field
+ * to store Tx queue id.
+ * @see rte_event_eth_tx_adapter_txq_set()
+ */
+ uint16_t txq;
+ } txadapter; /**< Eventdev ethdev Tx adapter */
+ /** User defined tags. See rte_distributor_process() */
+ uint32_t usr;
+ } hash; /**< hash information */
+ };
- /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
- uint16_t vlan_tci_outer;
+ /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+ uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
+ uint16_t buf_len; /**< Length of segment buffer. */
+ };
+ };
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
/* second cache line - fields only used in slow path or on TX */
- alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
#if RTE_IOVA_IN_MBUF
/**
* Next segment of scattered packet. Must be NULL in the last
* segment or in case of non-segmented packet.
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
struct rte_mbuf *next;
#else
/**
* Reserved for dynamic fields
* when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
uint64_t dynfield2;
#endif
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
uint64_t tx_offload; /**< combined for easy fetch */
__extension__
struct {
- uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
- /**< L2 (MAC) Header Length for non-tunneling pkt.
+ /** L2 (MAC) Header Length for non-tunneling pkt.
* Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
*/
+ uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
+ /** L3 (IP) Header Length. */
uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
- /**< L3 (IP) Header Length. */
+ /** L4 (TCP/UDP) Header Length. */
uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
- /**< L4 (TCP/UDP) Header Length. */
+ /** TCP TSO segment size */
uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
- /**< TCP TSO segment size */
-
/*
* Fields for Tx offloading of tunnels.
* These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
* Applications are expected to set appropriate tunnel
* offload flags when they fill in these fields.
*/
+ /** Outer L3 (IP) Hdr Length. */
uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
- /**< Outer L3 (IP) Hdr Length. */
+ /** Outer L2 (MAC) Hdr Length. */
uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
- /**< Outer L2 (MAC) Hdr Length. */
/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
};
--
1.8.3.1
^ permalink raw reply [relevance 2%]
* [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries
` (2 preceding siblings ...)
2024-04-03 17:53 3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51 3% ` Tyler Retzlaff
2024-04-04 17:51 2% ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-06-19 15:01 3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
As per techboard meeting 2024/03/20 adopt hybrid proposal of adapting
descriptor fields and removing cachline fields.
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.
For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays of with types matching the original
markers to maintain API compatibility.
Note: diff is easier viewed with -b due to additional nesting from
unions / structs that have been introduced.
v11:
* correct doxygen comment style for field documentation.
v10:
* move removal notices in in release notes from 24.03 to 24.07.
v9:
* provide narrowest possible libabigail.abignore to suppress
removal of fields that were agreed are not actual abi changes.
v8:
* rx_descriptor_fields1 array is now constexpr sized to
24 / sizeof(void *) so that the array encompasses fields
accessed via the array.
* add a comment to rx_descriptor_fields1 array site noting
that void * type of elements is retained for compatibility
with existing drivers.
* clean up comments of fields in rte_mbuf to be before the
field they apply to instead of after.
* duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
conditional compile for first field of cacheline 1 instead of
once before conditional compile block.
v7:
* complete re-write of series, previous versions not noted. all
reviewed-by and acked-by tags (if any) were removed.
Tyler Retzlaff (4):
net/i40e: use inline prefetch function
mbuf: remove rte marker fields
security: remove rte marker fields
cryptodev: remove rte marker fields
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 9 ++
drivers/net/i40e/i40e_rxtx_vec_avx512.c | 2 +-
lib/cryptodev/cryptodev_pmd.h | 5 +-
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 202 +++++++++++++++++---------------
lib/security/rte_security_driver.h | 5 +-
7 files changed, 129 insertions(+), 104 deletions(-)
--
1.8.3.1
^ permalink raw reply [relevance 3%]
* Community CI Meeting Minutes - April 4, 2024
@ 2024-04-04 16:29 3% Patrick Robb
0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-04 16:29 UTC (permalink / raw)
To: ci; +Cc: dev, dts
April 4, 2024
#####################################################################
Attendees
1. Patrick Robb
2. Juraj Linkeš
3. Paul Szczepanek
4. Luca Vizzarro
#####################################################################
Minutes
=====================================================================
General Announcements
* DPDK 24.03 has been released
* UNH Community Lab is experiencing power outages, and we are shutting
down testing for the day after this meeting.
* Will put in retests once we’re back up and running
* Daylight saving time has hit North America, and will also happen in
Europe between this meeting and the next one. Should we adjust?
* We will adjust earlier 1 hour
* Server Refresh:
* GB will vote on this soon (I think over email)
* Patrick sent Nathan some new information about the ARM Grace
server that ARM is requesting, which Nathan is passing along to GB
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
* UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
* Reminder - Final conclusion on policy is:
* A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
* B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
* In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).
* Depends-on support: Patrick pinged Thomas about this this morning.
* https://github.com/getpatchwork/patchwork/issues/583 and
https://github.com/getpatchwork/git-pw/issues/71
* MSVC: Tech board discussed extending the dpdk libraries which
compile with MSVC in CI testing, and making all new libraries which
will be used by Windows require compile using MSVC
* Some members mentioned difficulty due to burden of running
Windows VM to test their patches against before CI
* One solution is GitHub actions
* Honnappa requested lab host a windows VM as a community
resource. Users could SSH onto the lab VPN, and use that machine.
* Patrick Robb will follow up on the mailing list to see
whether the ci group approves of this idea.
* DPDK Summit will most likely be in Montreal
* Once we have a date, Patrick will suggest to GB and TB that
anyone who is interested can visit the lab the date after
* CFP:
* Should probably give a DTS update, which can be from Patrick,
other UNH people, Honnappa, maybe Juraj (remotely)
* UNH folks can probably do a CI testing update
* Discuss new hardware
* Discuss new testing
* Discuss new reporting functionality, retests, depends-on,
other qol stuff
=====================================================================
CI Status
---------------------------------------------------------------------
UNH-IOL Community Lab
* Dodji Seketeli is requesting information about the Community Lab’s
ABI jobs to investigate an error on his patch
* Libabigail version is 2.2.0
* Patrick will send him the .so abi ref dirs this morning.
* Marvell CN10K:
* TG is working, Octeon DUT can run DPDK apps and forward packets.
* Can’t figure out how to reconfigure the link speed on the QSFP
port (want 2x100GbE not 4x 50GbE) - will ask Marvell people to SSH on
to set this
* Also need to verify the correct meson options for native builds on the DUT
* right now just using “meson setup -Dplatform=cn10k build” from dpdk docs
* Juraj states that for ARM cpus (which is on this board) you
should be able to natively compile with default options
* SPDK: Working on these compile jobs
* Currently compile with:
* Ubuntu 22.04
* Debian 11
* Debian 12
* CentOS 8
* CentOS 9
* Fedora 37
* Fedora 38
* Fedora 39
* Opensuse-Leap 15 but with a warning
* Cannot compile with:
* Rhel 8
* Rhel 9
* SPDK docs state rhel is “best effort”
* Questions:
* Should we run with werror enabled?
* What versions of SPDK do we test?
* What versions of DPDK do we test SPDK against?
* Unit tests pass with the distros which are compiling
* OvS DPDK testing:
* Lab sent an email to test-report which got blocked because it
was just above 500kb, which is the limit
* Ts-factory redirect added to dpdk community lab dashboard navbar
---------------------------------------------------------------------
Intel Lab
* None
---------------------------------------------------------------------
Github Actions
* None
---------------------------------------------------------------------
Loongarch Lab
* None
---------------------------------------------------------------------
DTS Improvements & Test Development
* Nick’s hugepages patch will be submitted today (or already is).
* Forces 2mb hugepages
* Nick is starting on porting the jumboframes testsuite now
* Starting by manually running scapy, testpmd, tcpdump to verify
the function works, then writing the suite in DTS
* Jeremy is working on the context manager for testpmd to ensure it
closes completely before we attempt to start it again for a subsequent
testcase
* Juraj has provided an initial review of Luca’s testpmd params patch,
the implementation may need to be refactored, but the idea of
simplifying the developer user experience is a good goal
* Jeremy Spewock will write to Juraj about the capabilities patch. UNH
can test this if needed.
* Other than the testcase capabilities check patch, Juraj will be
renaming the dts execution and doing work for supporting pre-built
DPDK for the SUT
* Luca ran into what may have been a paramiko race condition when
the interactive shell closes. We are unsure what exactly is happening,
but we will probably need to hotfix this. Would likely require some
checks when closing the session.
* Luca tried to run with two Intel NICs, and could bind to vfio-pci,
but then timed out when trying to rebind to i40e. Left with one
interface bound to vfio, one interface bound to i40e.
* Can try rebinding the ports with 1 command, instead of 1 by 1
* Maybe tried to run dpdk-devbind before all DPDK resources had
been released (just speculation)
=====================================================================
Any other business
* Next Meeting: April 20, 2024
^ permalink raw reply [relevance 3%]
* Re: [PATCH] lib: add get/set link settings interface
2024-04-03 16:49 3% ` Tyler Retzlaff
@ 2024-04-04 7:09 4% ` David Marchand
2024-04-05 0:55 0% ` Tyler Retzlaff
0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-04-04 7:09 UTC (permalink / raw)
To: Tyler Retzlaff, Marek Pazdan
Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev
Hello Tyler, Marek,
On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > There are link settings parameters available from PMD drivers level
> > which are currently not exposed to the user via consistent interface.
> > When interface is available for system level those information can
> > be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > ETHTOOL_GLINKSETTINGS). There are use cases where
> > physical interface is passthrough to dpdk driver and is not available
> > from system level. Information provided by ioctl carries information
> > useful for link auto negotiation settings among others.
> >
> > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > ---
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 147257d6a2..66aad925d0 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > __extension__
> > struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */
> > - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > + uint16_t link_duplex : 2; /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
> > };
>
> this breaks the abi. David does libabigail pick this up i wonder?
>
Yes, the CI flagged it.
Looking at the UNH report (in patchwork):
http://mails.dpdk.org/archives/test-report/2024-April/631222.html
1 function with some indirect sub-type change:
[C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
rte_ethdev.c:2972:1 has some indirect sub-type changes:
parameter 2 of type 'rte_eth_link*' has sub-type changes:
in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
type size hasn't changed
2 data member changes:
'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
Error: ABI issue reported for abidiff --suppr
/home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
reference/usr/local/include --headers-dir2
build_install/usr/local/include
reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
this as a potential issue).
GHA would have caught it too, but the documentation generation failed
before reaching the ABI check.
http://mails.dpdk.org/archives/test-report/2024-April/631086.html
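The offset shift that abidiff reports can be reproduced with a minimal mirror of the struct. The struct names below are hypothetical stand-ins for struct rte_eth_link, and the bit positions assume the bitfield layout GCC/Clang produce on little-endian targets (bitfield order is implementation-defined):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical before/after mirrors of struct rte_eth_link
 * (not the real DPDK definitions). */
struct link_v1 {
	uint32_t link_speed;
	uint16_t link_duplex  : 1;
	uint16_t link_autoneg : 1;
	uint16_t link_status  : 1;
} __attribute__((aligned(8)));

struct link_v2 {
	uint32_t link_speed;
	uint16_t link_duplex  : 2; /* widened: later bitfields shift by one */
	uint16_t link_autoneg : 1;
	uint16_t link_status  : 1;
} __attribute__((aligned(8)));

/* Bit index of link_autoneg, found by setting only that field and
 * scanning the raw bytes (little-endian, GCC/Clang bitfield order). */
static int autoneg_bit_v1(void)
{
	struct link_v1 s;
	uint64_t raw;
	memset(&s, 0, sizeof(s));
	s.link_autoneg = 1;
	memcpy(&raw, &s, sizeof(raw));
	return __builtin_ctzll(raw);
}

static int autoneg_bit_v2(void)
{
	struct link_v2 s;
	uint64_t raw;
	memset(&s, 0, sizeof(s));
	s.link_autoneg = 1;
	memcpy(&raw, &s, sizeof(raw));
	return __builtin_ctzll(raw);
}
```

sizeof() is unchanged (8 bytes in both variants), which is exactly why a size-only check would miss the break; only the member offsets move (33 to 34 for link_autoneg, matching the abidiff output).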
--
David Marchand
^ permalink raw reply [relevance 4%]
* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
2024-04-03 19:32 0% ` Morten Brørup
@ 2024-04-03 22:45 0% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 22:45 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang
On Wed, Apr 03, 2024 at 09:32:21PM +0200, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 3 April 2024 19.54
> >
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> >
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> >
> > Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> > unions as single element arrays of with types matching the original
> > markers to maintain API compatibility.
> >
> > This change breaks the API for cacheline{0,1} fields that have been
> > removed from rte_mbuf but it does not break the ABI, to address the
> > false positives of the removed (but 0 size fields) provide the minimum
> > libabigail.abignore for type = rte_mbuf.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
>
> [...]
>
> > + /* remaining 24 bytes are set on RX when pulling packet from
> > descriptor */
>
> Good.
>
> > union {
> > + /* void * type of the array elements is retained for driver
> > compatibility. */
> > + void *rx_descriptor_fields1[24 / sizeof(void *)];
>
> Good, also the description.
>
> > __extension__
> > struct {
> > - uint8_t l2_type:4; /**< (Outer) L2 type. */
> > - uint8_t l3_type:4; /**< (Outer) L3 type. */
> > - uint8_t l4_type:4; /**< (Outer) L4 type. */
> > - uint8_t tun_type:4; /**< Tunnel type. */
> > + /*
> > + * The packet type, which is the combination of
> > outer/inner L2, L3, L4
> > + * and tunnel types. The packet_type is about data
> > really present in the
> > + * mbuf. Example: if vlan stripping is enabled, a
> > received vlan packet
> > + * would have RTE_PTYPE_L2_ETHER and not
> > RTE_PTYPE_L2_VLAN because the
> > + * vlan is stripped from the data.
> > + */
> > union {
> > - uint8_t inner_esp_next_proto;
> > - /**< ESP next protocol type, valid if
> > - * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> > - * on both Tx and Rx.
> > - */
>
> [...]
>
> > + /**< ESP next protocol type, valid
> > if
> > + * RTE_PTYPE_TUNNEL_ESP tunnel type
> > is set
> > + * on both Tx and Rx.
> > + */
> > + uint8_t inner_esp_next_proto;
>
> Thank you for moving the comments up before the fields.
>
> Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.
ooh, i'll fix it i'm not well versed in doxygen documentation.
>
> Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
> If you leave the descriptions at their originals positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.
it's easy enough for me to fix the comments in place and bring in a new
version of the series, assuming other reviewers don't object i'll do that.
the diff is already kind of annoying to review in mail without -b
anyway.
>
> [...]
>
> > /* second cache line - fields only used in slow path or on TX */
> > - alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> > -
> > #if RTE_IOVA_IN_MBUF
> > /**
> > * Next segment of scattered packet. Must be NULL in the last
> > * segment or in case of non-segmented packet.
> > */
> > + alignas(RTE_CACHE_LINE_MIN_SIZE)
> > struct rte_mbuf *next;
> > #else
> > /**
> > * Reserved for dynamic fields
> > * when the next pointer is in first cache line (i.e.
> > RTE_IOVA_IN_MBUF is 0).
> > */
> > + alignas(RTE_CACHE_LINE_MIN_SIZE)
>
> Good positioning of the alignas().
>
> I like everything in this patch.
>
> Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.
ack, next rev.
^ permalink raw reply [relevance 0%]
* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
2024-04-03 17:53 2% ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-03 19:32 0% ` Morten Brørup
@ 2024-04-03 21:49 0% ` Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-03 21:49 UTC (permalink / raw)
To: Tyler Retzlaff
Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb
On Wed, 3 Apr 2024 10:53:34 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
>
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
>
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single element arrays of with types matching the original
> markers to maintain API compatibility.
>
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
^ permalink raw reply [relevance 0%]
* RE: [PATCH v10 2/4] mbuf: remove rte marker fields
2024-04-03 17:53 2% ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
@ 2024-04-03 19:32 0% ` Morten Brørup
2024-04-03 22:45 0% ` Tyler Retzlaff
2024-04-03 21:49 0% ` Stephen Hemminger
1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-03 19:32 UTC (permalink / raw)
To: Tyler Retzlaff, dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang
> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 3 April 2024 19.54
>
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
>
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
>
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single element arrays of with types matching the original
> markers to maintain API compatibility.
>
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
[...]
> + /* remaining 24 bytes are set on RX when pulling packet from
> descriptor */
Good.
> union {
> + /* void * type of the array elements is retained for driver
> compatibility. */
> + void *rx_descriptor_fields1[24 / sizeof(void *)];
Good, also the description.
> __extension__
> struct {
> - uint8_t l2_type:4; /**< (Outer) L2 type. */
> - uint8_t l3_type:4; /**< (Outer) L3 type. */
> - uint8_t l4_type:4; /**< (Outer) L4 type. */
> - uint8_t tun_type:4; /**< Tunnel type. */
> + /*
> + * The packet type, which is the combination of
> outer/inner L2, L3, L4
> + * and tunnel types. The packet_type is about data
> really present in the
> + * mbuf. Example: if vlan stripping is enabled, a
> received vlan packet
> + * would have RTE_PTYPE_L2_ETHER and not
> RTE_PTYPE_L2_VLAN because the
> + * vlan is stripped from the data.
> + */
> union {
> - uint8_t inner_esp_next_proto;
> - /**< ESP next protocol type, valid if
> - * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> - * on both Tx and Rx.
> - */
[...]
> + /**< ESP next protocol type, valid
> if
> + * RTE_PTYPE_TUNNEL_ESP tunnel type
> is set
> + * on both Tx and Rx.
> + */
> + uint8_t inner_esp_next_proto;
Thank you for moving the comments up before the fields.
Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.
Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
If you leave the descriptions at their original positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.
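The Doxygen convention in question, sketched on a hypothetical struct (not the real mbuf):

```c
#include <assert.h>
#include <stdint.h>

struct doc_sketch {
	/** Plain block comment: documents the member that FOLLOWS (nb_segs). */
	uint16_t nb_segs;

	uint16_t port; /**< Trailing form: the '<' points back at port. */
};
```

Moving a `/**<` comment above its field without dropping the `<` makes Doxygen attach it to the wrong (preceding) member.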
[...]
> /* second cache line - fields only used in slow path or on TX */
> - alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> -
> #if RTE_IOVA_IN_MBUF
> /**
> * Next segment of scattered packet. Must be NULL in the last
> * segment or in case of non-segmented packet.
> */
> + alignas(RTE_CACHE_LINE_MIN_SIZE)
> struct rte_mbuf *next;
> #else
> /**
> * Reserved for dynamic fields
> * when the next pointer is in first cache line (i.e.
> RTE_IOVA_IN_MBUF is 0).
> */
> + alignas(RTE_CACHE_LINE_MIN_SIZE)
Good positioning of the alignas().
I like everything in this patch.
Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.
^ permalink raw reply [relevance 0%]
* [PATCH v10 2/4] mbuf: remove rte marker fields
2024-04-03 17:53 3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-03 17:53 2% ` Tyler Retzlaff
2024-04-03 19:32 0% ` Morten Brørup
2024-04-03 21:49 0% ` Stephen Hemminger
0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.
Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.
This change breaks the API for cacheline{0,1} fields that have been
removed from rte_mbuf but it does not break the ABI, to address the
false positives of the removed (but 0 size fields) provide the minimum
libabigail.abignore for type = rte_mbuf.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 3 +
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 200 +++++++++++++++++----------------
4 files changed, 115 insertions(+), 98 deletions(-)
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
[suppress_type]
name = rte_eth_fp_ops
has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+ name = rte_mbuf
+ type_kind = struct
+ has_size_change = no
+ has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
Also, make sure to start the actual text at the margin.
=======================================================
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+ have been removed from ``struct rte_mbuf``.
+
API Changes
-----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
static inline void
rte_mbuf_prefetch_part1(struct rte_mbuf *m)
{
- rte_prefetch0(&m->cacheline0);
+ rte_prefetch0(m);
}
/**
@@ -126,7 +126,7 @@
rte_mbuf_prefetch_part2(struct rte_mbuf *m)
{
#if RTE_CACHE_LINE_SIZE == 64
- rte_prefetch0(&m->cacheline1);
+ rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
#else
RTE_SET_USED(m);
#endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..9d838b8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
* The generic rte_mbuf, containing a packet mbuf.
*/
struct __rte_cache_aligned rte_mbuf {
- RTE_MARKER cacheline0;
-
void *buf_addr; /**< Virtual address of segment buffer. */
#if RTE_IOVA_IN_MBUF
/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
#endif
/* next 8 bytes are initialised on RX descriptor rearm */
- RTE_MARKER64 rearm_data;
- uint16_t data_off;
-
- /**
- * Reference counter. Its size should at least equal to the size
- * of port field (16 bits), to support zero-copy broadcast.
- * It should only be accessed using the following functions:
- * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
- * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
- * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
- */
- RTE_ATOMIC(uint16_t) refcnt;
+ union {
+ uint64_t rearm_data[1];
+ __extension__
+ struct {
+ uint16_t data_off;
+
+ /**
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
+ * It should only be accessed using the following functions:
+ * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+ * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+ * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+ */
+ RTE_ATOMIC(uint16_t) refcnt;
- /**
- * Number of segments. Only valid for the first segment of an mbuf
- * chain.
- */
- uint16_t nb_segs;
+ /**
+ * Number of segments. Only valid for the first segment of an mbuf
+ * chain.
+ */
+ uint16_t nb_segs;
- /** Input port (16 bits to support more than 256 virtual ports).
- * The event eth Tx adapter uses this field to specify the output port.
- */
- uint16_t port;
+ /** Input port (16 bits to support more than 256 virtual ports).
+ * The event eth Tx adapter uses this field to specify the output port.
+ */
+ uint16_t port;
+ };
+ };
uint64_t ol_flags; /**< Offload features. */
- /* remaining bytes are set on RX when pulling packet from descriptor */
- RTE_MARKER rx_descriptor_fields1;
-
- /*
- * The packet type, which is the combination of outer/inner L2, L3, L4
- * and tunnel types. The packet_type is about data really present in the
- * mbuf. Example: if vlan stripping is enabled, a received vlan packet
- * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
- * vlan is stripped from the data.
- */
+ /* remaining 24 bytes are set on RX when pulling packet from descriptor */
union {
- uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+ /* void * type of the array elements is retained for driver compatibility. */
+ void *rx_descriptor_fields1[24 / sizeof(void *)];
__extension__
struct {
- uint8_t l2_type:4; /**< (Outer) L2 type. */
- uint8_t l3_type:4; /**< (Outer) L3 type. */
- uint8_t l4_type:4; /**< (Outer) L4 type. */
- uint8_t tun_type:4; /**< Tunnel type. */
+ /*
+ * The packet type, which is the combination of outer/inner L2, L3, L4
+ * and tunnel types. The packet_type is about data really present in the
+ * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+ * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+ * vlan is stripped from the data.
+ */
union {
- uint8_t inner_esp_next_proto;
- /**< ESP next protocol type, valid if
- * RTE_PTYPE_TUNNEL_ESP tunnel type is set
- * on both Tx and Rx.
- */
+ uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
__extension__
struct {
- uint8_t inner_l2_type:4;
- /**< Inner L2 type. */
- uint8_t inner_l3_type:4;
- /**< Inner L3 type. */
+ uint8_t l2_type:4; /**< (Outer) L2 type. */
+ uint8_t l3_type:4; /**< (Outer) L3 type. */
+ uint8_t l4_type:4; /**< (Outer) L4 type. */
+ uint8_t tun_type:4; /**< Tunnel type. */
+ union {
+ /**< ESP next protocol type, valid if
+ * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+ * on both Tx and Rx.
+ */
+ uint8_t inner_esp_next_proto;
+ __extension__
+ struct {
+ /**< Inner L2 type. */
+ uint8_t inner_l2_type:4;
+ /**< Inner L3 type. */
+ uint8_t inner_l3_type:4;
+ };
+ };
+ uint8_t inner_l4_type:4; /**< Inner L4 type. */
};
};
- uint8_t inner_l4_type:4; /**< Inner L4 type. */
- };
- };
- uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
- uint16_t vlan_tci;
+ uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- union {
- union {
- uint32_t rss; /**< RSS hash result if RSS enabled */
- struct {
+ uint16_t data_len; /**< Amount of data in segment buffer. */
+ /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+ uint16_t vlan_tci;
+
+ union {
union {
+ uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
- uint16_t hash;
- uint16_t id;
- };
- uint32_t lo;
- /**< Second 4 flexible bytes */
- };
- uint32_t hi;
- /**< First 4 flexible bytes or FD ID, dependent
- * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
- */
- } fdir; /**< Filter identifier if FDIR enabled */
- struct rte_mbuf_sched sched;
- /**< Hierarchical scheduler : 8 bytes */
- struct {
- uint32_t reserved1;
- uint16_t reserved2;
- uint16_t txq;
- /**< The event eth Tx adapter uses this field
- * to store Tx queue id.
- * @see rte_event_eth_tx_adapter_txq_set()
- */
- } txadapter; /**< Eventdev ethdev Tx adapter */
- uint32_t usr;
- /**< User defined tags. See rte_distributor_process() */
- } hash; /**< hash information */
- };
+ union {
+ struct {
+ uint16_t hash;
+ uint16_t id;
+ };
+ /**< Second 4 flexible bytes */
+ uint32_t lo;
+ };
+ /**< First 4 flexible bytes or FD ID, dependent
+ * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+ */
+ uint32_t hi;
+ } fdir; /**< Filter identifier if FDIR enabled */
+ struct rte_mbuf_sched sched;
+ /**< Hierarchical scheduler : 8 bytes */
+ struct {
+ uint32_t reserved1;
+ uint16_t reserved2;
+ /**< The event eth Tx adapter uses this field
+ * to store Tx queue id.
+ * @see rte_event_eth_tx_adapter_txq_set()
+ */
+ uint16_t txq;
+ } txadapter; /**< Eventdev ethdev Tx adapter */
+ /**< User defined tags. See rte_distributor_process() */
+ uint32_t usr;
+ } hash; /**< hash information */
+ };
- /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
- uint16_t vlan_tci_outer;
+ /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+ uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
+ uint16_t buf_len; /**< Length of segment buffer. */
+ };
+ };
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
/* second cache line - fields only used in slow path or on TX */
- alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
#if RTE_IOVA_IN_MBUF
/**
* Next segment of scattered packet. Must be NULL in the last
* segment or in case of non-segmented packet.
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
struct rte_mbuf *next;
#else
/**
* Reserved for dynamic fields
* when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
uint64_t dynfield2;
#endif
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
uint64_t tx_offload; /**< combined for easy fetch */
__extension__
struct {
- uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
/**< L2 (MAC) Header Length for non-tunneling pkt.
* Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
*/
- uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+ uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
/**< L3 (IP) Header Length. */
- uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+ uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
/**< L4 (TCP/UDP) Header Length. */
- uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+ uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
/**< TCP TSO segment size */
-
+ uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
/*
* Fields for Tx offloading of tunnels.
* These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
* Applications are expected to set appropriate tunnel
* offload flags when they fill in these fields.
*/
- uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
/**< Outer L3 (IP) Hdr Length. */
- uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+ uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
/**< Outer L2 (MAC) Hdr Length. */
+ uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
};
--
1.8.3.1
^ permalink raw reply [relevance 2%]
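The marker-removal pattern in the patch above can be sketched in miniature (a hypothetical field subset, not the full mbuf): a zero-size RTE_MARKER-style field is replaced by an anonymous union that overlays a single-element array on the fields it used to mark, leaving every offset and the struct size unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t marker64[0]; /* GCC zero-size-array extension, as RTE_MARKER64 was */

struct mbuf_before {
	void *buf_addr;
	marker64 rearm_data; /* occupies no bytes: just names this spot */
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
};

struct mbuf_after {
	void *buf_addr;
	union {
		uint64_t rearm_data[1]; /* same name, now a real 8-byte object */
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};
};
```

Because the offsets are identical, code that loaded 8 bytes through rearm_data keeps working without ABI change; only the zero-size names disappear, which is what the libabigail suppression covers.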
* [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries
2024-04-02 20:08 3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-03 17:53 3% ` Tyler Retzlaff
2024-04-03 17:53 2% ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-04 17:51 3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-06-19 15:01 3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
As per the techboard meeting of 2024/03/20, adopt the hybrid proposal of
adapting descriptor fields and removing cacheline fields.
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.
For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.
Note: diff is easier viewed with -b due to additional nesting from
unions / structs that have been introduced.
v10:
* move removal notices in in release notes from 24.03 to 24.07
v9:
* provide the narrowest possible libabigail.abignore to suppress
removal of fields that were agreed not to be actual ABI changes.
v8:
* rx_descriptor_fields1 array is now constexpr sized to
24 / sizeof(void *) so that the array encompasses fields
accessed via the array.
* add a comment to rx_descriptor_fields1 array site noting
that void * type of elements is retained for compatibility
with existing drivers.
* clean up comments of fields in rte_mbuf to be before the
field they apply to instead of after.
* duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
conditional compile for first field of cacheline 1 instead of
once before conditional compile block.
v7:
* complete re-write of series, previous versions not noted. all
reviewed-by and acked-by tags (if any) were removed.
Tyler Retzlaff (4):
net/i40e: use inline prefetch function
mbuf: remove rte marker fields
security: remove rte marker fields
cryptodev: remove rte marker fields
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_07.rst | 9 ++
drivers/net/i40e/i40e_rxtx_vec_avx512.c | 2 +-
lib/cryptodev/cryptodev_pmd.h | 5 +-
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 200 +++++++++++++++++---------------
lib/security/rte_security_driver.h | 5 +-
7 files changed, 128 insertions(+), 103 deletions(-)
--
1.8.3.1
^ permalink raw reply [relevance 3%]
* Re: [PATCH] lib: add get/set link settings interface
@ 2024-04-03 16:49 3% ` Tyler Retzlaff
2024-04-04 7:09 4% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-03 16:49 UTC (permalink / raw)
To: Marek Pazdan
Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, david.marchand
On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> There are link settings parameters available from PMD drivers level
> which are currently not exposed to the user via consistent interface.
> When interface is available for system level those information can
> be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> ETHTOOL_GLINKSETTINGS). There are use cases where
> physical interface is passthrough to dpdk driver and is not available
> from system level. Information provided by ioctl carries information
> useful for link auto negotiation settings among others.
>
> Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> ---
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 147257d6a2..66aad925d0 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -335,7 +335,7 @@ struct rte_eth_stats {
> __extension__
> struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> uint32_t link_speed; /**< RTE_ETH_SPEED_NUM_ */
> - uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> + uint16_t link_duplex : 2; /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
> };
this breaks the abi. David does libabigail pick this up i wonder?
^ permalink raw reply [relevance 3%]
* RE: The effect of inlining
2024-04-01 15:20 3% ` Mattias Rönnblom
@ 2024-04-03 16:01 3% ` Morten Brørup
0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2024-04-03 16:01 UTC (permalink / raw)
To: Mattias Rönnblom, Maxime Coquelin, Stephen Hemminger,
Andrey Ignatov
Cc: dev, Chenbo Xia, Wei Shen, techboard
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Monday, 1 April 2024 17.20
>
> On 2024-03-29 14:42, Morten Brørup wrote:
> > +CC techboard
> >
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Friday, 29 March 2024 14.05
> >>
> >> Hi Stephen,
> >>
> >> On 3/29/24 03:53, Stephen Hemminger wrote:
> >>> On Thu, 28 Mar 2024 17:10:42 -0700
> >>> Andrey Ignatov <rdna@apple.com> wrote:
> >>>
> >>>>>
> >>>>> You don't need always inline, the compiler will do it anyway.
> >>>>
> >>>> I can remove it in v2, but it's not completely obvious to me how is
> >> it
> >>>> decided when to specify it explicitly and when not?
> >>>>
> >>>> I see plenty of __rte_always_inline in this file:
> >>>>
> >>>> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> >>>> lib/vhost/virtio_net.c:66
> >>>
> >>>
> >>> Cargo cult really.
> >>>
> >>
> >> Cargo cult... really?
> >>
> >> Well, I just did a quick test by comparing IO forwarding with testpmd
> >> between main branch and with adding a patch that removes all the
> >> inline/noinline in lib/vhost/virtio_net.c [0].
> >>
> >> main branch: 14.63Mpps
> >> main branch - inline/noinline: 10.24Mpps
> >
> > Thank you for testing this, Maxime. Very interesting!
> >
> > It is sometimes suggested on techboard meetings that we should convert
> more inline functions to non-inline for improved API/ABI stability, with
> the argument that the performance of inlining is negligible.
> >
>
> I think you are mixing two different (but related) things here.
> 1) marking functions with the inline family of keywords/attributes
> 2) keeping function definitions in header files
I'm talking about 2. The reason for wanting to avoid inline function definitions in header files is to hide more of the implementation behind the API, thus making it easier to change the implementation without breaking the API/ABI. Sorry about not making this clear.
>
> 1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the
> API (i.e., source-level compatibility).
>
> 2) *allows* for function inlining even in non-LTO builds, but doesn't
> force it.
>
> If you don't believe 2) makes a difference performance-wise, it follows
> that you also don't believe LTO makes much of a difference. Both have
> the same effect: allowing the compiler to reason over a larger chunk of
> your program.
>
> Allowing the compiler to inline small, often-called functions is crucial
> for performance, in my experience. If the target symbol tend to be in a
> shared object, the difference is even larger. It's also quite common
> that you see no effect of LTO (other than a reduction of code
> footprint).
>
> As LTO becomes more practical to use, 2) loses much of its appeal.
>
> If PGO ever becomes practical to use, maybe 1) will as well.
>
> > I think this test proves that the sum of many small (negligible)
> performance differences is not negligible!
> >
> >>
> >> Andrey, thanks for the patch, I'll have a look at it next week.
> >>
> >> Maxime
> >>
> >> [0]: https://pastebin.com/72P2npZ0
> >
^ permalink raw reply [relevance 3%]
* Re: Issues around packet capture when secondary process is doing rx/tx
@ 2024-04-03 11:43 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-03 11:43 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger, dev
Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan, Konstantin Ananyev
On 1/8/2024 10:41 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Monday, 8 January 2024 02.59
>>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary
>> process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>
> So, callbacks don't work across processes, because code might differ across processes.
>
> If process A is running, and RX'ing and TX'ing, and process B wants to install its own callbacks (e.g. packet capture) on RX and RX, we basically want process A to execute code residing in process B, which is impossible.
>
Callbacks are stored in "struct rte_eth_dev", so they are per process, which
means the primary and secondaries have their own copies of the callbacks, as
Konstantin explained.
So, how does pdump work :)? It uses MP support and a shared ring similar to
what you mentioned below. More detail:
- The primary registers an MP handler
- The pdump secondary process sends an MP message with a ring and mempool in
the message
- When the primary receives the MP message, it registers its *own* callbacks,
which get 'ring' as a parameter
- The callbacks clone packets to 'ring'; that is how the pdump secondary
process accesses the packets
> An alternative could be to pass the packets through a ring in shared memory. However, this method would add the ring processing latency of process B to the RX/TX latency of process A.
>
> I think we can conclude that callbacks are one of the things that don't work with secondary processes.
>
> With this decided, we can then consider how to best add packet capture. The concept of passing "data" (instead of calling functions) across processes obviously applies to this use case.
>
>>
>> An example sequence would be:
>> 1. dumpcap (or pdump) as secondary tells pdump in primary to
>> register callback
>> 2. secondary process calls rx_burst.
>> 3. rx_burst sees the callback but it has pointer pdump_rx which
>> is not necessarily
>> at same location in primary and secondary process.
>> 4. indirect function call in secondary to bad location likely
>> causes crash.
>>
>> Some possible workarounds.
>> 1. Keep callback list per-process: messy, but won't crash.
>> Capture won't work
>> without other changes. In this primary would register
>> callback, but secondaries
>> would not use them in rx/tx burst.
>>
>> 2. Replace use of rx/tx callback in pdump with change to
>> rte_ethdev to have
>> a capture flag. (i.e. don't use indirection). Likely ABI
>> problems.
>> Basically, ignore the rx/tx callback mechanism. This is my
>> preferred
>> solution.
>>
>> 3. Some fix up mechanism (in EAL mp support?) to have each
>> process fixup
>> its callback mechanism.
>>
>> 4. Do something in pdump_init to register the callback in same
>> process context
>> (probably need callbacks to be per-process). Would mean
>> callback is always
>> on independent of capture being enabled.
>>
>> 5. Get rid of indirect function call pointer, and replace it by
>> index into
>> a static table of callback functions. Every process would
>> have same code
>> (in this case pdump_rx) but at different address. Requires
>> all callbacks
>> to be statically defined at build time.
>>
>> The existing rx/tx callback is not safe if rx/tx burst is called from a
>> different process
>> than where the callback is registered.
>>
>
^ permalink raw reply [relevance 0%]
* Re: Issues around packet capture when secondary process is doing rx/tx
2024-04-03 0:14 4% ` Stephen Hemminger
@ 2024-04-03 11:42 0% ` Ferruh Yigit
1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-03 11:42 UTC (permalink / raw)
To: Konstantin Ananyev, Stephen Hemminger, dev
Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan
On 1/8/2024 3:13 PM, Konstantin Ananyev wrote:
>
>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>>
>> An example sequence would be:
>> 1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>> 2. secondary process calls rx_burst.
>> 3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>> at same location in primary and secondary process.
>> 4. indirect function call in secondary to bad location likely causes crash.
>
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private to the process; a different process simply should not
> see/execute them.
> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal when, for the same port/queue, you end up with different lists of callbacks
> for different processes.
> So, unless I am missing something, I don't see how we can end up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
>
Ack. There should be another reason for the crash.
> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now.
>
Currently testpmd calls 'rte_pdump_init()'; both the primary testpmd and
secondary testpmd processes call this API and both register the PDUMP_MP
handler, which I think is OK.
When the pdump secondary process sends the MP message, both the primary
testpmd and secondary testpmd processes should register callbacks with the
provided ring and mempool information.
I don't know if both primary and secondary process callbacks running
simultaneously is causing this problem; otherwise I expect it to work.
>>
>> Some possible workarounds.
>> 1. Keep callback list per-process: messy, but won't crash. Capture won't work
>> without other changes. In this primary would register callback, but secondaries
>> would not use them in rx/tx burst.
>>
>> 2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
>> a capture flag. (i.e. don't use indirection). Likely ABI problems.
>> Basically, ignore the rx/tx callback mechanism. This is my preferred
>> solution.
>
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.
>
>> 3. Some fix up mechanism (in EAL mp support?) to have each process fixup
>> its callback mechanism.
>
> Probably the easiest way to fix that - pass to rte_pdump_enable() extra information
> that would allow it to distinguish on what exact process (local, remote)
> we want to enable pdump functionality. Then it could act accordingly.
>
>>
>> 4. Do something in pdump_init to register the callback in same process context
>> (probably need callbacks to be per-process). Would mean callback is always
>> on independent of capture being enabled.
>>
>> 5. Get rid of indirect function call pointer, and replace it by index into
>> a static table of callback functions. Every process would have same code
>> (in this case pdump_rx) but at different address. Requires all callbacks
>> to be statically defined at build time.
>
> Doesn't look like a good approach - it will break many things.
>
>> The existing rx/tx callback is not safe if rx/tx burst is called from a
>> different process than where the callback is registered.
>
>
^ permalink raw reply [relevance 0%]
* Re: Issues around packet capture when secondary process is doing rx/tx
@ 2024-04-03 0:14 4% ` Stephen Hemminger
2024-04-03 11:42 0% ` Ferruh Yigit
1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-03 0:14 UTC (permalink / raw)
To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan
On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in secondary process.
> >
> > The root cause is that existing rx/tx callback model just doesn't work
> > unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> >
> > An example sequence would be:
> > 1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > 2. secondary process calls rx_burst.
> > 3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > at same location in primary and secondary process.
> > 4. indirect function call in secondary to bad location likely causes crash.
>
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private to the process; a different process simply should not
> see/execute them.
> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal when, for the same port/queue, you end up with different lists of callbacks
> for different processes.
> So, unless I am missing something, I don't see how we can end up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
>
> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now.
>
> >
> > Some possible workarounds.
> > 1. Keep callback list per-process: messy, but won't crash. Capture won't work
> > without other changes. In this primary would register callback, but secondaries
> > would not use them in rx/tx burst.
> >
> > 2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
> > a capture flag. (i.e. don't use indirection). Likely ABI problems.
> > Basically, ignore the rx/tx callback mechanism. This is my preferred
> > solution.
>
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.
>
> > 3. Some fix up mechanism (in EAL mp support?) to have each process fixup
> > its callback mechanism.
>
> Probably the easiest way to fix that - pass to rte_pdump_enable() extra information
> that would allow it to distinguish on what exact process (local, remote)
> we want to enable pdump functionality. Then it could act accordingly.
>
> >
> > 4. Do something in pdump_init to register the callback in same process context
> > (probably need callbacks to be per-process). Would mean callback is always
> > on independent of capture being enabled.
> >
> > 5. Get rid of indirect function call pointer, and replace it by index into
> > a static table of callback functions. Every process would have same code
> > (in this case pdump_rx) but at different address. Requires all callbacks
> > to be statically defined at build time.
>
> Doesn't look like a good approach - it will break many things.
>
> > The existing rx/tx callback is not safe if rx/tx burst is called from a
> > different process than where the callback is registered.
>
>
I have been looking into the best way to fix this, and the real answer is not to use
callbacks but instead a per-queue flag. The natural place to put these is in
rte_ethdev_driver. BUT this will mean an ABI breakage, so it will have to wait for the
24.11 release. Sometimes fixing a design flaw means an ABI change.
^ permalink raw reply [relevance 4%]
* Re: [PATCH v9 2/4] mbuf: remove rte marker fields
2024-04-02 20:45 0% ` Stephen Hemminger
@ 2024-04-02 20:51 0% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:51 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb
On Tue, Apr 02, 2024 at 01:45:49PM -0700, Stephen Hemminger wrote:
> On Tue, 2 Apr 2024 13:08:48 -0700
> Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
>
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> >
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> >
> > Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single-element arrays with types matching the original
> > markers to maintain API compatibility.
> >
> > This change breaks the API for cacheline{0,1} fields that have been
> > removed from rte_mbuf but it does not break the ABI, to address the
> > false positives of the removed (but 0 size fields) provide the minimum
> > libabigail.abignore for type = rte_mbuf.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>
> Release note should be for 24.07 not 24.03.
Yeah, pressed send and noticed it seconds later. When the new empty
release notes are added I'll move the notes to 24.07.
thanks.
^ permalink raw reply [relevance 0%]
* Re: [PATCH v9 2/4] mbuf: remove rte marker fields
2024-04-02 20:08 2% ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
@ 2024-04-02 20:45 0% ` Stephen Hemminger
2024-04-02 20:51 0% ` Tyler Retzlaff
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-02 20:45 UTC (permalink / raw)
To: Tyler Retzlaff
Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb
On Tue, 2 Apr 2024 13:08:48 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
>
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
>
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single-element arrays with types matching the original
> markers to maintain API compatibility.
>
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Release note should be for 24.07 not 24.03.
^ permalink raw reply [relevance 0%]
* [PATCH v9 2/4] mbuf: remove rte marker fields
2024-04-02 20:08 3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-02 20:08 2% ` Tyler Retzlaff
2024-04-02 20:45 0% ` Stephen Hemminger
0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:08 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.
Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.
This change breaks the API for cacheline{0,1} fields that have been
removed from rte_mbuf but it does not break the ABI, to address the
false positives of the removed (but 0 size fields) provide the minimum
libabigail.abignore for type = rte_mbuf.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_03.rst | 2 +
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 200 +++++++++++++++++----------------
4 files changed, 114 insertions(+), 98 deletions(-)
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
[suppress_type]
name = rte_eth_fp_ops
has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+ name = rte_mbuf
+ type_kind = struct
+ has_size_change = no
+ has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 013c12f..ffc0d62 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -161,6 +161,8 @@ Removed Items
* acc101: Removed obsolete code for non productized HW variant.
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+ have been removed from ``struct rte_mbuf``.
API Changes
-----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
static inline void
rte_mbuf_prefetch_part1(struct rte_mbuf *m)
{
- rte_prefetch0(&m->cacheline0);
+ rte_prefetch0(m);
}
/**
@@ -126,7 +126,7 @@
rte_mbuf_prefetch_part2(struct rte_mbuf *m)
{
#if RTE_CACHE_LINE_SIZE == 64
- rte_prefetch0(&m->cacheline1);
+ rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
#else
RTE_SET_USED(m);
#endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..9d838b8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
* The generic rte_mbuf, containing a packet mbuf.
*/
struct __rte_cache_aligned rte_mbuf {
- RTE_MARKER cacheline0;
-
void *buf_addr; /**< Virtual address of segment buffer. */
#if RTE_IOVA_IN_MBUF
/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
#endif
/* next 8 bytes are initialised on RX descriptor rearm */
- RTE_MARKER64 rearm_data;
- uint16_t data_off;
-
- /**
- * Reference counter. Its size should at least equal to the size
- * of port field (16 bits), to support zero-copy broadcast.
- * It should only be accessed using the following functions:
- * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
- * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
- * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
- */
- RTE_ATOMIC(uint16_t) refcnt;
+ union {
+ uint64_t rearm_data[1];
+ __extension__
+ struct {
+ uint16_t data_off;
+
+ /**
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
+ * It should only be accessed using the following functions:
+ * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+ * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+ * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+ */
+ RTE_ATOMIC(uint16_t) refcnt;
- /**
- * Number of segments. Only valid for the first segment of an mbuf
- * chain.
- */
- uint16_t nb_segs;
+ /**
+ * Number of segments. Only valid for the first segment of an mbuf
+ * chain.
+ */
+ uint16_t nb_segs;
- /** Input port (16 bits to support more than 256 virtual ports).
- * The event eth Tx adapter uses this field to specify the output port.
- */
- uint16_t port;
+ /** Input port (16 bits to support more than 256 virtual ports).
+ * The event eth Tx adapter uses this field to specify the output port.
+ */
+ uint16_t port;
+ };
+ };
uint64_t ol_flags; /**< Offload features. */
- /* remaining bytes are set on RX when pulling packet from descriptor */
- RTE_MARKER rx_descriptor_fields1;
-
- /*
- * The packet type, which is the combination of outer/inner L2, L3, L4
- * and tunnel types. The packet_type is about data really present in the
- * mbuf. Example: if vlan stripping is enabled, a received vlan packet
- * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
- * vlan is stripped from the data.
- */
+ /* remaining 24 bytes are set on RX when pulling packet from descriptor */
union {
- uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+ /* void * type of the array elements is retained for driver compatibility. */
+ void *rx_descriptor_fields1[24 / sizeof(void *)];
__extension__
struct {
- uint8_t l2_type:4; /**< (Outer) L2 type. */
- uint8_t l3_type:4; /**< (Outer) L3 type. */
- uint8_t l4_type:4; /**< (Outer) L4 type. */
- uint8_t tun_type:4; /**< Tunnel type. */
+ /*
+ * The packet type, which is the combination of outer/inner L2, L3, L4
+ * and tunnel types. The packet_type is about data really present in the
+ * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+ * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+ * vlan is stripped from the data.
+ */
union {
- uint8_t inner_esp_next_proto;
- /**< ESP next protocol type, valid if
- * RTE_PTYPE_TUNNEL_ESP tunnel type is set
- * on both Tx and Rx.
- */
+ uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
__extension__
struct {
- uint8_t inner_l2_type:4;
- /**< Inner L2 type. */
- uint8_t inner_l3_type:4;
- /**< Inner L3 type. */
+ uint8_t l2_type:4; /**< (Outer) L2 type. */
+ uint8_t l3_type:4; /**< (Outer) L3 type. */
+ uint8_t l4_type:4; /**< (Outer) L4 type. */
+ uint8_t tun_type:4; /**< Tunnel type. */
+ union {
+ /**< ESP next protocol type, valid if
+ * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+ * on both Tx and Rx.
+ */
+ uint8_t inner_esp_next_proto;
+ __extension__
+ struct {
+ /**< Inner L2 type. */
+ uint8_t inner_l2_type:4;
+ /**< Inner L3 type. */
+ uint8_t inner_l3_type:4;
+ };
+ };
+ uint8_t inner_l4_type:4; /**< Inner L4 type. */
};
};
- uint8_t inner_l4_type:4; /**< Inner L4 type. */
- };
- };
- uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
- uint16_t vlan_tci;
+ uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
- union {
- union {
- uint32_t rss; /**< RSS hash result if RSS enabled */
- struct {
+ uint16_t data_len; /**< Amount of data in segment buffer. */
+ /** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+ uint16_t vlan_tci;
+
+ union {
union {
+ uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
- uint16_t hash;
- uint16_t id;
- };
- uint32_t lo;
- /**< Second 4 flexible bytes */
- };
- uint32_t hi;
- /**< First 4 flexible bytes or FD ID, dependent
- * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
- */
- } fdir; /**< Filter identifier if FDIR enabled */
- struct rte_mbuf_sched sched;
- /**< Hierarchical scheduler : 8 bytes */
- struct {
- uint32_t reserved1;
- uint16_t reserved2;
- uint16_t txq;
- /**< The event eth Tx adapter uses this field
- * to store Tx queue id.
- * @see rte_event_eth_tx_adapter_txq_set()
- */
- } txadapter; /**< Eventdev ethdev Tx adapter */
- uint32_t usr;
- /**< User defined tags. See rte_distributor_process() */
- } hash; /**< hash information */
- };
+ union {
+ struct {
+ uint16_t hash;
+ uint16_t id;
+ };
+ /**< Second 4 flexible bytes */
+ uint32_t lo;
+ };
+ /**< First 4 flexible bytes or FD ID, dependent
+ * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+ */
+ uint32_t hi;
+ } fdir; /**< Filter identifier if FDIR enabled */
+ struct rte_mbuf_sched sched;
+ /**< Hierarchical scheduler : 8 bytes */
+ struct {
+ uint32_t reserved1;
+ uint16_t reserved2;
+ /**< The event eth Tx adapter uses this field
+ * to store Tx queue id.
+ * @see rte_event_eth_tx_adapter_txq_set()
+ */
+ uint16_t txq;
+ } txadapter; /**< Eventdev ethdev Tx adapter */
+ /**< User defined tags. See rte_distributor_process() */
+ uint32_t usr;
+ } hash; /**< hash information */
+ };
- /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
- uint16_t vlan_tci_outer;
+ /** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+ uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
+ uint16_t buf_len; /**< Length of segment buffer. */
+ };
+ };
struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
/* second cache line - fields only used in slow path or on TX */
- alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
#if RTE_IOVA_IN_MBUF
/**
* Next segment of scattered packet. Must be NULL in the last
* segment or in case of non-segmented packet.
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
struct rte_mbuf *next;
#else
/**
* Reserved for dynamic fields
* when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
*/
+ alignas(RTE_CACHE_LINE_MIN_SIZE)
uint64_t dynfield2;
#endif
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
uint64_t tx_offload; /**< combined for easy fetch */
__extension__
struct {
- uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
/**< L2 (MAC) Header Length for non-tunneling pkt.
* Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
*/
- uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+ uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
/**< L3 (IP) Header Length. */
- uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+ uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
/**< L4 (TCP/UDP) Header Length. */
- uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+ uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
/**< TCP TSO segment size */
-
+ uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
/*
* Fields for Tx offloading of tunnels.
* These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
* Applications are expected to set appropriate tunnel
* offload flags when they fill in these fields.
*/
- uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
/**< Outer L3 (IP) Hdr Length. */
- uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+ uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
/**< Outer L2 (MAC) Hdr Length. */
+ uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
};
--
1.8.3.1
^ permalink raw reply [relevance 2%]
* [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries
@ 2024-04-02 20:08 3% ` Tyler Retzlaff
2024-04-02 20:08 2% ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-03 17:53 3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
` (2 subsequent siblings)
4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:08 UTC (permalink / raw)
To: dev
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb, Tyler Retzlaff
As per techboard meeting 2024/03/20 adopt hybrid proposal of adapting
descriptor fields and removing cachline fields.
RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.
For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.
Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.
Note: diff is easier viewed with -b due to additional nesting from
unions / structs that have been introduced.
v9:
* provide narrowest possible libabigail.abignore to suppress
removal of fields that were agreed are not actual abi changes.
v8:
* rx_descriptor_fields1 array is now constexpr sized to
24 / sizeof(void *) so that the array encompasses fields
accessed via the array.
* add a comment to rx_descriptor_fields1 array site noting
that void * type of elements is retained for compatibility
with existing drivers.
* clean up comments of fields in rte_mbuf to be before the
field they apply to instead of after.
* duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
conditional compile for first field of cacheline 1 instead of
once before conditional compile block.
v7:
* complete re-write of series, previous versions not noted. all
reviewed-by and acked-by tags (if any) were removed.
Tyler Retzlaff (4):
net/i40e: use inline prefetch function
mbuf: remove rte marker fields
security: remove rte marker fields
cryptodev: remove rte marker fields
devtools/libabigail.abignore | 6 +
doc/guides/rel_notes/release_24_03.rst | 8 ++
drivers/net/i40e/i40e_rxtx_vec_avx512.c | 2 +-
lib/cryptodev/cryptodev_pmd.h | 5 +-
lib/mbuf/rte_mbuf.h | 4 +-
lib/mbuf/rte_mbuf_core.h | 200 +++++++++++++++++---------------
lib/security/rte_security_driver.h | 5 +-
7 files changed, 127 insertions(+), 103 deletions(-)
--
1.8.3.1
^ permalink raw reply [relevance 3%]
* [PATCH v5 6/8] net/tap: rewrite the RSS BPF program
@ 2024-04-02 17:12 2% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-02 17:12 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Rewrite the BPF program used to do queue based RSS.
Important changes:
- uses the newer BTF map format
- accepts the key as a parameter rather than a constant default
- can do L3 or L4 hashing
- supports IPv4 options
- supports IPv6 extension headers
- restructured for readability
The usage of BPF is different as well:
- the incoming configuration is looked up based on
class parameters rather than patching the BPF.
- the resulting queue is placed in the skb rather
than requiring a second pass through the classifier step.
Note: this version only works together with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
.gitignore | 3 -
drivers/net/tap/bpf/Makefile | 19 --
drivers/net/tap/bpf/README | 38 ++++
drivers/net/tap/bpf/bpf_api.h | 276 --------------------------
drivers/net/tap/bpf/bpf_elf.h | 53 -----
drivers/net/tap/bpf/bpf_extract.py | 85 --------
drivers/net/tap/bpf/meson.build | 81 ++++++++
drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
drivers/net/tap/bpf/tap_rss.c | 264 ++++++++++++++++++
9 files changed, 383 insertions(+), 691 deletions(-)
delete mode 100644 drivers/net/tap/bpf/Makefile
create mode 100644 drivers/net/tap/bpf/README
delete mode 100644 drivers/net/tap/bpf/bpf_api.h
delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
create mode 100644 drivers/net/tap/bpf/meson.build
delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
create mode 100644 drivers/net/tap/bpf/tap_rss.c
diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
# ignore python bytecode files
*.pyc
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
# DTS results
dts/output
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
- rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
- $(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
- llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
- python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across-queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+ Some older versions of Ubuntu do not have working bpftool package.
+ Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process, the meson build file checks whether
+libbpf, bpftool, and clang are available. If everything is
+present, then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool, generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+ This skeleton header is a large byte array which contains the
+ BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
- LIBBPF_PIN_NONE,
- /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
- LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X) #X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X) __builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X) __builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X) __constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X) __constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X) __constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X) __constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__ __attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME) \
- __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY) \
- __section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry \
- __section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry \
- __section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry \
- __section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license \
- __section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps \
- __section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME) \
- char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT -1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...) \
- (* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...) \
- __BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
- const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...) \
- __extension__ ({ \
- char ____fmt[] = fmt; \
- trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
- })
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
- uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
- uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
- void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
- const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
- uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
- const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
- uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
- uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
- uint64_t index, const void *data, uint32_t size) =
- (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
- uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
- struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
- const struct bpf_tunnel_key *from, uint32_t size,
- uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
- void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
- const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n) __builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n) __builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
- asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE "license"
-#define ELF_SECTION_MAPS "maps"
-#define ELF_SECTION_PROG "prog"
-#define ELF_SECTION_CLASSIFIER "classifier"
-#define ELF_SECTION_ACTION "action"
-
-#define ELF_MAX_MAPS 64
-#define ELF_MAX_LICENSE_LEN 128
-
-/* Object pinning settings */
-#define PIN_NONE 0
-#define PIN_OBJECT_NS 1
-#define PIN_GLOBAL_NS 2
-
-/* ELF map definition */
-struct bpf_elf_map {
- __u32 type;
- __u32 size_key;
- __u32 size_value;
- __u32 max_elem;
- __u32 flags;
- __u32 id;
- __u32 pinning;
- __u32 inner_id;
- __u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
- struct ____btf_map_##name { \
- type_key key; \
- type_val value; \
- }; \
- struct ____btf_map_##name \
- __attribute__ ((section(".maps." #name), used)) \
- ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
- """Get sections of interest from ELF"""
- result = []
- parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
- for name, tag in parts:
- section = elffile.get_section_by_name(name)
- if section:
- insns = struct.iter_unpack('<BBhL', section.data())
- result.append([tag, insns])
- return result
-
-
-def dump_section(name, insns, out):
- """Dump the array of BPF instructions"""
- print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
- for bpf in insns:
- code = bpf[0]
- src = bpf[1] >> 4
- dst = bpf[1] & 0xf
- off = bpf[2]
- imm = bpf[3]
- print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
- file=out)
- print('};', file=out)
-
-
-def parse_args():
- """Parse command line arguments"""
- parser = argparse.ArgumentParser()
- parser.add_argument('-s',
- '--source',
- type=str,
- help="original source file")
- parser.add_argument('-o', '--out', type=str, help="output C file path")
- parser.add_argument("file",
- nargs='+',
- help="object file path or '-' for stdin")
- return parser.parse_args()
-
-
-def open_input(path):
- """Open the file or stdin"""
- if path == "-":
- temp = TemporaryFile()
- temp.write(sys.stdin.buffer.read())
- return temp
- return open(path, 'rb')
-
-
-def write_header(out, source):
- """Write file intro header"""
- print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
- if source:
- print(f' * Auto-generated from {source}', file=out)
- print(" * This not the original source file. Do NOT edit it.", file=out)
- print(" */\n", file=out)
-
-
-def main():
- '''program main function'''
- args = parse_args()
-
- with open(args.out, 'w',
- encoding="utf-8") if args.out else sys.stdout as out:
- write_header(out, args.source)
- for path in args.file:
- elffile = ELFFile(open_input(path))
- sections = load_sections(elffile)
- for name, insns in sections:
- dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
- main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+ message('net/tap: no RSS support missing libbpf')
+ subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+ message('net/tap: no RSS support missing bpftool')
+ subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+ clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+ check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+ message('net/tap: no RSS support missing clang BPF')
+ subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+ '-O2',
+ '-Wall',
+ '-Wextra',
+ '-target',
+ 'bpf',
+ '-g',
+ '-c',
+]
+
+bpf_o_cmd = [
+ clang,
+ clang_flags,
+ '-idirafter',
+ libbpf_include_dir,
+ '-idirafter',
+ march_include_dir,
+ '@INPUT@',
+ '-o',
+ '@OUTPUT@'
+]
+
+skel_h_cmd = [
+ bpftool,
+ 'gen',
+ 'skeleton',
+ '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+ 'tap_rss.bpf.o',
+ input: 'tap_rss.c',
+ output: 'tap_rss.o',
+ command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+ 'tap_rss.skel.h',
+ input: tap_rss_o,
+ output: 'tap_rss.skel.h',
+ command: skel_h_cmd,
+ capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
- (((b) & 0xff) << 16) | \
- (((c) & 0xff) << 8) | \
- ((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
- ((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET 0x7cafe800
-#define PIN_GLOBAL_NS 2
-
-#define KEY_IDX 0
-#define BPF_MAP_ID_KEY 1
-
-struct vlan_hdr {
- __be16 proto;
- __be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
- .type = BPF_MAP_TYPE_HASH,
- .id = BPF_MAP_ID_KEY,
- .size_key = sizeof(__u32),
- .size_value = sizeof(struct rss_key),
- .max_elem = 256,
- .pinning = PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
- __u32 queue = skb->cb[1];
- /* queue is set by tap_flow_bpf_cls_q() before load */
- volatile __u32 q = 0xdeadbeef;
- __u32 match_queue = QUEUE_OFFSET + q;
-
- /* printt("match_q$i() queue = %d\n", queue); */
-
- if (queue != match_queue)
- return TC_ACT_OK;
-
- /* queue match */
- skb->cb[1] = 0;
- return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
- __u32 src_addr;
- __u32 dst_addr;
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
- __u8 src_addr[16];
- __u8 dst_addr[16];
- __u16 dport;
- __u16 sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
- 0xd1, 0x81, 0xc6, 0x2c,
- 0xf7, 0xf4, 0xdb, 0x5b,
- 0x19, 0x83, 0xa2, 0xfc,
- 0x94, 0x3e, 0x1a, 0xdb,
- 0xd9, 0x38, 0x9e, 0x6b,
- 0xd1, 0x03, 0x9c, 0x2c,
- 0xa7, 0x44, 0x99, 0xad,
- 0x59, 0x3d, 0x56, 0xd9,
- 0xf3, 0x25, 0x3c, 0x06,
- 0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32 __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
- __u8 input_len)
-{
- __u32 i, j, hash = 0;
-#pragma unroll
- for (j = 0; j < input_len; j++) {
-#pragma unroll
- for (i = 0; i < 32; i++) {
- if (input_tuple[j] & (1U << (31 - i))) {
- hash ^= ((const __u32 *)def_rss_key)[j] << i |
- (__u32)((uint64_t)
- (((const __u32 *)def_rss_key)[j + 1])
- >> (32 - i));
- }
- }
- }
- return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
- void *data_end = (void *)(long)skb->data_end;
- void *data = (void *)(long)skb->data;
- __u16 proto = (__u16)skb->protocol;
- __u32 key_idx = 0xdeadbeef;
- __u32 hash;
- struct rss_key *rsskey;
- __u64 off = ETH_HLEN;
- int j;
- __u8 *key = 0;
- __u32 len;
- __u32 queue = 0;
- bool mf = 0;
- __u16 frag_off = 0;
-
- rsskey = map_lookup_elem(&map_keys, &key_idx);
- if (!rsskey) {
- printt("hash(): rss key is not configured\n");
- return TC_ACT_OK;
- }
- key = (__u8 *)rsskey->key;
-
- /* Get correct proto for 802.1ad */
- if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
- if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
- sizeof(proto) > data_end)
- return TC_ACT_OK;
- proto = *(__u16 *)(data + ETH_ALEN * 2 +
- sizeof(struct vlan_hdr));
- off += sizeof(struct vlan_hdr);
- }
-
- if (proto == htons(ETH_P_IP)) {
- if (data + off + sizeof(struct iphdr) + sizeof(__u32)
- > data_end)
- return TC_ACT_OK;
-
- __u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
- __u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
- __u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
- __u8 *src_dst_port = data + off + sizeof(struct iphdr);
- struct ipv4_l3_l4_tuple v4_tuple = {
- .src_addr = IPv4(*(src_dst_addr + 0),
- *(src_dst_addr + 1),
- *(src_dst_addr + 2),
- *(src_dst_addr + 3)),
- .dst_addr = IPv4(*(src_dst_addr + 4),
- *(src_dst_addr + 5),
- *(src_dst_addr + 6),
- *(src_dst_addr + 7)),
- .sport = 0,
- .dport = 0,
- };
- /** Fetch the L4-payer port numbers only in-case of TCP/UDP
- ** and also if the packet is not fragmented. Since fragmented
- ** chunks do not have L4 TCP/UDP header.
- **/
- if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
- frag_off = PORT(*(frag_off_addr + 0),
- *(frag_off_addr + 1));
- mf = frag_off & 0x2000;
- frag_off = frag_off & 0x1fff;
- if (mf == 0 && frag_off == 0) {
- v4_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v4_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- }
- }
- __u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
- } else if (proto == htons(ETH_P_IPV6)) {
- if (data + off + sizeof(struct ipv6hdr) +
- sizeof(__u32) > data_end)
- return TC_ACT_OK;
- __u8 *src_dst_addr = data + off +
- offsetof(struct ipv6hdr, saddr);
- __u8 *src_dst_port = data + off +
- sizeof(struct ipv6hdr);
- __u8 *next_hdr = data + off +
- offsetof(struct ipv6hdr, nexthdr);
-
- struct ipv6_l3_l4_tuple v6_tuple;
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.src_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + j));
- for (j = 0; j < 4; j++)
- *((uint32_t *)&v6_tuple.dst_addr + j) =
- __builtin_bswap32(*((uint32_t *)
- src_dst_addr + 4 + j));
-
- /** Fetch the L4 header port-numbers only if next-header
- * is TCP/UDP **/
- if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
- v6_tuple.sport = PORT(*(src_dst_port + 0),
- *(src_dst_port + 1));
- v6_tuple.dport = PORT(*(src_dst_port + 2),
- *(src_dst_port + 3));
- } else {
- v6_tuple.sport = 0;
- v6_tuple.dport = 0;
- }
-
- __u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
- if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
- input_len--;
- hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
- } else {
- return TC_ACT_PIPE;
- }
-
- queue = rsskey->queues[(hash % rsskey->nb_queues) &
- (TAP_MAX_QUEUES - 1)];
- skb->cb[1] = QUEUE_OFFSET + queue;
- /* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
- return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L) \
- __section(#L) int \
- L ## _hash(struct __sk_buff *skb) \
- { \
- return rss_ ## L (skb); \
- }
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct rss_key));
+ __uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF 0x2000 /** IP header Flags **/
+#define IP_OFFSET 0x1FFF /** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+ __u32 i, j, hash = 0;
+
+#pragma unroll
+ for (j = 0; j < input_len; j++) {
+#pragma unroll
+ for (i = 0; i < 32; i++) {
+ if (input_tuple[j] & (1U << (31 - i)))
+ hash ^= key[j] << i | key[j + 1] >> (32 - i);
+ }
+ }
+ return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct iphdr iph;
+ __u32 off = 0;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+ return 0; /* no IP header present */
+
+ struct {
+ __u32 src_addr;
+ __u32 dst_addr;
+ __u16 dport;
+ __u16 sport;
+ } v4_tuple = {
+ .src_addr = bpf_ntohl(iph.saddr),
+ .dst_addr = bpf_ntohl(iph.daddr),
+ };
+
+ /* If only calculating L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+ /* If packet is fragmented then no L4 hash is possible */
+ if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+ return 0;
+
+ /* Do RSS on UDP or TCP protocols */
+ if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ off += iph.ihl * 4;
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0; /* TCP or UDP header missing */
+
+ v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+ return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+ }
+
+ /* Other protocol */
+ return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+ struct ext_hdr {
+ __u8 next_hdr;
+ __u8 len;
+ } xh;
+ unsigned int i;
+
+ *frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+ for (i = 0; i < MAX_EXT_HDRS; i++) {
+ switch (proto) {
+ case IPPROTO_HOPOPTS:
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += (xh.len + 1) * 8;
+ proto = xh.next_hdr;
+ break;
+ case IPPROTO_FRAGMENT:
+ if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+ BPF_HDR_START_NET))
+ return -1;
+
+ *off += 8;
+ proto = xh.next_hdr;
+ *frag = 1;
+ return proto; /* this is always the last ext hdr */
+ default:
+ return proto;
+ }
+ }
+
+ /* too many extension headers, give up */
+ return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * returns 0 if RSS is not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+ struct {
+ __u32 src_addr[4];
+ __u32 dst_addr[4];
+ __u16 dport;
+ __u16 sport;
+ } v6_tuple = { };
+ struct ipv6hdr ip6h;
+ __u32 off = 0, j;
+ int proto, frag;
+
+ if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+ return 0; /* missing IPv6 header */
+
+#pragma unroll
+ for (j = 0; j < 4; j++) {
+ v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+ v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+ }
+
+ /* If only doing L3 hash, do it now */
+ if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+ /* Skip extension headers if present */
+ off += sizeof(ip6h);
+ proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+ if (proto < 0)
+ return 0;
+
+ /* If packet is a fragment then no L4 hash is possible */
+ if (frag)
+ return 0;
+
+ /* Do RSS on UDP or TCP */
+ if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+ __u16 src_dst_port[2];
+
+ if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+ BPF_HDR_START_NET))
+ return 0;
+
+ v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+ v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+ return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+ }
+
+ return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+ const __u32 *key = (const __u32 *)rsskey->key;
+
+ if (skb->protocol == bpf_htons(ETH_P_IP))
+ return parse_ipv4(skb, rsskey->hash_fields, key);
+ else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+ return parse_ipv6(skb, rsskey->hash_fields, key);
+ else
+ return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32 __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+ return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+ const struct rss_key *rsskey;
+ __u32 mark = skb->mark;
+ __u32 hash;
+
+ /* Lookup RSS configuration for that BPF class */
+ rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+ if (rsskey == NULL)
+ return TC_ACT_OK;
+
+ hash = calculate_rss_hash(skb, rsskey);
+ if (!hash)
+ return TC_ACT_OK;
+
+ /* Fold hash to the number of queues configured */
+ skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+ return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
--
2.43.0
^ permalink raw reply [relevance 2%]
* Re: [PATCH] version: 24.07-rc0
2024-04-02 8:52 0% ` Thomas Monjalon
@ 2024-04-02 9:25 0% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-04-02 9:25 UTC (permalink / raw)
To: David Marchand; +Cc: dev, Aaron Conole, Michael Santana, Thomas Monjalon
On Tue, Apr 2, 2024 at 10:52 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/03/2024 18:54, David Marchand:
> > Start a new release cycle with empty release notes.
> > Bump version and ABI minor.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
Applied, thanks.
--
David Marchand
^ permalink raw reply [relevance 0%]
* Re: [PATCH] version: 24.07-rc0
2024-03-30 17:54 18% [PATCH] version: 24.07-rc0 David Marchand
@ 2024-04-02 8:52 0% ` Thomas Monjalon
2024-04-02 9:25 0% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-04-02 8:52 UTC (permalink / raw)
To: David Marchand; +Cc: dev, Aaron Conole, Michael Santana
30/03/2024 18:54, David Marchand:
> Start a new release cycle with empty release notes.
> Bump version and ABI minor.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
^ permalink raw reply [relevance 0%]
* Re: The effect of inlining
2024-03-29 13:42 3% ` The effect of inlining Morten Brørup
2024-03-29 20:26 0% ` Tyler Retzlaff
@ 2024-04-01 15:20 3% ` Mattias Rönnblom
2024-04-03 16:01 3% ` Morten Brørup
1 sibling, 1 reply; 200+ results
From: Mattias Rönnblom @ 2024-04-01 15:20 UTC (permalink / raw)
To: Morten Brørup, Maxime Coquelin, Stephen Hemminger, Andrey Ignatov
Cc: dev, Chenbo Xia, Wei Shen, techboard
On 2024-03-29 14:42, Morten Brørup wrote:
> +CC techboard
>
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, 29 March 2024 14.05
>>
>> Hi Stephen,
>>
>> On 3/29/24 03:53, Stephen Hemminger wrote:
>>> On Thu, 28 Mar 2024 17:10:42 -0700
>>> Andrey Ignatov <rdna@apple.com> wrote:
>>>
>>>>>
>>>>> You don't need always inline, the compiler will do it anyway.
>>>>
>>>> I can remove it in v2, but it's not completely obvious to me how is
>> it
>>>> decided when to specify it explicitly and when not?
>>>>
>>>> I see plenty of __rte_always_inline in this file:
>>>>
>>>> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
>>>> lib/vhost/virtio_net.c:66
>>>
>>>
>>> Cargo cult really.
>>>
>>
>> Cargo cult... really?
>>
>> Well, I just did a quick test by comparing IO forwarding with testpmd
>> between main branch and with adding a patch that removes all the
>> inline/noinline in lib/vhost/virtio_net.c [0].
>>
>> main branch: 14.63Mpps
>> main branch - inline/noinline: 10.24Mpps
>
> Thank you for testing this, Maxime. Very interesting!
>
> It is sometimes suggested on techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.
>
I think you are mixing two different (but related) things here.
1) marking functions with the inline family of keywords/attributes
2) keeping function definitions in header files
1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the
API (i.e., source-level compatibility).
2) *allows* for function inlining even in non-LTO builds, but doesn't
force it.
If you don't believe 2) makes a difference performance-wise, it follows
that you also don't believe LTO makes much of a difference. Both have
the same effect: allowing the compiler to reason over a larger chunk of
your program.
Allowing the compiler to inline small, often-called functions is crucial
for performance, in my experience. If the target symbol tends to be in a
shared object, the difference is even larger. It's also quite common
that you see no effect of LTO (other than a reduction of code footprint).
As LTO becomes more practical to use, 2) loses much of its appeal.
If PGO ever becomes practical to use, maybe 1) will as well.
> I think this test proves that the sum of many small (negligible) performance differences is not negligible!
>
>>
>> Andrey, thanks for the patch, I'll have a look at it next week.
>>
>> Maxime
>>
>> [0]: https://pastebin.com/72P2npZ0
>
^ permalink raw reply [relevance 3%]
* [PATCH] version: 24.07-rc0
@ 2024-03-30 17:54 18% David Marchand
2024-04-02 8:52 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-03-30 17:54 UTC (permalink / raw)
To: dev; +Cc: thomas, Aaron Conole, Michael Santana
Start a new release cycle with empty release notes.
Bump version and ABI minor.
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
.github/workflows/build.yml | 2 +-
ABI_VERSION | 2 +-
VERSION | 2 +-
doc/guides/rel_notes/index.rst | 1 +
doc/guides/rel_notes/release_24_03.rst | 110 --------------------
doc/guides/rel_notes/release_24_07.rst | 138 +++++++++++++++++++++++++
6 files changed, 142 insertions(+), 113 deletions(-)
create mode 100644 doc/guides/rel_notes/release_24_07.rst
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 2c308d5e9d..dbf25626d4 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -27,7 +27,7 @@ jobs:
MINGW: ${{ matrix.config.cross == 'mingw' }}
MINI: ${{ matrix.config.mini != '' }}
PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
- REF_GIT_TAG: v23.11
+ REF_GIT_TAG: v24.03
RISCV64: ${{ matrix.config.cross == 'riscv64' }}
RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
STDATOMIC: ${{ contains(matrix.config.checks, 'stdatomic') }}
diff --git a/ABI_VERSION b/ABI_VERSION
index 0dad123924..9dc0ade502 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-24.1
+24.2
diff --git a/VERSION b/VERSION
index 58dfef16ef..2081979127 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-24.03.0
+24.07.0-rc0
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 88f2b30b03..77a92b308f 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
:maxdepth: 1
:numbered:
+ release_24_07
release_24_03
release_23_11
release_23_07
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 8e7ad8f99f..013c12f801 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -6,55 +6,9 @@
DPDK Release 24.03
==================
-.. **Read this first.**
-
- The text in the sections below explains how to update the release notes.
-
- Use proper spelling, capitalization and punctuation in all sections.
-
- Variable and config names should be quoted as fixed width text:
- ``LIKE_THIS``.
-
- Build the docs and view the output file to ensure the changes are correct::
-
- ninja -C build doc
- xdg-open build/doc/guides/html/rel_notes/release_24_03.html
-
-
New Features
------------
-.. This section should contain new features added in this release.
- Sample format:
-
- * **Add a title in the past tense with a full stop.**
-
- Add a short 1-2 sentence description in the past tense.
- The description should be enough to allow someone scanning
- the release notes to understand the new feature.
-
- If the feature adds a lot of sub-features you can use a bullet list
- like this:
-
- * Added feature foo to do something.
- * Enhanced feature bar to do something else.
-
- Refer to the previous release notes for examples.
-
- Suggested order in release notes items:
- * Core libs (EAL, mempool, ring, mbuf, buses)
- * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
- - ethdev (lib, PMDs)
- - cryptodev (lib, PMDs)
- - eventdev (lib, PMDs)
- - etc
- * Other libs
- * Apps, Examples, Tools (if significant)
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
* **Added HiSilicon UACCE bus support.**
Added UACCE (Unified/User-space-access-intended Accelerator Framework) bus
@@ -200,15 +154,6 @@ New Features
Removed Items
-------------
-.. This section should contain removed items in this release. Sample format:
-
- * Add a short 1-2 sentence description of the removed item
- in the past tense.
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
* log: Removed the statically defined logtypes that were used internally by DPDK.
All code should be using the dynamic logtypes (see ``RTE_LOG_REGISTER()``).
The application reserved statically defined logtypes ``RTE_LOGTYPE_USER1..RTE_LOGTYPE_USER8``
@@ -220,18 +165,6 @@ Removed Items
API Changes
-----------
-.. This section should contain API changes. Sample format:
-
- * sample: Add a short 1-2 sentence description of the API change
- which was announced in the previous releases and made in this release.
- Start with a scope label like "ethdev:".
- Use fixed width quotes for ``function_names`` or ``struct_names``.
- Use the past tense.
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
* eal: Removed ``typeof(type)`` from the expansion of ``RTE_DEFINE_PER_LCORE``
and ``RTE_DECLARE_PER_LCORE`` macros aligning them with their intended design.
If use with an expression is desired applications can adapt by supplying
@@ -249,55 +182,12 @@ API Changes
ABI Changes
-----------
-.. This section should contain ABI changes. Sample format:
-
- * sample: Add a short 1-2 sentence description of the ABI change
- which was announced in the previous releases and made in this release.
- Start with a scope label like "ethdev:".
- Use fixed width quotes for ``function_names`` or ``struct_names``.
- Use the past tense.
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
* No ABI change that would break compatibility with 23.11.
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
- * **Add title in present tense with full stop.**
-
- Add a short 1-2 sentence description of the known issue
- in the present tense. Add information on any known workarounds.
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
-
Tested Platforms
----------------
-.. This section should contain a list of platforms that were tested
- with this release.
-
- The format is:
-
- * <vendor> platform with <vendor> <type of devices> combinations
-
- * List of CPU
- * List of OS
- * List of devices
- * Other relevant details...
-
- This section is a comment. Do not overwrite or remove it.
- Also, make sure to start the actual text at the margin.
- =======================================================
-
* AMD platforms
* CPU
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
new file mode 100644
index 0000000000..a69f24cf99
--- /dev/null
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright 2024 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 24.07
+==================
+
+.. **Read this first.**
+
+ The text in the sections below explains how to update the release notes.
+
+ Use proper spelling, capitalization and punctuation in all sections.
+
+ Variable and config names should be quoted as fixed width text:
+ ``LIKE_THIS``.
+
+ Build the docs and view the output file to ensure the changes are correct::
+
+ ninja -C build doc
+ xdg-open build/doc/guides/html/rel_notes/release_24_07.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+ Sample format:
+
+ * **Add a title in the past tense with a full stop.**
+
+ Add a short 1-2 sentence description in the past tense.
+ The description should be enough to allow someone scanning
+ the release notes to understand the new feature.
+
+ If the feature adds a lot of sub-features you can use a bullet list
+ like this:
+
+ * Added feature foo to do something.
+ * Enhanced feature bar to do something else.
+
+ Refer to the previous release notes for examples.
+
+ Suggested order in release notes items:
+ * Core libs (EAL, mempool, ring, mbuf, buses)
+ * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+ - ethdev (lib, PMDs)
+ - cryptodev (lib, PMDs)
+ - eventdev (lib, PMDs)
+ - etc
+ * Other libs
+ * Apps, Examples, Tools (if significant)
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+ * Add a short 1-2 sentence description of the removed item
+ in the past tense.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+ * sample: Add a short 1-2 sentence description of the API change
+ which was announced in the previous releases and made in this release.
+ Start with a scope label like "ethdev:".
+ Use fixed width quotes for ``function_names`` or ``struct_names``.
+ Use the past tense.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+ * sample: Add a short 1-2 sentence description of the ABI change
+ which was announced in the previous releases and made in this release.
+ Start with a scope label like "ethdev:".
+ Use fixed width quotes for ``function_names`` or ``struct_names``.
+ Use the past tense.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+* No ABI change that would break compatibility with 23.11.
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+ * **Add title in present tense with full stop.**
+
+ Add a short 1-2 sentence description of the known issue
+ in the present tense. Add information on any known workarounds.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+ with this release.
+
+ The format is:
+
+ * <vendor> platform with <vendor> <type of devices> combinations
+
+ * List of CPU
+ * List of OS
+ * List of devices
+ * Other relevant details...
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
--
2.44.0
^ permalink raw reply [relevance 18%]
* Re: The effect of inlining
2024-03-29 13:42 3% ` The effect of inlining Morten Brørup
@ 2024-03-29 20:26 0% ` Tyler Retzlaff
2024-04-01 15:20 3% ` Mattias Rönnblom
1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-29 20:26 UTC (permalink / raw)
To: Morten Brørup
Cc: Maxime Coquelin, Stephen Hemminger, Andrey Ignatov, dev,
Chenbo Xia, Wei Shen, techboard
On Fri, Mar 29, 2024 at 02:42:49PM +0100, Morten Brørup wrote:
> +CC techboard
>
> > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > Sent: Friday, 29 March 2024 14.05
> >
> > Hi Stephen,
> >
> > On 3/29/24 03:53, Stephen Hemminger wrote:
> > > On Thu, 28 Mar 2024 17:10:42 -0700
> > > Andrey Ignatov <rdna@apple.com> wrote:
> > >
> > >>>
> > >>> You don't need always inline, the compiler will do it anyway.
> > >>
> > >> I can remove it in v2, but it's not completely obvious to me how is
> > it
> > >> decided when to specify it explicitly and when not?
> > >>
> > >> I see plenty of __rte_always_inline in this file:
> > >>
> > >> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> > >> lib/vhost/virtio_net.c:66
> > >
> > >
> > > Cargo cult really.
> > >
> >
> > Cargo cult... really?
> >
> > Well, I just did a quick test by comparing IO forwarding with testpmd
> > between main branch and with adding a patch that removes all the
> > inline/noinline in lib/vhost/virtio_net.c [0].
> >
> > main branch: 14.63Mpps
> > main branch - inline/noinline: 10.24Mpps
>
> Thank you for testing this, Maxime. Very interesting!
>
> It is sometimes suggested on techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.
removing inline functions probably has an even more profound negative
impact when using dynamic linking. For all the value of MSVC's DLL-scoped
security features, they do have per-call overheads that can't be wished
away; I imagine the equivalents in GCC are the same.
>
> I think this test proves that the sum of many small (negligible) performance differences is not negligible!
sure looks that way, though i think there is some distinction to be made
between inline and *forced* inline.
force inline may be losing us some opportunity for the compiler to
optimize better than is obvious to us.
>
> >
> > Andrey, thanks for the patch, I'll have a look at it next week.
> >
> > Maxime
> >
> > [0]: https://pastebin.com/72P2npZ0
>
^ permalink raw reply [relevance 0%]
* The effect of inlining
@ 2024-03-29 13:42 3% ` Morten Brørup
2024-03-29 20:26 0% ` Tyler Retzlaff
2024-04-01 15:20 3% ` Mattias Rönnblom
0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2024-03-29 13:42 UTC (permalink / raw)
To: Maxime Coquelin, Stephen Hemminger, Andrey Ignatov
Cc: dev, Chenbo Xia, Wei Shen, techboard
+CC techboard
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, 29 March 2024 14.05
>
> Hi Stephen,
>
> On 3/29/24 03:53, Stephen Hemminger wrote:
> > On Thu, 28 Mar 2024 17:10:42 -0700
> > Andrey Ignatov <rdna@apple.com> wrote:
> >
> >>>
> >>> You don't need always inline, the compiler will do it anyway.
> >>
> >> I can remove it in v2, but it's not completely obvious to me how is
> it
> >> decided when to specify it explicitly and when not?
> >>
> >> I see plenty of __rte_always_inline in this file:
> >>
> >> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> >> lib/vhost/virtio_net.c:66
> >
> >
> > Cargo cult really.
> >
>
> Cargo cult... really?
>
> Well, I just did a quick test by comparing IO forwarding with testpmd
> between main branch and with adding a patch that removes all the
> inline/noinline in lib/vhost/virtio_net.c [0].
>
> main branch: 14.63Mpps
> main branch - inline/noinline: 10.24Mpps
Thank you for testing this, Maxime. Very interesting!
It is sometimes suggested on techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.
I think this test proves that the sum of many small (negligible) performance differences is not negligible!
>
> Andrey, thanks for the patch, I'll have a look at it next week.
>
> Maxime
>
> [0]: https://pastebin.com/72P2npZ0
^ permalink raw reply [relevance 3%]
* DPDK 24.03 released
@ 2024-03-28 21:46 3% Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-03-28 21:46 UTC (permalink / raw)
To: announce
A new major release is available:
https://fast.dpdk.org/rel/dpdk-24.03.tar.xz
This is the work we did during the last months:
987 commits from 154 authors
1334 files changed, 79260 insertions(+), 22824 deletions(-)
It is not planned to start a maintenance branch for 24.03.
This version is ABI-compatible with 23.11.
Below are some new features:
- argument parsing library
- dynamic logging standardized
- HiSilicon UACCE bus
- Tx queue query
- flow matching with random and field comparison
- flow action NAT64
- flow template table resizing
- more cleanups to prepare MSVC build
- more DTS tests and cleanups
More details in the release notes:
https://doc.dpdk.org/guides/rel_notes/release_24_03.html
There are 31 new contributors (including authors, reviewers and testers).
Welcome to Akshay Dorwat, Alan Elder, Bhuvan Mital, Brad Larson,
Christian Koue Muf, Chuanyu Xue, Emi Aoki, Fidel Castro, Flore Norceide,
Gavin Li, Holly Nichols, Jack Bond-Preston, Lewis Donzis, Liangxing Wang,
Luca Vizzarro, Masoumeh Farhadi Nia, Mykola Kostenok, Nicholas Pratte,
Nishikant Nayak, Oleksandr Kolomeiets, Parthakumar Roy, Qian Hao,
Shani Peretz, Shaowei Sun, Ting-Kai Ku, Tingting Liao, Tom Jones,
Vamsi Krishna Atluri, Venkat Kumar Ande, Vinh Tran,
and Wathsala Vithanage.
Below is the number of commits per employer (with authors count):
202 Marvell (26)
166 NVIDIA (23)
125 Intel (31)
80 networkplumber.org (1)
77 Corigine (6)
64 Red Hat (5)
56 Huawei (7)
52 Broadcom (6)
33 AMD (9)
32 Amazon (1)
27 Microsoft (4)
14 PANTHEON.tech (1)
14 Arm (5)
7 Google (2)
6 UNH (1)
...
A big thank you to all the courageous people who reviewed others' work.
Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
50 Akhil Goyal <gakhil@marvell.com>
44 Ferruh Yigit <ferruh.yigit@amd.com>
40 Chengwen Feng <fengchengwen@huawei.com>
36 Anoob Joseph <anoobj@marvell.com>
32 Morten Brørup <mb@smartsharesystems.com>
26 Tyler Retzlaff <roretzla@linux.microsoft.com>
21 Dariusz Sosnowski <dsosnowski@nvidia.com>
18 Ori Kam <orika@nvidia.com>
18 Bruce Richardson <bruce.richardson@intel.com>
The next challenge is to reduce open bugs drastically.
The next version will be 24.07 in July.
The new features for 24.07 can be submitted during the next 4 weeks:
http://core.dpdk.org/roadmap#dates
Please share your roadmap.
Don't forget to register for the webinar about DPDK in the cloud:
https://zoom.us/webinar/register/WN_IG21wHwlTEGTv3sAXqcoFg
Thanks everyone
^ permalink raw reply [relevance 3%]
* Minutes of Technical Board meeting 20-March-2024
@ 2024-03-28 2:19 3% Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-28 2:19 UTC (permalink / raw)
To: dev
Members Attending
=================
Aaron Conole
Bruce Richardson
Hemant Agrawal
Jerin Jacob
Kevin Traynor
Konstantin Ananyev
Maxime Coquelin
Morten Brørup
Stephen Hemminger (chair)
Thomas Monjalon
NOTE
====
The technical board meetings are on every second Wednesday at 3 pm UTC.
Meetings are public. DPDK community members are welcome to attend on Zoom:
https://zoom-lfx.platform.linuxfoundation.org/meeting/96459488340?
password=d808f1f6-0a28-4165-929e-5a5bcae7efeb
Agenda: https://annuel.framapad.org/p/r.0c3cc4d1e011214183872a98f6b5c7db
Minutes of previous meetings: http://core.dpdk.org/techboard/minutes
Next meeting will be on Wednesday 3-April-2024 at 3pm UTC, and will be
chaired by Thomas.
Agenda Items
============
1. Lcore variables (Mattias)
This patch series proposes an alternative for per-lcore variables.
Lots of code uses the pattern is per-lcore-data.
This solution is simple but uses lots of padding to handle cache lines;
and it is easy to overlook some cache patterns and get false sharing.
Recent example was hardware will pre-fetch across cache lines.
The per-lcore array model also doesn't handle non-EAL threads well.
One other issue is that accessing thread local storage variables
in another thread works but is undefined according to standards.
The proposal defines yet another allocator for per-thread data.
It is available early in startup, so other libraries can use it.
Per-thread allocated storage can safely be used by unregistered threads.
Limitations: it does not completely solve the HW pre-fetch issue,
and it is not a true heap since there is no free or collection function.
Performance is about the same. All implementations have same access pattern.
An added benefit is that there is less chance of false sharing bugs.
Issues:
Valgrind will report leaks. It uses aligned_alloc() and that
function is different on Windows.
Questions:
Should it use hugepages?
startup issue before/after memory setup?
Or hint the OS?
2. 2024 Events update (Nathan)
Planning for North American DPDK summit in Montreal - week of 9 September.
Still pending governing board approval.
Discussions around an Asia-Pacific event in Bangkok Thailand.
This location was suggested as a neutral location (less visa issues).
Early stage discussions, still needs more work by governing board.
There has been large amount of interest in DPDK webinars in APAC.
Are there budget or visa issues with travel for Indian companies?
3. Code Challenge (Ben)
Asked Jim Zemlin about cross-project challenges; he agreed it is a good idea but hard to pull off.
What next steps are needed to make this event work?
Proposal to have DPDK branded merch as part of this.
4. Changes to use markers (Tyler)
Using GCC zero sized arrays, used for alignment and reference and prefetch.
Not available in MSVC. Existing code has inconsistent usage of markers.
Proposed solutions (both maintain ABI):
Option 1: anonymous unions in C11, same API.
Option 2: break API, use existing cacheline prefetch functions.
Existing usage appears to be for prefetch, and hardware rearming.
A hybrid approach is to remove markers for the prefetch usage,
and use anonymous unions for the hardware rearm.
Need a way to prevent introduction of new usage of markers (checkpatch).
^ permalink raw reply [relevance 3%]
* [PATCH v5] graph: expose node context as pointers
@ 2024-03-27 9:14 4% ` Robin Jarry
2024-05-29 17:54 0% ` Nithin Dabilpuram
2024-06-18 12:33 4% ` David Marchand
0 siblings, 2 replies; 200+ results
From: Robin Jarry @ 2024-03-27 9:14 UTC (permalink / raw)
To: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan
Cc: Tyler Retzlaff
In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. And without
intermediate structures, casting to opaque pointers is hard without
violating strict aliasing rules.
Add an unnamed union to allow storing opaque pointers in the node
context. Unfortunately, aligning an unnamed union that contains an array
produces inconsistent results between C and C++. To preserve ABI/API
compatibility in both C and C++, move all fast-path area fields into an
unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
preserve existing alignment on architectures where cache lines are 128
bytes.
Add a static assert to ensure that the unnamed union is not larger than
the context array (RTE_NODE_CTX_SZ).
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
Notes:
v5:
* Helper functions to hide casting proved to be harder than expected.
Naive casting may even be impossible without breaking strict aliasing
rules. The only other option would be to use explicit memcpy calls.
* Unnamed union tentative again. As suggested by Tyler (thank you!),
using an intermediate unnamed struct to carry the alignment produces
consistent ABI in C and C++.
* Also, Tyler (thank you!) suggested that the fast path area alignment
size may be incorrect for architectures where the cache line is not 64
bytes. There will be a 64 bytes hole in the structure at the end of
the unnamed struct before the zero length next nodes array. Use
__rte_cache_min_aligned to preserve existing alignment.
v4:
* Replaced the unnamed union with helper inline functions.
v3:
* Added __extension__ to the unnamed struct inside the union.
* Fixed C++ header checks.
* Replaced alignas() with an explicit static_assert.
lib/graph/rte_graph_worker_common.h | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..84d4997bbbf6 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -12,7 +12,9 @@
* process, enqueue and move streams of objects to the next nodes.
*/
+#include <assert.h>
#include <stdalign.h>
+#include <stddef.h>
#include <rte_common.h>
#include <rte_cycles.h>
@@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
} dispatch;
};
/* Fast path area */
+ __extension__ struct __rte_cache_min_aligned {
#define RTE_NODE_CTX_SZ 16
- alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
- uint16_t size; /**< Total number of objects available. */
- uint16_t idx; /**< Number of objects used. */
- rte_graph_off_t off; /**< Offset of node in the graph reel. */
- uint64_t total_cycles; /**< Cycles spent in this node. */
- uint64_t total_calls; /**< Calls done to this node. */
- uint64_t total_objs; /**< Objects processed by this node. */
+ union {
+ uint8_t ctx[RTE_NODE_CTX_SZ];
+ __extension__ struct {
+ void *ctx_ptr;
+ void *ctx_ptr2;
+ };
+ }; /**< Node Context. */
+ uint16_t size; /**< Total number of objects available. */
+ uint16_t idx; /**< Number of objects used. */
+ rte_graph_off_t off; /**< Offset of node in the graph reel. */
+ uint64_t total_cycles; /**< Cycles spent in this node. */
+ uint64_t total_calls; /**< Calls done to this node. */
+ uint64_t total_objs; /**< Objects processed by this node. */
union {
void **objs; /**< Array of object pointers. */
uint64_t objs_u64;
@@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
rte_node_process_t process; /**< Process function. */
uint64_t process_u64;
};
+ };
alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
};
+static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, ctx) == RTE_NODE_CTX_SZ,
+ "rte_node context must be RTE_NODE_CTX_SZ bytes exactly");
+
/**
* @internal
*
--
2.44.0
^ permalink raw reply [relevance 4%]
* [PATCH v2 1/6] ethdev: support setting lanes
@ 2024-03-22 7:09 5% ` Dengdui Huang
0 siblings, 0 replies; 200+ results
From: Dengdui Huang @ 2024-03-22 7:09 UTC (permalink / raw)
To: dev
Cc: ferruh.yigit, aman.deep.singh, yuying.zhang, thomas,
andrew.rybchenko, damodharam.ammepalli, stephen, jerinjacobk,
ajit.khaparde, liuyonglong, fengchengwen, haijie1, lihuisong
Some speeds can be achieved with different number of lanes. For example,
100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps.
When use different lanes, the port cannot be up. This patch add support
setting lanes and report lanes.
In addition, add a device capability RTE_ETH_DEV_CAPA_SETTING_LANES
When the device does not support it, if a speed supports different
numbers of lanes, the application does not knowe which the lane number
are used by the device.
Signed-off-by: Dengdui Huang <huangdengdui@huawei.com>
---
doc/guides/rel_notes/release_24_03.rst | 6 +
drivers/net/bnxt/bnxt_ethdev.c | 3 +-
drivers/net/hns3/hns3_ethdev.c | 1 +
lib/ethdev/ethdev_linux_ethtool.c | 208 ++++++++++++-------------
lib/ethdev/ethdev_private.h | 4 +
lib/ethdev/ethdev_trace.h | 4 +-
lib/ethdev/meson.build | 2 +
lib/ethdev/rte_ethdev.c | 85 +++++++---
lib/ethdev/rte_ethdev.h | 75 ++++++---
lib/ethdev/version.map | 6 +
10 files changed, 250 insertions(+), 144 deletions(-)
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 7bd9ceab27..4621689c68 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -76,6 +76,9 @@ New Features
* Added a fath path function ``rte_eth_tx_queue_count``
to get the number of used descriptors of a Tx queue.
+* **Support setting lanes for ethdev.**
+ * Support setting lanes by extended ``RTE_ETH_LINK_SPEED_*``.
+
* **Added hash calculation of an encapsulated packet as done by the HW.**
Added function to calculate hash when doing tunnel encapsulation:
@@ -254,6 +257,9 @@ ABI Changes
* No ABI change that would break compatibility with 23.11.
+* ethdev: Convert a numerical speed to a bitmap flag with lanes:
+ The function ``rte_eth_speed_bitflag`` adds a lanes parameter.
+
Known Issues
------------
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ba31ae9286..e881a7f3cc 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -711,7 +711,8 @@ static int bnxt_update_phy_setting(struct bnxt *bp)
}
/* convert to speedbit flag */
- curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed, 1);
+ curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed,
+ RTE_ETH_LANES_UNKNOWN, 1);
/*
* Device is not obliged link down in certain scenarios, even
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index b10d1216d2..ecd3b2ef64 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -5969,6 +5969,7 @@ hns3_get_speed_fec_capa(struct rte_eth_fec_capa *speed_fec_capa,
for (i = 0; i < RTE_DIM(speed_fec_capa_tbl); i++) {
speed_bit =
rte_eth_speed_bitflag(speed_fec_capa_tbl[i].speed,
+ RTE_ETH_LANES_UNKNOWN,
RTE_ETH_LINK_FULL_DUPLEX);
if ((speed_capa & speed_bit) == 0)
continue;
diff --git a/lib/ethdev/ethdev_linux_ethtool.c b/lib/ethdev/ethdev_linux_ethtool.c
index e792204b01..6412845161 100644
--- a/lib/ethdev/ethdev_linux_ethtool.c
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -7,6 +7,10 @@
#include "rte_ethdev.h"
#include "ethdev_linux_ethtool.h"
+#define RTE_ETH_LINK_MODES_INDEX_SPEED 0
+#define RTE_ETH_LINK_MODES_INDEX_DUPLEX 1
+#define RTE_ETH_LINK_MODES_INDEX_LANES 2
+
/* Link modes sorted with index as defined in ethtool.
* Values are speed in Mbps with LSB indicating duplex.
*
@@ -15,123 +19,119 @@
* and allows to compile with new bits included even on an old kernel.
*
* The array below is built from bit definitions with this shell command:
- * sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= *([0-9]*).*;'\
- * '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
- * awk '/_Half_/{$3=$3+1","}1'
+ * sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([a-zA-Z]+)([0-9_]+)([0-9a-zA-Z_]*)
+ * .*= *([0-9]*).*;'\ '[\6] = {\2, 1, \4}, /\* \1\2\3\4\5 *\/;p'
+ * /usr/include/linux/ethtool.h | awk '/_Half_/{$4=0","}1' |
+ * awk '/, _}/{$5=1"},"}1' | awk '{sub(/_}/,"\}");}1'
*/
-static const uint32_t link_modes[] = {
- [0] = 11, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
- [1] = 10, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
- [2] = 101, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
- [3] = 100, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
- [4] = 1001, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
- [5] = 1000, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
- [12] = 10000, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
- [15] = 2500, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
- [17] = 1000, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
- [18] = 10000, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
- [19] = 10000, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
- [20] = 10000, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
- [21] = 20000, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
- [22] = 20000, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
- [23] = 40000, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
- [24] = 40000, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
- [25] = 40000, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
- [26] = 40000, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
- [27] = 56000, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
- [28] = 56000, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
- [29] = 56000, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
- [30] = 56000, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
- [31] = 25000, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
- [32] = 25000, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
- [33] = 25000, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
- [34] = 50000, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
- [35] = 50000, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
- [36] = 100000, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
- [37] = 100000, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
- [38] = 100000, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
- [39] = 100000, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
- [40] = 50000, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
- [41] = 1000, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
- [42] = 10000, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
- [43] = 10000, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
- [44] = 10000, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
- [45] = 10000, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
- [46] = 10000, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
- [47] = 2500, /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
- [48] = 5000, /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
- [52] = 50000, /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
- [53] = 50000, /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
- [54] = 50000, /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
- [55] = 50000, /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
- [56] = 50000, /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
- [57] = 100000, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
- [58] = 100000, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
- [59] = 100000, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
- [60] = 100000, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
- [61] = 100000, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
- [62] = 200000, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
- [63] = 200000, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
- [64] = 200000, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
- [65] = 200000, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
- [66] = 200000, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
- [67] = 100, /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
- [68] = 1000, /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
- [69] = 400000, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
- [70] = 400000, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
- [71] = 400000, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
- [72] = 400000, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
- [73] = 400000, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
- [75] = 100000, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
- [76] = 100000, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
- [77] = 100000, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
- [78] = 100000, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
- [79] = 100000, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
- [80] = 200000, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
- [81] = 200000, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
- [82] = 200000, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
- [83] = 200000, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
- [84] = 200000, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
- [85] = 400000, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
- [86] = 400000, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
- [87] = 400000, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
- [88] = 400000, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
- [89] = 400000, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
- [90] = 101, /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
- [91] = 100, /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
- [92] = 10, /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
- [93] = 800000, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
- [94] = 800000, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
- [95] = 800000, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
- [96] = 800000, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
- [97] = 800000, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
- [98] = 800000, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
- [99] = 10, /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
- [100] = 11, /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
- [101] = 11, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
+static const uint32_t link_modes[][3] = {
+ [0] = {10, 0, 1}, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+ [1] = {10, 1, 1}, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+ [2] = {100, 0, 1}, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+ [3] = {100, 1, 1}, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+ [4] = {1000, 0, 1}, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+ [5] = {1000, 1, 1}, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+ [12] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
+ [15] = {2500, 1, 1}, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+ [17] = {1000, 1, 1}, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+ [18] = {10000, 1, 4}, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
+ [19] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
+ [20] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
+ [21] = {20000, 1, 2}, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
+ [22] = {20000, 1, 2}, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
+ [23] = {40000, 1, 4}, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
+ [24] = {40000, 1, 4}, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
+ [25] = {40000, 1, 4}, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
+ [26] = {40000, 1, 4}, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
+ [27] = {56000, 1, 4}, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+ [28] = {56000, 1, 4}, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+ [29] = {56000, 1, 4}, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+ [30] = {56000, 1, 4}, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+ [31] = {25000, 1, 1}, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+ [32] = {25000, 1, 1}, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+ [33] = {25000, 1, 1}, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+ [34] = {50000, 1, 2}, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
+ [35] = {50000, 1, 2}, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
+ [36] = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
+ [37] = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
+ [38] = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
+ [39] = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
+ [40] = {50000, 1, 2}, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
+ [41] = {1000, 1, 1}, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+ [42] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
+ [43] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
+ [44] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
+ [45] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
+ [46] = {10000, 1, 1}, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
+ [47] = {2500, 1, 1}, /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
+ [48] = {5000, 1, 1}, /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
+ [52] = {50000, 1, 1}, /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
+ [53] = {50000, 1, 1}, /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
+ [54] = {50000, 1, 1}, /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
+ [55] = {50000, 1, 1}, /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
+ [56] = {50000, 1, 1}, /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
+ [57] = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
+ [58] = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
+ [59] = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
+ [60] = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
+ [61] = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
+ [62] = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
+ [63] = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
+ [64] = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
+ [65] = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
+ [66] = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
+ [67] = {100, 1, 1}, /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
+ [68] = {1000, 1, 1}, /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
+ [69] = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
+ [70] = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
+ [71] = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
+ [72] = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
+ [73] = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
+ [75] = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
+ [76] = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
+ [77] = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
+ [78] = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
+ [79] = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
+ [80] = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
+ [81] = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
+ [82] = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
+ [83] = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
+ [84] = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
+ [85] = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
+ [86] = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
+ [87] = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
+ [88] = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
+ [89] = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
+ [90] = {100, 0, 1}, /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
+ [91] = {100, 1, 1}, /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
+ [92] = {10, 1, 1}, /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
+ [93] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
+ [94] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
+ [95] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
+ [96] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
+ [97] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
+ [98] = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
+ [99] = {10, 1, 1}, /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
+ [100] = {10, 0, 1}, /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
+ [101] = {10, 0, 1}, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
};
uint32_t
rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
{
- uint32_t speed;
- int duplex;
+ uint32_t speed, duplex, lanes;
/* get mode from array */
if (bit >= RTE_DIM(link_modes))
return RTE_ETH_LINK_SPEED_AUTONEG;
- speed = link_modes[bit];
- if (speed == 0)
+ if (link_modes[bit][RTE_ETH_LINK_MODES_INDEX_SPEED] == 0)
return RTE_ETH_LINK_SPEED_AUTONEG;
RTE_BUILD_BUG_ON(RTE_ETH_LINK_SPEED_AUTONEG != 0);
- /* duplex is LSB */
- duplex = (speed & 1) ?
- RTE_ETH_LINK_HALF_DUPLEX :
- RTE_ETH_LINK_FULL_DUPLEX;
- speed &= RTE_GENMASK32(31, 1);
-
- return rte_eth_speed_bitflag(speed, duplex);
+ speed = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_SPEED];
+ duplex = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_DUPLEX];
+ lanes = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_LANES];
+ return rte_eth_speed_bitflag(speed, duplex, lanes);
}
uint32_t
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index 0d36b9c30f..9092ab3a9e 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -79,4 +79,8 @@ void eth_dev_txq_release(struct rte_eth_dev *dev, uint16_t qid);
int eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
int eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
+/* versioned functions */
+uint32_t rte_eth_speed_bitflag_v24(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex);
+
#endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..5547b49cab 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -183,8 +183,10 @@ RTE_TRACE_POINT(
RTE_TRACE_POINT(
rte_eth_trace_speed_bitflag,
- RTE_TRACE_POINT_ARGS(uint32_t speed, int duplex, uint32_t ret),
+ RTE_TRACE_POINT_ARGS(uint32_t speed, uint8_t lanes, int duplex,
+ uint32_t ret),
rte_trace_point_emit_u32(speed);
+ rte_trace_point_emit_u8(lanes);
rte_trace_point_emit_int(duplex);
rte_trace_point_emit_u32(ret);
)
diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index f1d2586591..2c9588d0b3 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -62,3 +62,5 @@ endif
if get_option('buildtype').contains('debug')
cflags += ['-DRTE_FLOW_DEBUG']
endif
+
+use_function_versioning = true
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..6571116fbf 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -26,6 +26,7 @@
#include <rte_class.h>
#include <rte_ether.h>
#include <rte_telemetry.h>
+#include <rte_function_versioning.h>
#include "rte_ethdev.h"
#include "rte_ethdev_trace_fp.h"
@@ -991,63 +992,101 @@ rte_eth_dev_tx_queue_stop(uint16_t port_id, uint16_t tx_queue_id)
return ret;
}
-uint32_t
-rte_eth_speed_bitflag(uint32_t speed, int duplex)
+uint32_t __vsym
+rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex)
{
- uint32_t ret;
+ uint32_t ret = 0;
switch (speed) {
case RTE_ETH_SPEED_NUM_10M:
- ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
break;
case RTE_ETH_SPEED_NUM_100M:
- ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
break;
case RTE_ETH_SPEED_NUM_1G:
- ret = RTE_ETH_LINK_SPEED_1G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_1G;
break;
case RTE_ETH_SPEED_NUM_2_5G:
- ret = RTE_ETH_LINK_SPEED_2_5G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_2_5G;
break;
case RTE_ETH_SPEED_NUM_5G:
- ret = RTE_ETH_LINK_SPEED_5G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_5G;
break;
case RTE_ETH_SPEED_NUM_10G:
- ret = RTE_ETH_LINK_SPEED_10G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_10G;
+ else if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_10G_4LANES;
break;
case RTE_ETH_SPEED_NUM_20G:
- ret = RTE_ETH_LINK_SPEED_20G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_20G_2LANES;
break;
case RTE_ETH_SPEED_NUM_25G:
- ret = RTE_ETH_LINK_SPEED_25G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_25G;
break;
case RTE_ETH_SPEED_NUM_40G:
- ret = RTE_ETH_LINK_SPEED_40G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_40G_4LANES;
break;
case RTE_ETH_SPEED_NUM_50G:
- ret = RTE_ETH_LINK_SPEED_50G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_50G;
+ else if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_50G_2LANES;
break;
case RTE_ETH_SPEED_NUM_56G:
- ret = RTE_ETH_LINK_SPEED_56G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_56G_4LANES;
break;
case RTE_ETH_SPEED_NUM_100G:
- ret = RTE_ETH_LINK_SPEED_100G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_100G;
+ else if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_100G_2LANES;
+ else if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_100G_4LANES;
break;
case RTE_ETH_SPEED_NUM_200G:
- ret = RTE_ETH_LINK_SPEED_200G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_200G_4LANES;
+ else if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_200G_2LANES;
break;
case RTE_ETH_SPEED_NUM_400G:
- ret = RTE_ETH_LINK_SPEED_400G;
+ if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_400G_4LANES;
+ else if (lanes == RTE_ETH_LANES_8)
+ ret = RTE_ETH_LINK_SPEED_400G_8LANES;
break;
default:
ret = 0;
}
- rte_eth_trace_speed_bitflag(speed, duplex, ret);
+ rte_eth_trace_speed_bitflag(speed, lanes, duplex, ret);
return ret;
}
+uint32_t __vsym
+rte_eth_speed_bitflag_v24(uint32_t speed, int duplex)
+{
+ return rte_eth_speed_bitflag_v25(speed, RTE_ETH_LANES_UNKNOWN, duplex);
+}
+
+/* mark the v24 function as the older version, and v25 as the default version */
+VERSION_SYMBOL(rte_eth_speed_bitflag, _v24, 24);
+BIND_DEFAULT_SYMBOL(rte_eth_speed_bitflag, _v25, 25);
+MAP_STATIC_SYMBOL(uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex),
+ rte_eth_speed_bitflag_v25);
+
const char *
rte_eth_dev_rx_offload_name(uint64_t offload)
{
@@ -3110,13 +3149,21 @@ rte_eth_link_to_str(char *str, size_t len, const struct rte_eth_link *eth_link)
if (eth_link->link_status == RTE_ETH_LINK_DOWN)
ret = snprintf(str, len, "Link down");
- else
+ else if (eth_link->link_lanes == RTE_ETH_LANES_UNKNOWN)
ret = snprintf(str, len, "Link up at %s %s %s",
rte_eth_link_speed_to_str(eth_link->link_speed),
(eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
"FDX" : "HDX",
(eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
"Autoneg" : "Fixed");
+ else
+ ret = snprintf(str, len, "Link up at %s %u lanes %s %s",
+ rte_eth_link_speed_to_str(eth_link->link_speed),
+ eth_link->link_lanes,
+ (eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
+ "FDX" : "HDX",
+ (eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
+ "Autoneg" : "Fixed");
rte_eth_trace_link_to_str(len, eth_link, str, ret);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 147257d6a2..123b771046 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -288,24 +288,40 @@ struct rte_eth_stats {
/**@{@name Link speed capabilities
* Device supported speeds bitmap flags
*/
-#define RTE_ETH_LINK_SPEED_AUTONEG 0 /**< Autonegotiate (all speeds) */
-#define RTE_ETH_LINK_SPEED_FIXED RTE_BIT32(0) /**< Disable autoneg (fixed speed) */
-#define RTE_ETH_LINK_SPEED_10M_HD RTE_BIT32(1) /**< 10 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_10M RTE_BIT32(2) /**< 10 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3) /**< 100 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_100M RTE_BIT32(4) /**< 100 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_1G RTE_BIT32(5) /**< 1 Gbps */
-#define RTE_ETH_LINK_SPEED_2_5G RTE_BIT32(6) /**< 2.5 Gbps */
-#define RTE_ETH_LINK_SPEED_5G RTE_BIT32(7) /**< 5 Gbps */
-#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */
-#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9) /**< 20 Gbps */
-#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
-#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
-#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
-#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
-#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 Gbps */
-#define RTE_ETH_LINK_SPEED_200G RTE_BIT32(15) /**< 200 Gbps */
-#define RTE_ETH_LINK_SPEED_400G RTE_BIT32(16) /**< 400 Gbps */
+#define RTE_ETH_LINK_SPEED_AUTONEG 0 /**< Autonegotiate (all speeds) */
+#define RTE_ETH_LINK_SPEED_FIXED RTE_BIT32(0) /**< Disable autoneg (fixed speed) */
+#define RTE_ETH_LINK_SPEED_10M_HD RTE_BIT32(1) /**< 10 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_10M RTE_BIT32(2) /**< 10 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3) /**< 100 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_100M RTE_BIT32(4) /**< 100 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_1G RTE_BIT32(5) /**< 1 Gbps */
+#define RTE_ETH_LINK_SPEED_2_5G RTE_BIT32(6) /**< 2.5 Gbps */
+#define RTE_ETH_LINK_SPEED_5G RTE_BIT32(7) /**< 5 Gbps */
+#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */
+#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9) /**< 20 Gbps 2lanes */
+#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
+#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
+#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 Gbps */
+#define RTE_ETH_LINK_SPEED_200G RTE_BIT32(15) /**< 200 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_400G RTE_BIT32(16) /**< 400 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17) /**< 10 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_2LANES RTE_BIT32(19) /**< 100 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_4LANES RTE_BIT32(20) /**< 100 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_200G_2LANES RTE_BIT32(21) /**< 200 Gbps 2lanes */
+#define RTE_ETH_LINK_SPEED_400G_8LANES RTE_BIT32(22) /**< 400 Gbps 8lanes */
+/**@}*/
+
+/**@{@name Link speed capabilities
+ * Default lane counts, kept for compatibility with earlier versions
+ */
+#define RTE_ETH_LINK_SPEED_20G_2LANES RTE_ETH_LINK_SPEED_20G
+#define RTE_ETH_LINK_SPEED_40G_4LANES RTE_ETH_LINK_SPEED_40G
+#define RTE_ETH_LINK_SPEED_56G_4LANES RTE_ETH_LINK_SPEED_56G
+#define RTE_ETH_LINK_SPEED_200G_4LANES RTE_ETH_LINK_SPEED_200G
+#define RTE_ETH_LINK_SPEED_400G_4LANES RTE_ETH_LINK_SPEED_400G
/**@}*/
/**@{@name Link speed
@@ -329,6 +345,16 @@ struct rte_eth_stats {
#define RTE_ETH_SPEED_NUM_UNKNOWN UINT32_MAX /**< Unknown */
/**@}*/
+/**@{@name Link lane number
+ * Ethernet lane number
+ */
+#define RTE_ETH_LANES_UNKNOWN 0 /**< Unknown */
+#define RTE_ETH_LANES_1 1 /**< 1 lanes */
+#define RTE_ETH_LANES_2 2 /**< 2 lanes */
+#define RTE_ETH_LANES_4 4 /**< 4 lanes */
+#define RTE_ETH_LANES_8 8 /**< 8 lanes */
+/**@}*/
+
/**
* A structure used to retrieve link-level information of an Ethernet port.
*/
@@ -338,6 +364,7 @@ struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
+ uint16_t link_lanes : 4; /**< RTE_ETH_LANES_ */
};
/**@{@name Link negotiation
@@ -1641,6 +1668,12 @@ struct rte_eth_conf {
#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3)
/** Device supports keeping shared flow objects across restart. */
#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**
+ * Device supports setting lanes. When the device does not support it,
+ * if a speed supports different numbers of lanes, the application does
+ * not know which lane count is used by the device.
+ */
+#define RTE_ETH_DEV_CAPA_SETTING_LANES RTE_BIT64(5)
/**@}*/
/*
@@ -2301,12 +2334,16 @@ uint16_t rte_eth_dev_count_total(void);
*
* @param speed
* Numerical speed value in Mbps
+ * @param lanes
+ * Number of lanes (RTE_ETH_LANES_x).
+ * RTE_ETH_LANES_UNKNOWN is used when the device does not support
+ * setting lanes.
* @param duplex
* RTE_ETH_LINK_[HALF/FULL]_DUPLEX (only for 10/100M speeds)
* @return
* 0 if the speed cannot be mapped
*/
-uint32_t rte_eth_speed_bitflag(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex);
/**
* Get RTE_ETH_RX_OFFLOAD_* flag name.
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..9fa2439976 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -169,6 +169,12 @@ DPDK_24 {
local: *;
};
+DPDK_25 {
+ global:
+
+ rte_eth_speed_bitflag;
+} DPDK_24;
+
EXPERIMENTAL {
global:
--
2.33.0
* Re: [PATCH 15/46] net/sfc: use rte stdatomic API
2024-03-21 18:11 3% ` Aaron Conole
@ 2024-03-21 18:15 0% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-21 18:15 UTC (permalink / raw)
To: Aaron Conole
Cc: dev, Mattias Rönnblom, Morten Brørup,
Abdullah Sevincer, Ajit Khaparde, Alok Prasad, Anatoly Burakov,
Andrew Rybchenko, Anoob Joseph, Bruce Richardson, Byron Marohn,
Chenbo Xia, Chengwen Feng, Ciara Loftus, Ciara Power,
Dariusz Sosnowski, David Hunt, Devendra Singh Rawat,
Erik Gabriel Carrillo, Guoyang Zhou, Harman Kalra,
Harry van Haaren, Honnappa Nagarahalli, Jakub Grajciar,
Jerin Jacob, Jeroen de Borst, Jian Wang, Jiawen Wu, Jie Hai,
Jingjing Wu, Joshua Washington, Joyce Kong, Junfeng Guo,
Kevin Laatz, Konstantin Ananyev, Liang Ma, Long Li,
Maciej Czekaj, Matan Azrad, Maxime Coquelin, Nicolas Chautru,
Ori Kam, Pavan Nikhilesh, Peter Mccarthy, Rahul Lakkireddy,
Reshma Pattan, Rosen Xu, Ruifeng Wang, Rushil Gupta,
Sameh Gobriel, Sivaprasad Tummala, Somnath Kotur,
Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
Sunil Uttarwar, Tetsuya Mukawa, Vamsi Attunuru,
Viacheslav Ovsiienko, Vladimir Medvedkin, Xiaoyun Wang,
Yipeng Wang, Yisen Zhuang, Yuying Zhang, Ziyang Xuan
On Thu, Mar 21, 2024 at 02:11:00PM -0400, Aaron Conole wrote:
> Tyler Retzlaff <roretzla@linux.microsoft.com> writes:
>
> > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > corresponding rte_atomic_xxx optional rte stdatomic API.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> > drivers/net/sfc/meson.build | 5 ++---
> > drivers/net/sfc/sfc_mae_counter.c | 30 +++++++++++++++---------------
> > drivers/net/sfc/sfc_repr_proxy.c | 8 ++++----
> > drivers/net/sfc/sfc_stats.h | 8 ++++----
> > 4 files changed, 25 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
> > index 5adde68..d3603a0 100644
> > --- a/drivers/net/sfc/meson.build
> > +++ b/drivers/net/sfc/meson.build
> > @@ -47,9 +47,8 @@ int main(void)
> > __int128 a = 0;
> > __int128 b;
> >
> > - b = __atomic_load_n(&a, __ATOMIC_RELAXED);
> > - __atomic_store(&b, &a, __ATOMIC_RELAXED);
> > - __atomic_store_n(&b, a, __ATOMIC_RELAXED);
> > + b = rte_atomic_load_explicit(&a, rte_memory_order_relaxed);
> > + rte_atomic_store_explicit(&b, a, rte_memory_order_relaxed);
> > return 0;
> > }
> > '''
>
> I think this is a case where simple find/replace is a problem. For
> example, this is a sample file that the meson build uses to determine if
> libatomic is properly installed. However, it is very bare-bones.
>
> Your change is likely causing a compile error when cc.links happens in
> the meson file. That leads to the ABI error.
>
> If the goal is to remove all the intrinsics, then maybe a better change
> would be dropping this libatomic check from here completely.
>
> WDYT?
Yeah, actually it wasn't a search-replace mistake; it was an
unintentionally included file where I was experimenting with keeping the
test (I thought I had reverted it).
I shouldn't have added the change to the series. Thanks for pointing the
mistake out, and sorry for the noise.
Appreciate it!
* Re: [PATCH 15/46] net/sfc: use rte stdatomic API
@ 2024-03-21 18:11 3% ` Aaron Conole
2024-03-21 18:15 0% ` Tyler Retzlaff
0 siblings, 1 reply; 200+ results
From: Aaron Conole @ 2024-03-21 18:11 UTC (permalink / raw)
To: Tyler Retzlaff
Cc: dev, Mattias Rönnblom, Morten Brørup,
Abdullah Sevincer, Ajit Khaparde, Alok Prasad, Anatoly Burakov,
Andrew Rybchenko, Anoob Joseph, Bruce Richardson, Byron Marohn,
Chenbo Xia, Chengwen Feng, Ciara Loftus, Ciara Power,
Dariusz Sosnowski, David Hunt, Devendra Singh Rawat,
Erik Gabriel Carrillo, Guoyang Zhou, Harman Kalra,
Harry van Haaren, Honnappa Nagarahalli, Jakub Grajciar,
Jerin Jacob, Jeroen de Borst, Jian Wang, Jiawen Wu, Jie Hai,
Jingjing Wu, Joshua Washington, Joyce Kong, Junfeng Guo,
Kevin Laatz, Konstantin Ananyev, Liang Ma, Long Li,
Maciej Czekaj, Matan Azrad, Maxime Coquelin, Nicolas Chautru,
Ori Kam, Pavan Nikhilesh, Peter Mccarthy, Rahul Lakkireddy,
Reshma Pattan, Rosen Xu, Ruifeng Wang, Rushil Gupta,
Sameh Gobriel, Sivaprasad Tummala, Somnath Kotur,
Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
Sunil Uttarwar, Tetsuya Mukawa, Vamsi Attunuru,
Viacheslav Ovsiienko, Vladimir Medvedkin, Xiaoyun Wang,
Yipeng Wang, Yisen Zhuang, Yuying Zhang, Ziyang Xuan
Tyler Retzlaff <roretzla@linux.microsoft.com> writes:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional rte stdatomic API.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
> drivers/net/sfc/meson.build | 5 ++---
> drivers/net/sfc/sfc_mae_counter.c | 30 +++++++++++++++---------------
> drivers/net/sfc/sfc_repr_proxy.c | 8 ++++----
> drivers/net/sfc/sfc_stats.h | 8 ++++----
> 4 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
> index 5adde68..d3603a0 100644
> --- a/drivers/net/sfc/meson.build
> +++ b/drivers/net/sfc/meson.build
> @@ -47,9 +47,8 @@ int main(void)
> __int128 a = 0;
> __int128 b;
>
> - b = __atomic_load_n(&a, __ATOMIC_RELAXED);
> - __atomic_store(&b, &a, __ATOMIC_RELAXED);
> - __atomic_store_n(&b, a, __ATOMIC_RELAXED);
> + b = rte_atomic_load_explicit(&a, rte_memory_order_relaxed);
> + rte_atomic_store_explicit(&b, a, rte_memory_order_relaxed);
> return 0;
> }
> '''
I think this is a case where simple find/replace is a problem. For
example, this is a sample file that the meson build uses to determine if
libatomic is properly installed. However, it is very bare-bones.
Your change is likely causing a compile error when cc.links happens in
the meson file. That leads to the ABI error.
If the goal is to remove all the intrinsics, then maybe a better change
would be dropping this libatomic check from here completely.
WDYT?
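As an aside, when DPDK is built with its optional stdatomic mode, the rte_atomic_*_explicit macros are thin wrappers over C11 atomics; a minimal sketch of the equivalent C11 calls is below (the helper names are illustrative, not DPDK API, and 64-bit atomics — unlike the __int128 case the probe checks for — need no libatomic on common 64-bit targets):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative C11 equivalents of the __atomic_load_n/__atomic_store_n
 * GCC builtins used in the meson probe above. */
static inline uint64_t load_relaxed(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

static inline void store_relaxed(_Atomic uint64_t *p, uint64_t v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}
```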
^ permalink raw reply [relevance 3%]
* Re: [PATCH 02/15] eal: pack structures when building with MSVC
@ 2024-03-21 16:02 3% ` Bruce Richardson
0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-03-21 16:02 UTC (permalink / raw)
To: Tyler Retzlaff
Cc: dev, Akhil Goyal, Aman Singh, Anatoly Burakov, Byron Marohn,
Conor Walsh, Cristian Dumitrescu, Dariusz Sosnowski, David Hunt,
Jerin Jacob, Jingjing Wu, Kirill Rybalchenko, Konstantin Ananyev,
Matan Azrad, Ori Kam, Radu Nicolau, Ruifeng Wang, Sameh Gobriel,
Sivaprasad Tummala, Suanming Mou, Sunil Kumar Kori,
Vamsi Attunuru, Viacheslav Ovsiienko, Vladimir Medvedkin,
Yipeng Wang, Yuying Zhang
On Wed, Mar 20, 2024 at 02:05:58PM -0700, Tyler Retzlaff wrote:
> Add __rte_msvc_pack to all __rte_packed structs to cause packing
> when building with MSVC.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
> lib/eal/common/eal_private.h | 1 +
> lib/eal/include/rte_memory.h | 1 +
> lib/eal/include/rte_memzone.h | 1 +
> lib/eal/include/rte_trace_point.h | 1 +
> lib/eal/x86/include/rte_memcpy.h | 3 +++
> 5 files changed, 7 insertions(+)
>
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index 71523cf..21ace2a 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -43,6 +43,7 @@ struct lcore_config {
> /**
> * The global RTE configuration structure.
> */
> +__rte_msvc_pack
> struct rte_config {
> uint32_t main_lcore; /**< Id of the main lcore */
> uint32_t lcore_count; /**< Number of available logical cores. */
This struct almost certainly doesn't need to be packed - since it's in a
private header, I would imagine removing packing wouldn't be an ABI
break.
Also, removing rte_packed doesn't change the size for me for a 64-bit x86
build. Looking at the struct, I don't see why it would change on a 32-bit
build either.
> diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
> index 842362d..73bb00d 100644
> --- a/lib/eal/include/rte_memory.h
> +++ b/lib/eal/include/rte_memory.h
> @@ -46,6 +46,7 @@
> /**
> * Physical memory segment descriptor.
> */
> +__rte_msvc_pack
> struct rte_memseg {
> rte_iova_t iova; /**< Start IO address. */
> union {
> diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
> index 931497f..ca312c0 100644
> --- a/lib/eal/include/rte_memzone.h
> +++ b/lib/eal/include/rte_memzone.h
> @@ -45,6 +45,7 @@
> * A structure describing a memzone, which is a contiguous portion of
> * physical memory identified by a name.
> */
> +__rte_msvc_pack
> struct rte_memzone {
>
This also doesn't look like it should be packed. It is a public header
though so we may need to be more careful. Checking a 64-bit x86 build shows
no size change when removing the "packed" attribute, though. For 32-bit, I
think the "size_t" field in the middle would be followed by padding on
32-bit if we removed the "packed" attribute, so it may be a no-go for now.
:-(
> #define RTE_MEMZONE_NAMESIZE 32 /**< Maximum length of memory zone name.*/
> diff --git a/lib/eal/include/rte_trace_point.h b/lib/eal/include/rte_trace_point.h
> index 41e2a7f..63f333c 100644
> --- a/lib/eal/include/rte_trace_point.h
> +++ b/lib/eal/include/rte_trace_point.h
> @@ -292,6 +292,7 @@ int __rte_trace_point_register(rte_trace_point_t *trace, const char *name,
> #define __RTE_TRACE_FIELD_ENABLE_MASK (1ULL << 63)
> #define __RTE_TRACE_FIELD_ENABLE_DISCARD (1ULL << 62)
>
> +__rte_msvc_pack
> struct __rte_trace_stream_header {
> uint32_t magic;
> rte_uuid_t uuid;
From code review, this doesn't look like "packed" has any impact, since all
fields should naturally be aligned on both 32-bit and 64-bit builds.
/Bruce
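The size checks above can be reproduced with a small sketch: whether dropping the packed attribute changes a struct's size hinges on whether any field would otherwise be mis-aligned. The sizes below assume a typical 64-bit ABI with 8-byte uint64_t alignment; on a 32-bit ABI that aligns uint64_t to 4 bytes, both structs would be 12 bytes — the size_t caveat Bruce raises.

```c
#include <stdint.h>

/* 'b' needs 8-byte alignment here, so the compiler inserts 4 bytes of
 * padding after 'a' unless the struct is packed. */
struct with_padding {
    uint32_t a;
    uint64_t b;
};

/* Packed variant: no padding, but 'b' may be misaligned. */
struct no_padding {
    uint32_t a;
    uint64_t b;
} __attribute__((packed));
```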
^ permalink raw reply [relevance 3%]
* DPDK Release Status Meeting 2024-03-21
@ 2024-03-21 14:49 3% Mcnamara, John
0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2024-03-21 14:49 UTC (permalink / raw)
To: dev; +Cc: thomas, Marchand, David
[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]
Release status meeting minutes 2024-03-21
=========================================
Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens
Participants:
* AMD
* ARM
* Intel
* Marvell
* Nvidia
* Red Hat
Release Dates
-------------
The following are the current/updated working dates for 24.03:
* V1: 29 December 2023
* RC1: 21 February 2024
* RC2: 8 March 2024
* RC3: 18 March 2024
* RC4: 22 March 2024
* Release: 27 March 2024
https://core.dpdk.org/roadmap/#dates
Subtrees
--------
* next-net
* Some fixes merged.
* Ready for Pull
* next-net-intel
* 2 fix/doc patches.
* next-net-mlx
* Series merged after RC3.
* next-net-mvl
* No new changes post RC3.
* next-eventdev
* No new changes post RC3.
* next-baseband
* No new changes post RC3.
* next-virtio
* No new changes post RC3.
* next-crypto
* Some doc patches.
* Patch for ipsecgw sample app to be postponed to
next release due to risk of breakage in other PMDs.
* main
* RH testing for RC3 - no major issues.
* Looking at Windows patches for next release to
make sure there aren't any API/ABI breaking changes.
* Doc fixes and release notes.
* Proposed 24.03 dates:
* RC4: 22 March 2024
* Release: 27 March 2024
LTS
---
Please add acks to confirm validation support for a 3 year LTS window:
http://inbox.dpdk.org/dev/20240117161804.223582-1-ktraynor@redhat.com/
* 23.11.1 - In progress.
* 22.11.5 - In progress.
* 21.11.7 - In progress.
* 20.11.10 - Will only be updated with CVE and critical fixes.
* 19.11.15 - Will only be updated with CVE and critical fixes.
* Distros
* Debian 12 contains DPDK v22.11
* Ubuntu 24.04-LTS will contain DPDK v23.11
* Ubuntu 23.04 contains DPDK v22.11
Defects
-------
* Bugzilla links, 'Bugs', added for hosted projects
* https://www.dpdk.org/hosted-projects/
DPDK Release Status Meetings
----------------------------
The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.
The meeting occurs on every Thursday at 9:30 UTC over Jitsi on https://meet.jit.si/DPDK
You don't need an invite to join the meeting but if you want a calendar reminder just
send an email to "John McNamara john.mcnamara@intel.com" for the invite.
[-- Attachment #2: Type: text/html, Size: 13374 bytes --]
^ permalink raw reply [relevance 3%]
* Re: [PATCH v6 02/23] mbuf: consolidate driver asserts for mbuf struct
@ 2024-03-14 16:51 4% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-14 16:51 UTC (permalink / raw)
To: dev, techboard
Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
Yuying Zhang, mb
We've gone around in circles a little on this series. Let's discuss it
at the next techboard meeting; please put it on the agenda.
Summary
MSVC does not support the typedef of zero-sized typed arrays used in
struct rte_mbuf and a handful of other structs built on Windows, better
known as ``RTE_MARKER`` fields.
There are two competing solutions we would like to know which to move
forward with.
1. Use C11 anonymous unions and anonymous struct extensions to replace
the RTE_MARKER fields which can maintain ABI and API compatibility.
2. Provide inline accessors for some struct rte_mbuf fields and
remove others, which maintains ABI but breaks API.
I'm proposing a mix of 1 & 2 to maintain ABI and some API for struct
rte_mbuf fields but remove (API breaking) cacheline{0,1} RTE_MARKER
fields in favor of existing inline functions for prefetching.
Thanks!
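As a rough illustration of option 1, an anonymous union lets a marker-style name alias the start of the fields it labels without a zero-sized typed array (the layout below is simplified and illustrative, not the real struct rte_mbuf):

```c
#include <stddef.h>
#include <stdint.h>

/* An anonymous union makes 'rearm_data' alias the start of the fields
 * it marks, replacing the zero-sized typed-array marker MSVC rejects.
 * The anonymous struct member requires C11. */
struct mbuf_sketch {
    union {
        uint64_t rearm_data;          /* marker for the 8-byte region */
        struct {
            uint16_t data_off;
            uint16_t refcnt;
            uint16_t nb_segs;
            uint16_t port;
        };
    };
    uint64_t ol_flags;
};
```

With this shape, code that took the address of the old marker keeps working, since the marker and the first real field share an offset.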
On Mon, Feb 26, 2024 at 09:41:18PM -0800, Tyler Retzlaff wrote:
> Collect duplicated RTE_BUILD_BUG_ON checks from drivers and place them
> at global scope with struct rte_mbuf definition using static_assert.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
> lib/mbuf/rte_mbuf_core.h | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index 7000c04..36551c2 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -16,8 +16,11 @@
> * New fields and flags should fit in the "dynamic space".
> */
>
> +#include <assert.h>
> +#include <stddef.h>
> #include <stdint.h>
>
> +#include <rte_common.h>
> #include <rte_byteorder.h>
> #include <rte_stdatomic.h>
>
> @@ -673,6 +676,37 @@ struct rte_mbuf {
> uint32_t dynfield1[9]; /**< Reserved for dynamic fields. */
> } __rte_cache_aligned;
>
> +static_assert(!(offsetof(struct rte_mbuf, ol_flags) !=
> + offsetof(struct rte_mbuf, rearm_data) + 8), "ol_flags");
> +static_assert(!(offsetof(struct rte_mbuf, rearm_data) !=
> + RTE_ALIGN(offsetof(struct rte_mbuf, rearm_data), 16)), "rearm_data");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) !=
> + offsetof(struct rte_mbuf, rearm_data)), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) <
> + offsetof(struct rte_mbuf, rearm_data)), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, refcnt) <
> + offsetof(struct rte_mbuf, rearm_data)), "refcnt");
> +static_assert(!(offsetof(struct rte_mbuf, nb_segs) <
> + offsetof(struct rte_mbuf, rearm_data)), "nb_segs");
> +static_assert(!(offsetof(struct rte_mbuf, port) <
> + offsetof(struct rte_mbuf, rearm_data)), "port");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) -
> + offsetof(struct rte_mbuf, rearm_data) > 6), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, refcnt) -
> + offsetof(struct rte_mbuf, rearm_data) > 6), "refcnt");
> +static_assert(!(offsetof(struct rte_mbuf, nb_segs) -
> + offsetof(struct rte_mbuf, rearm_data) > 6), "nb_segs");
> +static_assert(!(offsetof(struct rte_mbuf, port) -
> + offsetof(struct rte_mbuf, rearm_data) > 6), "port");
> +static_assert(!(offsetof(struct rte_mbuf, pkt_len) !=
> + offsetof(struct rte_mbuf, rx_descriptor_fields1) + 4), "pkt_len");
> +static_assert(!(offsetof(struct rte_mbuf, data_len) !=
> + offsetof(struct rte_mbuf, rx_descriptor_fields1) + 8), "data_len");
> +static_assert(!(offsetof(struct rte_mbuf, vlan_tci) !=
> + offsetof(struct rte_mbuf, rx_descriptor_fields1) + 10), "vlan_tci");
> +static_assert(!(offsetof(struct rte_mbuf, hash) !=
> + offsetof(struct rte_mbuf, rx_descriptor_fields1) + 12), "hash");
> +
> /**
> * Function typedef of callback to free externally attached buffer.
> */
> --
> 1.8.3.1
^ permalink raw reply [relevance 4%]
* Re: [RFC v2] eal: increase passed max multi-process file descriptors
2024-03-14 15:23 0% ` Stephen Hemminger
@ 2024-03-14 15:38 3% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-14 15:38 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, Anatoly Burakov, Jianfeng Tan
On Thu, Mar 14, 2024 at 4:23 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
> Rather than mess with versioning everything, probably better to just
> hold off to 24.11 release and do the change there.
>
> It will limit xdp and tap PMD's to 8 queues but no user has been
> demanding more yet.
IIUC, this limitation only applies to multiprocess setups.
Waiting for next ABI seems the simpler approach if there is no
explicit ask for this change.
Until someone wants to send more than 253 fds :-).
--
David Marchand
^ permalink raw reply [relevance 3%]
* Re: [RFC v2] eal: increase passed max multi-process file descriptors
2024-03-08 20:36 8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
@ 2024-03-14 15:23 0% ` Stephen Hemminger
2024-03-14 15:38 3% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-14 15:23 UTC (permalink / raw)
To: dev; +Cc: Anatoly Burakov, Jianfeng Tan
On Fri, 8 Mar 2024 12:36:39 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:
> Both XDP and TAP devices are limited in the number of queues
> because of limitations on the number of file descriptors that
> are allowed. The original choice of 8 was too low; the allowed
> maximum is 253 according to unix(7) man page.
>
> This may look like a serious ABI breakage but it is not.
> It is simpler for everyone if the limit is increased rather than
> building a parallel set of calls.
>
> The case that matters is older application registering MP support
> with the newer version of EAL. In this case, since the old application
> will always send the more compact structure (less possible fd's)
> it is OK.
>
> Request (for up to 8 fds) sent to EAL.
> - EAL only references up to num_fds.
> - The area past the old fd array is not accessed.
>
> Reply callback:
> - EAL will pass pointer to the new (larger structure),
> the old callback will only look at the first part of
> the fd array (num_fds <= 8).
>
> - Since primary and secondary must both be from the same DPDK version,
> there is no normal way that a reply with more fd's could be possible.
> The only case is the same as above, where application requested
> something that would break in old version and now succeeds.
>
> The one possible incompatibility is that if application passed
> a larger number of fd's (32?) and expected an error. Now it will
> succeed and get passed through.
>
> Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 - show the simpler way to address with some minor ABI issue
>
> doc/guides/rel_notes/release_24_03.rst | 4 ++++
> lib/eal/include/rte_eal.h | 2 +-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 932688ca4d82..1d33cfa15dfb 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -225,6 +225,10 @@ API Changes
> * ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
> ``rte_flow_field_data`` for more generic usage.
>
> +* eal: The maximum number of file descriptors allowed to be passed in
> + multi-process requests is increased from 8 to the maximum possible on
> + Linux unix domain sockets 253. This allows for more queues on XDP and
> + TAP device.
>
> ABI Changes
> -----------
> diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
> index c2256f832e51..cd84fcdd1bdb 100644
> --- a/lib/eal/include/rte_eal.h
> +++ b/lib/eal/include/rte_eal.h
> @@ -155,7 +155,7 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
> */
> bool rte_mp_disable(void);
>
> -#define RTE_MP_MAX_FD_NUM 8 /* The max amount of fds */
> +#define RTE_MP_MAX_FD_NUM 253 /* The max amount of fds */
> #define RTE_MP_MAX_NAME_LEN 64 /* The max length of action name */
> #define RTE_MP_MAX_PARAM_LEN 256 /* The max length of param */
> struct rte_mp_msg {
Rather than mess with versioning everything, probably better to just
hold off to 24.11 release and do the change there.
It will limit xdp and tap PMD's to 8 queues but no user has been
demanding more yet.
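The compatibility argument is easiest to see from the layouts: since fds is the trailing member of the message, growing it leaves every older field at the same offset, so an old peer that honors num_fds reads the common prefix unchanged (field names mirror struct rte_mp_msg, but treat this as a sketch, not the exact definition):

```c
#include <stddef.h>
#include <stdint.h>

/* Old layout: fds capped at 8 entries. */
struct mp_msg_v23 {
    char name[64];          /* RTE_MP_MAX_NAME_LEN */
    int len_param;
    int num_fds;
    uint8_t param[256];     /* RTE_MP_MAX_PARAM_LEN */
    int fds[8];
};

/* New layout: only the trailing array grows, so the common prefix is
 * byte-identical and an old reader that honors num_fds stays correct. */
struct mp_msg_v24 {
    char name[64];
    int len_param;
    int num_fds;
    uint8_t param[256];
    int fds[253];
};
```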
^ permalink raw reply [relevance 0%]
* Re: [RFC] eal: increase the number of available file descriptors for MP
2024-03-08 18:54 2% [RFC] eal: increase the number of available file descriptors for MP Stephen Hemminger
2024-03-08 20:36 8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
2024-03-09 18:12 2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
@ 2024-03-14 14:40 4% ` David Marchand
2 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-14 14:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, Anatoly Burakov, Ferruh Yigit, Thomas Monjalon
On Fri, Mar 8, 2024 at 7:54 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The current limit of file descriptors is too low, it should have
> been set to the maximum possible to send across an unix domain
> socket.
>
> This is an attempt to allow increasing it without breaking ABI.
> But in the process it exposes what is broken about how symbol
> versions are checked in check-symbol-maps.sh. That script is
> broken in that it won't allow adding a backwards compatiable
> version hook like this.
- It could be enhanced maybe, but I see no problem with the script.
The versions for compat symbols in this patch are wrong.
We want to keep compat with ABI 24, not 23.
And next ABI will be 25.
- rte_mp_old_msg does not have to be exported as public in rte_eal.h.
- I think the patch is not complete:
* rte_mp_action_register and rte_mp_request_async need versioning too,
* because of the former point, handling of msg requests probably
needs to keep track of accepted length per registered callbacks,
--
David Marchand
^ permalink raw reply [relevance 4%]
* Re: [PATCH 1/1] eal: add C++ include guard in generic/rte_vect.h
@ 2024-03-14 3:45 3% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-14 3:45 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Ashish Sadanandan, Bruce Richardson, Thomas Monjalon, dev,
nelio.laranjeiro, stable, honnappa.nagarahalli,
konstantin.v.ananyev, david.marchand, ruifeng.wang
On Wed, Mar 13, 2024 at 04:45:36PM -0700, Stephen Hemminger wrote:
> On Fri, 2 Feb 2024 13:58:19 -0700
> Ashish Sadanandan <ashish.sadanandan@gmail.com> wrote:
>
> > > I think just having the extern "C" guard in all files is the safest choice,
> > > because it's immediately obvious in each and every file that it is correct.
> > > Taking the other option, to check any indirect include file you need to go
> > > finding what other files include it and check there that a) they have
> > > include guards and b) the include for the indirect header is contained
> > > within it.
> > >
> > > Adopting the policy of putting the guard in each and every header is also a
> > > lot easier to do basic automated sanity checks on. If the file ends in .h,
> > > we just use grep to quickly verify it's not missing the guards. [Naturally,
> > > we can do more complete checks than that if we want, but 99% percent of
> > > misses can be picked up by a grep for the 'extern "C"' bit]
> > >
> > > /Bruce
> > >
> >
> > 100% agree with Bruce. It's a valid ideological argument that private
> > headers
> > don't need such safeguards, but it's difficult to enforce and easy to break
> > during refactoring.
> >
> > - Ashish
>
> But splashing this across all the internal driver headers is a bad idea.
> It should only apply to header files that are exported in the final package.
While we don't provide API/ABI stability promises for driver headers, we
do optionally install them with -Denable_driver_sdk=true.
The driver SDK allows drivers to be developed outside of the DPDK source
tree; many such drivers are explicitly authored in C++ and live outside of
the DPDK source tree because DPDK does not allow C++ drivers in tree.
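For reference, the guard pattern being discussed is the usual one below; with it present directly in every installed header, C++ consumers (including out-of-tree C++ drivers built against the driver SDK) need no wrapping at the include site (all names here are illustrative):

```c
#ifndef EXAMPLE_RTE_HEADER_H
#define EXAMPLE_RTE_HEADER_H

/* Guard every installed header directly, rather than relying on an
 * including header to provide the extern "C" block. */
#ifdef __cplusplus
extern "C" {
#endif

static inline int example_api_call(void)
{
    return 42;
}

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_RTE_HEADER_H */
```

A grep for 'extern "C"' across installed headers is then enough for the basic automated check Bruce describes.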
^ permalink raw reply [relevance 3%]
* Re: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
2024-03-10 14:29 0% ` Brandes, Shai
@ 2024-03-13 11:21 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-03-13 11:21 UTC (permalink / raw)
To: Brandes, Shai; +Cc: dev
On 3/10/2024 2:29 PM, Brandes, Shai wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, March 8, 2024 7:24 PM
>> To: Brandes, Shai <shaibran@amazon.com>
>> Cc: dev@dpdk.org
>> Subject: RE: [EXTERNAL] [PATCH v3 07/33] net/ena: restructure the llq policy
>> setting process
>>
>> CAUTION: This email originated from outside of the organization. Do not click
>> links or open attachments unless you can confirm the sender and know the
>> content is safe.
>>
>>
>>
>> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
>>> From: Shai Brandes <shaibran@amazon.com>
>>>
>>> The driver will set the LLQ header size according to the
>>> recommendation from the device.
>>> Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg
>>> `llq_policy` that accepts the following values:
>>> 0 - Disable LLQ.
>>> Use with extreme caution as it leads to a huge performance
>>> degradation on AWS instances from 6th generation onwards.
>>> 1 - Accept device recommended LLQ policy (Default).
>>> Device can recommend normal or large LLQ policy.
>>> 2 - Enforce normal LLQ policy.
>>> 3 - Enforce large LLQ policy.
>>> Required for packets with header that exceed 96 bytes on
>>> AWS instances prior to 5th generation.
>>>
>>
>> We had a similar discussion before; although devargs are not part of the ABI, they
>> form a user interface, and changes in the devargs will impact users directly.
>>
>> What would you think about either keeping backward compatibility in the devargs
>> (i.e. not removing the old ones but adding a new one), or doing this change in
>> the 24.11 release?
> [Brandes, Shai] understood.
> The new devarg replaced the old ones and added option to enforce normal-llq mode which is critical for our release.
> As you suggested, we will keep backward compatibility and add an additional devarg for enforcing normal-llq policy.
> That way, we can easily replace it in future releases with a common devarg without the need to make major logic changes.
>
ack.
^ permalink raw reply [relevance 0%]
* RE: [PATCH v6 1/5] ci: replace IPsec-mb package install
2024-03-12 16:13 3% ` David Marchand
@ 2024-03-12 17:07 0% ` Power, Ciara
0 siblings, 0 replies; 200+ results
From: Power, Ciara @ 2024-03-12 17:07 UTC (permalink / raw)
To: Marchand, David
Cc: Dooley, Brian, Aaron Conole, Michael Santana, dev, gakhil,
De Lara Guarch, Pablo, probb, wathsala.vithanage,
Thomas Monjalon, Richardson, Bruce
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, March 12, 2024 4:14 PM
> To: Power, Ciara <ciara.power@intel.com>
> Cc: Dooley, Brian <brian.dooley@intel.com>; Aaron Conole
> <aconole@redhat.com>; Michael Santana <maicolgabriel@hotmail.com>;
> dev@dpdk.org; gakhil@marvell.com; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; probb@iol.unh.edu;
> wathsala.vithanage@arm.com; Thomas Monjalon <thomas@monjalon.net>;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
>
> On Tue, Mar 12, 2024 at 4:26 PM Power, Ciara <ciara.power@intel.com> wrote:
> > > From: David Marchand <david.marchand@redhat.com> On Tue, Mar 12,
> > > 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> > > wrote:
> > > >
> > > > From: Ciara Power <ciara.power@intel.com>
> > > >
> > > > The IPsec-mb version that is available through current package
> > > > managers is 1.2.
> > > > This release moves the minimum required IPsec-mb version for
> > > > IPsec-mb based SW PMDs to 1.4.
> > > > To compile these PMDs, a manual step is added to install IPsec-mb
> > > > v1.4 using dpkg.
> > > >
> > > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > > > ---
> > > > .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> > > > 1 file changed, 22 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/.github/workflows/build.yml
> > > > b/.github/workflows/build.yml index 776fbf6f30..ed44b1f730 100644
> > > > --- a/.github/workflows/build.yml
> > > > +++ b/.github/workflows/build.yml
> > > > @@ -106,9 +106,15 @@ jobs:
> > > > run: sudo apt update || true
> > > > - name: Install packages
> > > > run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > > > - libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > > > + libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> > > > libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-
> pip
> > > > python3-pyelftools python3-setuptools python3-wheel
> > > > zlib1g-dev
> > > > + - name: Install ipsec-mb library
> > > > + run: |
> > > > + wget
> > > > + "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > > mb-dev_1.4-3_amd64.deb"
> > > > + wget
> > > > + "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > > mb1_1.4-3_amd64.deb"
> > > > + sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > > > + sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
> > >
> > I am not enthusiastic at advertising a kind of out-of-tree approach.
> > > That's a bit like if NVIDIA asked us to stop testing distribution
> > > rdma-core packages and instead rely on MOFED.
> > >
> > > Why are we removing support for versions that are packaged by the
> > > main distributions?
> >
> > With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the
> package manager.
> > We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from
> > this release onwards, removing the many ifdef codepaths in the PMDs
> > for older versions. (patch included in this patchset)
> >
> > Some of the other CI environments were updated to install v1.4 already
> > to support this change, but we found the github CI robot was limited for ipsec-
> mb versions when using the package manager.
> > It had some failures comparing ABI with v1.2 installed (SW PMDs compiled in
> reference build, but not compiled after patch).
>
> Such a change means that users of the Ubuntu/Fedora dpdk package lose access
> to those drivers hypothetically.
> "Hypothetically", because in reality, Ubuntu and others distributions won't
> update to non LTS versions.
>
> On the other hand, if a user was building DPDK (and not the one provided by
> the distribution), now the user has to stop using the ipsec mb provided by the
> distribution: building/packaging/maintaining the ipsec mb library is now forced
> on the user plate.
>
> I am unclear if this qualifies as an ABI breakage, but I am not comfortable with this
> change.
Hi David,
Ah, okay - thanks for the explanation.
Those are points I had missed, but it makes sense.
We will drop the version bump to v1.4 for this release, and revisit in a later release when suitable.
Thanks,
Ciara
^ permalink raw reply [relevance 0%]
* Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
2024-03-12 15:26 3% ` Power, Ciara
@ 2024-03-12 16:13 3% ` David Marchand
2024-03-12 17:07 0% ` Power, Ciara
0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-03-12 16:13 UTC (permalink / raw)
To: Power, Ciara
Cc: Dooley, Brian, Aaron Conole, Michael Santana, dev, gakhil,
De Lara Guarch, Pablo, probb, wathsala.vithanage,
Thomas Monjalon, Bruce Richardson
On Tue, Mar 12, 2024 at 4:26 PM Power, Ciara <ciara.power@intel.com> wrote:
> > From: David Marchand <david.marchand@redhat.com>
> > On Tue, Mar 12, 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> > wrote:
> > >
> > > From: Ciara Power <ciara.power@intel.com>
> > >
> > > The IPsec-mb version that is available through current package
> > > managers is 1.2.
> > > This release moves the minimum required IPsec-mb version for IPsec-mb
> > > based SW PMDs to 1.4.
> > > To compile these PMDs, a manual step is added to install IPsec-mb v1.4
> > > using dpkg.
> > >
> > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > > ---
> > > .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> > > 1 file changed, 22 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
> > > index 776fbf6f30..ed44b1f730 100644
> > > --- a/.github/workflows/build.yml
> > > +++ b/.github/workflows/build.yml
> > > @@ -106,9 +106,15 @@ jobs:
> > > run: sudo apt update || true
> > > - name: Install packages
> > > run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > > - libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > > + libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> > > libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-pip
> > > python3-pyelftools python3-setuptools python3-wheel
> > > zlib1g-dev
> > > + - name: Install ipsec-mb library
> > > + run: |
> > > + wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > mb-dev_1.4-3_amd64.deb"
> > > + wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > mb1_1.4-3_amd64.deb"
> > > + sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > > + sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
> >
> > I am not enthusiastic at advertising a kind of out-of-tree approach.
> > That's a bit like if NVIDIA asked us to stop testing distribution rdma-core
> > packages and instead rely on MOFED.
> >
> > Why are we removing support for versions that are packaged by the main
> > distributions?
>
> With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the package manager.
> We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from this release onwards,
> removing the many ifdef codepaths in the PMDs for older versions. (patch included in this patchset)
>
> Some of the other CI environments were updated to install v1.4 already to support this change,
> but we found the github CI robot was limited for ipsec-mb versions when using the package manager.
> It had some failures comparing ABI with v1.2 installed (SW PMDs compiled in reference build, but not compiled after patch).
Such a change means that users of the Ubuntu/Fedora dpdk package lose
access to those drivers hypothetically.
"Hypothetically", because in reality, Ubuntu and others distributions
won't update to non LTS versions.
On the other hand, if a user was building DPDK (and not the one
provided by the distribution), now the user has to stop using the
ipsec mb provided by the distribution: building/packaging/maintaining
the ipsec mb library is now forced on the user plate.
I am unclear if this qualifies as an ABI breakage, but I am not
comfortable with this change.
--
David Marchand
^ permalink raw reply [relevance 3%]
* RE: [PATCH v6 1/5] ci: replace IPsec-mb package install
@ 2024-03-12 15:26 3% ` Power, Ciara
2024-03-12 16:13 3% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Power, Ciara @ 2024-03-12 15:26 UTC (permalink / raw)
To: Marchand, David, Dooley, Brian
Cc: Aaron Conole, Michael Santana, dev, gakhil, De Lara Guarch,
Pablo, probb, wathsala.vithanage
Hi David,
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, March 12, 2024 1:54 PM
> To: Dooley, Brian <brian.dooley@intel.com>
> Cc: Aaron Conole <aconole@redhat.com>; Michael Santana
> <maicolgabriel@hotmail.com>; dev@dpdk.org; gakhil@marvell.com; De Lara
> Guarch, Pablo <pablo.de.lara.guarch@intel.com>; probb@iol.unh.edu;
> wathsala.vithanage@arm.com; Power, Ciara <ciara.power@intel.com>
> Subject: Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
>
> Hello,
>
> On Tue, Mar 12, 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> wrote:
> >
> > From: Ciara Power <ciara.power@intel.com>
> >
> > The IPsec-mb version that is available through current package
> > managers is 1.2.
> > This release moves the minimum required IPsec-mb version for IPsec-mb
> > based SW PMDs to 1.4.
> > To compile these PMDs, a manual step is added to install IPsec-mb v1.4
> > using dpkg.
> >
> > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > ---
> > .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> > 1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
> > index 776fbf6f30..ed44b1f730 100644
> > --- a/.github/workflows/build.yml
> > +++ b/.github/workflows/build.yml
> > @@ -106,9 +106,15 @@ jobs:
> > run: sudo apt update || true
> > - name: Install packages
> > run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > - libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > + libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> > libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-pip
> > python3-pyelftools python3-setuptools python3-wheel
> > zlib1g-dev
> > + - name: Install ipsec-mb library
> > + run: |
> > + wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> mb-dev_1.4-3_amd64.deb"
> > + wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> mb1_1.4-3_amd64.deb"
> > + sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > + sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
>
> I am not enthusiastic about advertising a kind of out-of-tree approach.
> That's a bit like if NVIDIA asked us to stop testing distribution rdma-core
> packages and instead rely on MOFED.
>
> Why are we removing support for versions that are packaged by the main
> distributions?
With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the package manager.
We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from this release onwards,
removing the many ifdef codepaths in the PMDs for older versions. (patch included in this patchset)
Some of the other CI environments were updated to install v1.4 already to support this change,
but we found the github CI robot was limited for ipsec-mb versions when using the package manager.
It had some failures comparing ABI with v1.2 installed (SW PMDs compiled in reference build, but not compiled after patch).
To support the new minimum SW PMD ipsec-mb version for this CI, we thought installing v1.4 like this would suffice.
Thanks,
Ciara
^ permalink raw reply [relevance 3%]
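The version gating discussed in this thread (requiring IPsec-mb >= 1.4 when the distribution ships 1.2) can be sketched as a pre-build check. This is an illustrative sketch, not part of the patchset: the `MIN_VER` value, the header path, and the use of `sort -V` for component-wise version comparison are assumptions.

```shell
#!/bin/sh
# Sketch: fail early when the installed intel-ipsec-mb is older than the
# minimum required by the SW crypto PMDs (1.4 as of this patchset).
MIN_VER=1.4

# version_ge A B -> true when A >= B, compared component-wise with sort -V
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# IMB_VERSION_STR lives in intel-ipsec-mb.h; a distro-packaged 1.2 would
# be rejected here (header path is an assumption, adjust as needed).
installed=$(sed -n 's/^#define IMB_VERSION_STR *"\([0-9.]*\).*/\1/p' \
    /usr/include/intel-ipsec-mb.h 2>/dev/null)

if [ -n "$installed" ] && version_ge "$installed" "$MIN_VER"; then
    echo "ipsec-mb $installed OK (>= $MIN_VER)"
else
    echo "ipsec-mb ${installed:-not found} too old; SW PMDs will be skipped"
fi
```

This mirrors what meson effectively does at configure time when it decides whether to build the ipsec-mb based PMDs.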
* [PATCH 1/3] ethdev: support setting lanes
@ 2024-03-12 7:52 5% ` Dengdui Huang
0 siblings, 0 replies; 200+ results
From: Dengdui Huang @ 2024-03-12 7:52 UTC (permalink / raw)
To: dev
Cc: ferruh.yigit, aman.deep.singh, yuying.zhang, thomas,
andrew.rybchenko, liuyonglong, fengchengwen, haijie1, lihuisong
Some speeds can be achieved with different number of lanes. For example,
100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps.
When different lanes are used, the port cannot be brought up. This patch
adds support for setting and reporting lanes.
Signed-off-by: Dengdui Huang <huangdengdui@huawei.com>
---
doc/guides/rel_notes/release_24_03.rst | 8 +-
drivers/net/bnxt/bnxt_ethdev.c | 3 +-
drivers/net/hns3/hns3_ethdev.c | 1 +
lib/ethdev/ethdev_driver.h | 1 -
lib/ethdev/ethdev_linux_ethtool.c | 101 ++++++++-
lib/ethdev/ethdev_private.h | 4 +
lib/ethdev/ethdev_trace.h | 4 +-
lib/ethdev/meson.build | 2 +
lib/ethdev/rte_ethdev.c | 272 +++++++++++++++++++++++--
lib/ethdev/rte_ethdev.h | 99 +++++++--
lib/ethdev/version.map | 7 +
11 files changed, 466 insertions(+), 36 deletions(-)
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 932688ca4d..75d93ee965 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -76,6 +76,10 @@ New Features
* Added a fast path function ``rte_eth_tx_queue_count``
to get the number of used descriptors of a Tx queue.
+* **Support setting lanes for ethdev.**
+ * Support setting lanes by extended ``RTE_ETH_LINK_SPEED_*``.
+ * Added function to convert bitmap flag to the struct of link speed info.
+
* **Added hash calculation of an encapsulated packet as done by the HW.**
Added function to calculate hash when doing tunnel encapsulation:
@@ -240,9 +244,11 @@ ABI Changes
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
=======================================================
-
* No ABI change that would break compatibility with 23.11.
+* ethdev: Convert a numerical speed to a bitmap flag with lanes:
+ The function ``rte_eth_speed_bitflag`` now takes a lanes parameter.
+
Known Issues
------------
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ba31ae9286..e881a7f3cc 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -711,7 +711,8 @@ static int bnxt_update_phy_setting(struct bnxt *bp)
}
/* convert to speedbit flag */
- curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed, 1);
+ curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed,
+ RTE_ETH_LANES_UNKNOWN, 1);
/*
* Device is not obliged link down in certain scenarios, even
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index b10d1216d2..ecd3b2ef64 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -5969,6 +5969,7 @@ hns3_get_speed_fec_capa(struct rte_eth_fec_capa *speed_fec_capa,
for (i = 0; i < RTE_DIM(speed_fec_capa_tbl); i++) {
speed_bit =
rte_eth_speed_bitflag(speed_fec_capa_tbl[i].speed,
+ RTE_ETH_LANES_UNKNOWN,
RTE_ETH_LINK_FULL_DUPLEX);
if ((speed_capa & speed_bit) == 0)
continue;
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0dbf2dd6a2..bb7dc7acb7 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -2003,7 +2003,6 @@ __rte_internal
int
rte_eth_ip_reassembly_dynfield_register(int *field_offset, int *flag);
-
/*
* Legacy ethdev API used internally by drivers.
*/
diff --git a/lib/ethdev/ethdev_linux_ethtool.c b/lib/ethdev/ethdev_linux_ethtool.c
index e792204b01..b776ec6173 100644
--- a/lib/ethdev/ethdev_linux_ethtool.c
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -111,10 +111,107 @@ static const uint32_t link_modes[] = {
[101] = 11, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
};
+/*
+ * Link modes sorted with index as defined in ethtool.
+ * Values are lanes.
+ */
+static const uint32_t link_modes_lanes[] = {
+ [0] = 1, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+ [1] = 1, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+ [2] = 1, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+ [3] = 1, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+ [4] = 1, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+ [5] = 1, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+ [12] = 1, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
+ [15] = 1, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+ [17] = 1, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+ [18] = 4, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
+ [19] = 1, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
+ [20] = 1, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
+ [21] = 2, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
+ [22] = 2, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
+ [23] = 4, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
+ [24] = 4, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
+ [25] = 4, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
+ [26] = 4, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
+ [27] = 4, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+ [28] = 4, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+ [29] = 4, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+ [30] = 4, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+ [31] = 1, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+ [32] = 1, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+ [33] = 1, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+ [34] = 2, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
+ [35] = 2, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
+ [36] = 4, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
+ [37] = 4, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
+ [38] = 4, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
+ [39] = 4, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
+ [40] = 2, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
+ [41] = 1, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+ [42] = 1, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
+ [43] = 1, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
+ [44] = 1, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
+ [45] = 1, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
+ [46] = 1, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
+ [47] = 1, /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
+ [48] = 1, /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
+ [52] = 1, /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
+ [53] = 1, /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
+ [54] = 1, /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
+ [55] = 1, /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
+ [56] = 1, /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
+ [57] = 2, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
+ [58] = 2, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
+ [59] = 2, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
+ [60] = 2, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
+ [61] = 2, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
+ [62] = 4, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
+ [63] = 4, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
+ [64] = 4, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
+ [65] = 4, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
+ [66] = 4, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
+ [67] = 1, /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
+ [68] = 1, /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
+ [69] = 8, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
+ [70] = 8, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
+ [71] = 8, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
+ [72] = 8, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
+ [73] = 8, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
+ [75] = 1, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
+ [76] = 1, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
+ [77] = 1, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
+ [78] = 1, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
+ [79] = 1, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
+ [80] = 2, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
+ [81] = 2, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
+ [82] = 2, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
+ [83] = 2, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
+ [84] = 2, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
+ [85] = 4, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
+ [86] = 4, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
+ [87] = 4, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
+ [88] = 4, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
+ [89] = 4, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
+ [90] = 1, /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
+ [91] = 1, /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
+ [92] = 1, /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
+ [93] = 8, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
+ [94] = 8, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
+ [95] = 8, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
+ [96] = 8, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
+ [97] = 8, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
+ [98] = 8, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
+ [99] = 1, /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
+ [100] = 1, /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
+ [101] = 1, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
+};
+
uint32_t
rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
{
uint32_t speed;
+ uint8_t lanes;
int duplex;
/* get mode from array */
@@ -131,7 +228,9 @@ rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
RTE_ETH_LINK_FULL_DUPLEX;
speed &= RTE_GENMASK32(31, 1);
- return rte_eth_speed_bitflag(speed, duplex);
+ lanes = link_modes_lanes[bit];
+
+ return rte_eth_speed_bitflag(speed, lanes, duplex);
}
uint32_t
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index 0d36b9c30f..9092ab3a9e 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -79,4 +79,8 @@ void eth_dev_txq_release(struct rte_eth_dev *dev, uint16_t qid);
int eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
int eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
+/* versioned functions */
+uint32_t rte_eth_speed_bitflag_v24(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex);
+
#endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..5547b49cab 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -183,8 +183,10 @@ RTE_TRACE_POINT(
RTE_TRACE_POINT(
rte_eth_trace_speed_bitflag,
- RTE_TRACE_POINT_ARGS(uint32_t speed, int duplex, uint32_t ret),
+ RTE_TRACE_POINT_ARGS(uint32_t speed, uint8_t lanes, int duplex,
+ uint32_t ret),
rte_trace_point_emit_u32(speed);
+ rte_trace_point_emit_u8(lanes);
rte_trace_point_emit_int(duplex);
rte_trace_point_emit_u32(ret);
)
diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index f1d2586591..2c9588d0b3 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -62,3 +62,5 @@ endif
if get_option('buildtype').contains('debug')
cflags += ['-DRTE_FLOW_DEBUG']
endif
+
+use_function_versioning = true
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..522f8796b1 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -26,6 +26,7 @@
#include <rte_class.h>
#include <rte_ether.h>
#include <rte_telemetry.h>
+#include <rte_function_versioning.h>
#include "rte_ethdev.h"
#include "rte_ethdev_trace_fp.h"
@@ -991,63 +992,111 @@ rte_eth_dev_tx_queue_stop(uint16_t port_id, uint16_t tx_queue_id)
return ret;
}
-uint32_t
-rte_eth_speed_bitflag(uint32_t speed, int duplex)
+uint32_t __vsym
+rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex)
{
- uint32_t ret;
+ uint32_t ret = 0;
switch (speed) {
case RTE_ETH_SPEED_NUM_10M:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
break;
case RTE_ETH_SPEED_NUM_100M:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
break;
case RTE_ETH_SPEED_NUM_1G:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = RTE_ETH_LINK_SPEED_1G;
break;
case RTE_ETH_SPEED_NUM_2_5G:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = RTE_ETH_LINK_SPEED_2_5G;
break;
case RTE_ETH_SPEED_NUM_5G:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = RTE_ETH_LINK_SPEED_5G;
break;
case RTE_ETH_SPEED_NUM_10G:
- ret = RTE_ETH_LINK_SPEED_10G;
+ if (lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_10G;
+ if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_10G_4LANES;
break;
case RTE_ETH_SPEED_NUM_20G:
- ret = RTE_ETH_LINK_SPEED_20G;
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_2)
+ break;
+ ret = RTE_ETH_LINK_SPEED_20G_2LANES;
break;
case RTE_ETH_SPEED_NUM_25G:
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+ break;
ret = RTE_ETH_LINK_SPEED_25G;
break;
case RTE_ETH_SPEED_NUM_40G:
- ret = RTE_ETH_LINK_SPEED_40G;
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_4)
+ break;
+ ret = RTE_ETH_LINK_SPEED_40G_4LANES;
break;
case RTE_ETH_SPEED_NUM_50G:
- ret = RTE_ETH_LINK_SPEED_50G;
+ if (lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_50G;
+ if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_50G_2LANES;
break;
case RTE_ETH_SPEED_NUM_56G:
- ret = RTE_ETH_LINK_SPEED_56G;
+ if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_4)
+ break;
+ ret = RTE_ETH_LINK_SPEED_56G_4LANES;
break;
case RTE_ETH_SPEED_NUM_100G:
- ret = RTE_ETH_LINK_SPEED_100G;
+ if (lanes == RTE_ETH_LANES_1)
+ ret = RTE_ETH_LINK_SPEED_100G;
+ if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_100G_2LANES;
+ if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_100G_4LANES;
break;
case RTE_ETH_SPEED_NUM_200G:
- ret = RTE_ETH_LINK_SPEED_200G;
+ if (lanes == RTE_ETH_LANES_2)
+ ret = RTE_ETH_LINK_SPEED_200G_2LANES;
+ if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_200G_4LANES;
break;
case RTE_ETH_SPEED_NUM_400G:
- ret = RTE_ETH_LINK_SPEED_400G;
+ if (lanes == RTE_ETH_LANES_4)
+ ret = RTE_ETH_LINK_SPEED_400G_4LANES;
+ if (lanes == RTE_ETH_LANES_8)
+ ret = RTE_ETH_LINK_SPEED_400G_8LANES;
break;
default:
ret = 0;
}
- rte_eth_trace_speed_bitflag(speed, duplex, ret);
+ rte_eth_trace_speed_bitflag(speed, lanes, duplex, ret);
return ret;
}
+uint32_t __vsym
+rte_eth_speed_bitflag_v24(uint32_t speed, int duplex)
+{
+ return rte_eth_speed_bitflag_v25(speed, RTE_ETH_LANES_UNKNOWN, duplex);
+}
+
+/* mark the v24 function as the older version, and v25 as the default version */
+VERSION_SYMBOL(rte_eth_speed_bitflag, _v24, 24);
+BIND_DEFAULT_SYMBOL(rte_eth_speed_bitflag, _v25, 25);
+MAP_STATIC_SYMBOL(uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex),
+ rte_eth_speed_bitflag_v25);
+
const char *
rte_eth_dev_rx_offload_name(uint64_t offload)
{
@@ -1066,6 +1115,204 @@ rte_eth_dev_rx_offload_name(uint64_t offload)
return name;
}
+int
+rte_eth_speed_capa_to_info(uint32_t link_speed,
+ struct rte_eth_speed_capa_info *capa_info)
+{
+ const struct {
+ uint32_t speed_capa;
+ struct rte_eth_speed_capa_info capa_info;
+ } speed_capa_info_map[] = {
+ {
+ RTE_ETH_LINK_SPEED_10M_HD,
+ {
+ RTE_ETH_SPEED_NUM_10M,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_HALF_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_10M,
+ {
+ RTE_ETH_SPEED_NUM_10M,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_100M_HD,
+ {
+ RTE_ETH_SPEED_NUM_100M,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_HALF_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_100M,
+ {
+ RTE_ETH_SPEED_NUM_100M,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_1G,
+ {
+ RTE_ETH_SPEED_NUM_1G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_2_5G,
+ {
+ RTE_ETH_SPEED_NUM_2_5G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_5G,
+ {
+ RTE_ETH_SPEED_NUM_5G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_10G,
+ {
+ RTE_ETH_SPEED_NUM_10G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_20G_2LANES,
+ {
+ RTE_ETH_SPEED_NUM_20G,
+ RTE_ETH_LANES_2,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_25G,
+ {
+ RTE_ETH_SPEED_NUM_25G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_40G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_40G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_50G,
+ {
+ RTE_ETH_SPEED_NUM_50G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_56G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_56G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_100G,
+ {
+ RTE_ETH_SPEED_NUM_100G,
+ RTE_ETH_LANES_1,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_200G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_200G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_400G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_400G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_10G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_10G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_50G_2LANES,
+ {
+ RTE_ETH_SPEED_NUM_50G,
+ RTE_ETH_LANES_2,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_100G_2LANES,
+ {
+ RTE_ETH_SPEED_NUM_100G,
+ RTE_ETH_LANES_2,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_100G_4LANES,
+ {
+ RTE_ETH_SPEED_NUM_100G,
+ RTE_ETH_LANES_4,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_200G_2LANES,
+ {
+ RTE_ETH_SPEED_NUM_200G,
+ RTE_ETH_LANES_2,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ },
+ {
+ RTE_ETH_LINK_SPEED_400G_8LANES,
+ {
+ RTE_ETH_SPEED_NUM_400G,
+ RTE_ETH_LANES_8,
+ RTE_ETH_LINK_FULL_DUPLEX
+ }
+ }
+ };
+ uint32_t i;
+
+ for (i = 0; i < RTE_DIM(speed_capa_info_map); i++)
+ if (link_speed == speed_capa_info_map[i].speed_capa) {
+ capa_info->speed = speed_capa_info_map[i].capa_info.speed;
+ capa_info->lanes = speed_capa_info_map[i].capa_info.lanes;
+ capa_info->duplex = speed_capa_info_map[i].capa_info.duplex;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
const char *
rte_eth_dev_tx_offload_name(uint64_t offload)
{
@@ -3111,8 +3358,9 @@ rte_eth_link_to_str(char *str, size_t len, const struct rte_eth_link *eth_link)
if (eth_link->link_status == RTE_ETH_LINK_DOWN)
ret = snprintf(str, len, "Link down");
else
- ret = snprintf(str, len, "Link up at %s %s %s",
+ ret = snprintf(str, len, "Link up at %s %u lanes %s %s",
rte_eth_link_speed_to_str(eth_link->link_speed),
+ eth_link->link_lanes,
(eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
"FDX" : "HDX",
(eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 147257d6a2..3383ad8495 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -288,24 +288,40 @@ struct rte_eth_stats {
/**@{@name Link speed capabilities
* Device supported speeds bitmap flags
*/
-#define RTE_ETH_LINK_SPEED_AUTONEG 0 /**< Autonegotiate (all speeds) */
-#define RTE_ETH_LINK_SPEED_FIXED RTE_BIT32(0) /**< Disable autoneg (fixed speed) */
-#define RTE_ETH_LINK_SPEED_10M_HD RTE_BIT32(1) /**< 10 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_10M RTE_BIT32(2) /**< 10 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3) /**< 100 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_100M RTE_BIT32(4) /**< 100 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_1G RTE_BIT32(5) /**< 1 Gbps */
-#define RTE_ETH_LINK_SPEED_2_5G RTE_BIT32(6) /**< 2.5 Gbps */
-#define RTE_ETH_LINK_SPEED_5G RTE_BIT32(7) /**< 5 Gbps */
-#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */
-#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9) /**< 20 Gbps */
-#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
-#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
-#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
-#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
-#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 Gbps */
-#define RTE_ETH_LINK_SPEED_200G RTE_BIT32(15) /**< 200 Gbps */
-#define RTE_ETH_LINK_SPEED_400G RTE_BIT32(16) /**< 400 Gbps */
+#define RTE_ETH_LINK_SPEED_AUTONEG 0 /**< Autonegotiate (all speeds) */
+#define RTE_ETH_LINK_SPEED_FIXED RTE_BIT32(0) /**< Disable autoneg (fixed speed) */
+#define RTE_ETH_LINK_SPEED_10M_HD RTE_BIT32(1) /**< 10 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_10M RTE_BIT32(2) /**< 10 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3) /**< 100 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_100M RTE_BIT32(4) /**< 100 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_1G RTE_BIT32(5) /**< 1 Gbps */
+#define RTE_ETH_LINK_SPEED_2_5G RTE_BIT32(6) /**< 2.5 Gbps */
+#define RTE_ETH_LINK_SPEED_5G RTE_BIT32(7) /**< 5 Gbps */
+#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8) /**< 10 Gbps */
+#define RTE_ETH_LINK_SPEED_20G_2LANES RTE_BIT32(9) /**< 20 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
+#define RTE_ETH_LINK_SPEED_40G_4LANES RTE_BIT32(11) /**< 40 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
+#define RTE_ETH_LINK_SPEED_56G_4LANES RTE_BIT32(13) /**< 56 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_100G RTE_BIT32(14) /**< 100 Gbps */
+#define RTE_ETH_LINK_SPEED_200G_4LANES RTE_BIT32(15) /**< 200 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_400G_4LANES RTE_BIT32(16) /**< 400 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17) /**< 10 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_2LANES RTE_BIT32(19) /**< 100 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_4LANES RTE_BIT32(20) /**< 100 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_200G_2LANES RTE_BIT32(21) /**< 200 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_400G_8LANES RTE_BIT32(22) /**< 400 Gbps 8 lanes */
+/**@}*/
+
+/**@{@name Link speed capabilities
+ * Default lanes, used for compatibility with earlier versions
+ */
+#define RTE_ETH_LINK_SPEED_20G RTE_ETH_LINK_SPEED_20G_2LANES
+#define RTE_ETH_LINK_SPEED_40G RTE_ETH_LINK_SPEED_40G_4LANES
+#define RTE_ETH_LINK_SPEED_56G RTE_ETH_LINK_SPEED_56G_4LANES
+#define RTE_ETH_LINK_SPEED_200G RTE_ETH_LINK_SPEED_200G_4LANES
+#define RTE_ETH_LINK_SPEED_400G RTE_ETH_LINK_SPEED_400G_4LANES
/**@}*/
/**@{@name Link speed
@@ -329,6 +345,25 @@ struct rte_eth_stats {
#define RTE_ETH_SPEED_NUM_UNKNOWN UINT32_MAX /**< Unknown */
/**@}*/
+/**@{@name Link lane number
+ * Ethernet lane number
+ */
+#define RTE_ETH_LANES_UNKNOWN 0 /**< Unknown */
+#define RTE_ETH_LANES_1 1 /**< 1 lane */
+#define RTE_ETH_LANES_2 2 /**< 2 lanes */
+#define RTE_ETH_LANES_4 4 /**< 4 lanes */
+#define RTE_ETH_LANES_8 8 /**< 8 lanes */
+/**@}*/
+
+/**
+ * A structure used to store link speed capability information.
+ */
+struct rte_eth_speed_capa_info {
+ uint32_t speed; /**< RTE_ETH_SPEED_NUM_ */
+ uint8_t lanes; /**< RTE_ETH_LANES_ */
+ uint8_t duplex; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
+};
+
/**
* A structure used to retrieve link-level information of an Ethernet port.
*/
@@ -338,6 +373,7 @@ struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
uint16_t link_duplex : 1; /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
uint16_t link_autoneg : 1; /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
uint16_t link_status : 1; /**< RTE_ETH_LINK_[DOWN/UP] */
+ uint16_t link_lanes : 8; /**< RTE_ETH_LANES_ */
};
/**@{@name Link negotiation
@@ -1641,6 +1677,13 @@ struct rte_eth_conf {
#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3)
/** Device supports keeping shared flow objects across restart. */
#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**
+ * Device supports setting lanes. When the driver does not support setting
+ * lanes, the lane count in the speed capability reported by the driver may be
+ * ambiguous; for example, if the driver reports the 200G speed capability
+ * (@see RTE_ETH_LINK_SPEED_200G), the number of lanes used may be 2 or 4.
+ */
+#define RTE_ETH_DEV_CAPA_SETTING_LANES RTE_BIT64(5)
/**@}*/
/*
@@ -2301,12 +2344,30 @@ uint16_t rte_eth_dev_count_total(void);
*
* @param speed
* Numerical speed value in Mbps
+ * @param lanes
+ * RTE_ETH_LANES_x
* @param duplex
* RTE_ETH_LINK_[HALF/FULL]_DUPLEX (only for 10/100M speeds)
* @return
* 0 if the speed cannot be mapped
*/
-uint32_t rte_eth_speed_bitflag(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Convert a bitmap flag to a structure of link speed capability info
+ *
+ * @param link_speed
+ * speed bitmap (RTE_ETH_LINK_SPEED_)
+ * @return
+ * - (0) if successful.
+ * - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_speed_capa_to_info(uint32_t link_speed,
+ struct rte_eth_speed_capa_info *capa_info);
/**
* Get RTE_ETH_RX_OFFLOAD_* flag name.
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..0e9c560920 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -169,6 +169,12 @@ DPDK_24 {
local: *;
};
+DPDK_25 {
+ global:
+
+ rte_eth_speed_bitflag;
+} DPDK_24;
+
EXPERIMENTAL {
global:
@@ -320,6 +326,7 @@ EXPERIMENTAL {
# added in 24.03
__rte_eth_trace_tx_queue_count;
rte_eth_find_rss_algo;
+ rte_eth_speed_capa_to_info;
rte_flow_async_update_resized;
rte_flow_calc_encap_hash;
rte_flow_template_table_resizable;
--
2.33.0
^ permalink raw reply [relevance 5%]
* Community CI Meeting Minutes - March 7, 2024
@ 2024-03-11 22:36 2% Patrick Robb
0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-03-11 22:36 UTC (permalink / raw)
To: ci; +Cc: dev, dts
Sorry, I forgot to send these last week.
March 7, 2024
#####################################################################
Attendees
1. Patrick Robb
2. Ali Alnubani
3. Paul Szczepanek
4. David Marchand
5. Aaron Conole
#####################################################################
Minutes
=====================================================================
General Announcements
* IPSEC-MB requirement increase:
* Aaron has some questions about whether this new requirement has
been properly documented - having a conversation with Ciara to that
end on the mailing list currently
* Arm did publish an updated tag for this repo - Ciara has some
ideas for what may be going wrong and started a conversation on the
mailing list
* Patrick Robbwill forward this conversation to Paul
* Building under OpenSSL is still supported
* Server Refresh:
* See the mailing list for the most recent ideas, but we will be
putting various options in front of GB in the March meeting
* Idea is to try to support as many arches as possible (Intel x86,
AMD x86, Arm Grace-Grace)
=====================================================================
CI Status
---------------------------------------------------------------------
UNH-IOL Community Lab
* Hardware Refresh:
* NVIDIA CX7:
* Without writing out the whole background for the cx7 testing
on our NVIDIA DUT, we are being bandwidth capped by the server with
this performance testing, but this can be worked around by acquiring a
2nd CX7 NIC for the DUT server.
* For one thing, this corresponds to the testing NVIDIA
publishes: https://fast.dpdk.org/doc/perf/DPDK_23_07_NVIDIA_NIC_performance_report.pdf
* Patrick has asked whether NVIDIA can donate this NIC. We
can also go to the DPDK project asking for it, but they have already
provided two cx7 to the Community Lab, so it is not ideal.
* Over email we have noted that the Broadwell CPU is old and
may not be adequate for higher bandwidth testing
* QAT 8970 on Amper Server: Has been dry run and is working
* Requires a few change in DTS which Patrick can submit once
David/Dharmik give approval (basically relates to loading vfio with
custom options for certain QAT devices only)
* If there are no objections, UNH folks can write up the
automation scripts today and get the testing online today or next
week.
* Test Coverage changes:
* OpenSSL driver test has been added to our unit testing jobs
* Marvell mvneta build test has been added, per:
https://doc.dpdk.org/guides/nics/mvneta.html
* Debian 12 has been added to the CI template engine, and we’re
running testing from this now
* Need to upstream this.
* Robin Jarry noted on Slack UNH has been sending out results to
test-reports mailing list without setting in-reply-to message-id for
the patchseries. Adam has resolved this.
* Ferruh also notes that in looking at this he noticed duplicate
emails being sent by UNH, which we still need to resolve
* Cody at UNH has been making updates to testing on Windows:
* Did modify the 2022 build test this week, moving it from the MSVC
preview compiler to the MSVC standard compiler (which with v17.9.2 has
now caught up to the build features previously only available in the
preview version)
* Cody is also adding the Clang and Mingw64 compile jobs to the
2022 server (they are only on server 2019 right now) and also is
adding DPDK unit test/fast tests to the 2022 server.
* David Marchand noticed a bug with the create artifact script: After
failing to apply on the recommended branch and trying to fall back on
applying to main, it did not check out to the tip of main. Patrick will
look.
* A Bugzilla ticket was created noting that we need to add
23.11-staging to our CI
---------------------------------------------------------------------
Intel Lab
* None
---------------------------------------------------------------------
Github Actions
* Need to double-check the ipsec-mb requirement and how we generate ABI
symbols. Need to check that they are pulling the right version.
* In progress migrating back to the original server this ran on
before the server was physically moved to another location
* Going to completely re-image/update the server
* Posted a series for adding Cirrus-CI to the robot monitoring
* Comments are welcome on the mailing list
* Need to add a Cirrus YAML config for the DPDK repo
---------------------------------------------------------------------
Loongarch Lab
* Zhoumin has stated on the mailing list that he can support the email
based retest framework
* Possible to store the commit hash when a series is submitted, and
recreate those artifacts as needed
* Also can support re-apply on tip of branch X
* There is an ongoing conversation on CI mailing list for this
=====================================================================
DTS Improvements & Test Development
24.07 Roadmap
* Ethdev testsuites:
* Nicholas:
* Jumboframes:
https://git.dpdk.org/tools/dts/tree/test_plans/jumboframes_test_plan.rst
* Mac Filter:
https://git.dpdk.org/tools/dts/tree/test_plans/mac_filter_test_plan.rst
* Prince:
* Dynamic Queue:
https://git.dpdk.org/tools/dts/tree/test_plans/dynamic_queue_test_plan.rst
* Need to vet the testsuites. It may be possible to add additional
testcases, refactor testcases. We want to flex the same capabilities
as the old testsuites, but make improvements where possible.
* We should loop in ethdev maintainers and ask for their review
on the testcases
* David sent an email a couple of years ago which priority-ranked
some ethdev capabilities and testsuites, and if we can find this email
we should use it.
* https://inbox.dpdk.org/ci/CAJFAV8y8-LSh5vniZXR812ckKNa2ELEJVRKRzT53PVu2zO902w@mail.gmail.com/
* Configuration schema updates:
* Nicholas:
* Working on the Hugepages allocation first, then will do the
other config updates (ripping out some unneeded keys from the schema)
* Will follow up with the ethdev testsuites
* API Docs generation:
* Juraj: Needs review from Thomas (the Doxygen integration part),
may need to be addressed when Juraj gets back from vacation.
* Skip test cases based on testbed capabilities:
* Juraj: RFC should be ready before Juraj leaves on vacation. 24.07
shouldn't be a problem.
* RFC Patch:
https://patches.dpdk.org/project/dpdk/patch/20240301155416.96960-1-juraj.linkes@pantheon.tech/
* The patch requires
https://patches.dpdk.org/project/dpdk/list/?series=31329
* Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1351
* Rename the execution section/stage:
* Juraj: Juraj will work on this in 24.07 and submit a patch to
continue the discussion. The v1 patch will be ready for 24.07, but the
discussion/review could push the patch to 24.11.
* Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1355
* Add support for externally compiled DPDK:
* Juraj: Juraj will start working on this in 24.07. There's a small
chance we'll get this in 24.07, but Juraj wants to target this for
24.11.
* Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1365
* Jeremy has a bugzilla ticket for refactoring how we handle scapy on
the TG (no more XMLRPC server), and will do this in 24.07
* We will finalize at next DTS meeting
=====================================================================
Any other business
* Next Meeting: March 21, 2024
^ permalink raw reply [relevance 2%]
* RE: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
2024-03-08 17:24 3% ` Ferruh Yigit
@ 2024-03-10 14:29 0% ` Brandes, Shai
2024-03-13 11:21 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Brandes, Shai @ 2024-03-10 14:29 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, March 8, 2024 7:24 PM
> To: Brandes, Shai <shaibran@amazon.com>
> Cc: dev@dpdk.org
> Subject: RE: [EXTERNAL] [PATCH v3 07/33] net/ena: restructure the llq policy
> setting process
>
> CAUTION: This email originated from outside of the organization. Do not click
> links or open attachments unless you can confirm the sender and know the
> content is safe.
>
>
>
> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> > From: Shai Brandes <shaibran@amazon.com>
> >
> > The driver will set the size of the LLQ header size according to the
> > recommendation from the device.
> > Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg
> > `llq_policy` that accepts the following values:
> > 0 - Disable LLQ.
> > Use with extreme caution as it leads to a huge performance
> > degradation on AWS instances from 6th generation onwards.
> > 1 - Accept device recommended LLQ policy (Default).
> > Device can recommend normal or large LLQ policy.
> > 2 - Enforce normal LLQ policy.
> > 3 - Enforce large LLQ policy.
> > Required for packets with header that exceed 96 bytes on
> > AWS instances prior to 5th generation.
> >
>
> We had similar discussion before, although dev_args is not part of the ABI, it
> is an user interface, and changes in the devargs will impact users directly.
>
> What would you think to either keep backward compatilibity in the devargs
> (like not remove old one but add new one), or do this change in
> 24.11 release?
[Brandes, Shai] understood.
The new devarg replaced the old ones and added an option to enforce normal-LLQ mode, which is critical for our release.
As you suggested, we will keep backward compatibility and add an additional devarg for enforcing normal-llq policy.
That way, we can easily replace it in future releases with a common devarg without the need to make major logic changes.
^ permalink raw reply [relevance 0%]
* [RFC v3] tap: do not duplicate fd's
2024-03-08 18:54 2% [RFC] eal: increase the number of availble file descriptors for MP Stephen Hemminger
2024-03-08 20:36 8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
@ 2024-03-09 18:12 2% ` Stephen Hemminger
2024-03-14 14:40 4% ` [RFC] eal: increase the number of availble file descriptors for MP David Marchand
2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-09 18:12 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
The TAP device can use the same file descriptor for both rx and tx queues.
This allows up to 8 queues (versus 4) and reduces some resource consumption.
Also, reduce the TAP_MAX_QUEUES to what the multi-process restrictions
will allow.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v3 - This is more limited patch, only addresses tap device
and only gets tap up from 4 to 8 queues.
Still better to fix underlying EAL issue, but that requires
overriding strict ABI rules.
drivers/net/tap/meson.build | 2 +-
drivers/net/tap/rte_eth_tap.c | 197 +++++++++++++++-------------------
drivers/net/tap/rte_eth_tap.h | 3 +-
drivers/net/tap/tap_flow.c | 3 +-
drivers/net/tap/tap_intr.c | 12 ++-
5 files changed, 95 insertions(+), 122 deletions(-)
diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 5099ccdff11b..9cd124d53e23 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -16,7 +16,7 @@ sources = files(
deps = ['bus_vdev', 'gso', 'hash']
-cflags += '-DTAP_MAX_QUEUES=16'
+cflags += '-DTAP_MAX_QUEUES=8'
# input array for meson symbol search:
# [ "MACRO to define if found", "header for the search",
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695bed..38a1b2d825f9 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
- int rxq_count;
- int txq_count;
+ int q_count;
/*
* The file descriptors are in the dedicated part
* of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t data_off = rte_pktmbuf_headroom(mbuf);
int len;
- len = readv(process_private->rxq_fds[rxq->queue_id],
+ len = readv(process_private->fds[rxq->queue_id],
*rxq->iovecs,
1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
}
/* copy the tx frame data */
- n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+ n = writev(process_private->fds[txq->queue_id], iovecs, k);
if (n <= 0)
return -1;
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
struct rte_mp_msg msg;
struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
int err;
- int fd_iterator = 0;
struct pmd_process_private *process_private = dev->process_private;
int i;
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
msg.len_param = sizeof(*request_param);
- for (i = 0; i < dev->data->nb_tx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->txq_fds[i];
- msg.num_fds++;
- request_param->txq_count++;
- }
- for (i = 0; i < dev->data->nb_rx_queues; i++) {
- msg.fds[fd_iterator++] = process_private->rxq_fds[i];
- msg.num_fds++;
- request_param->rxq_count++;
- }
+
+ /* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+ for (i = 0; i < dev->data->nb_rx_queues; i++)
+ msg.fds[i] = process_private->fds[i];
+
+ request_param->q_count = dev->data->nb_rx_queues;
+ msg.num_fds = dev->data->nb_rx_queues;
err = rte_mp_sendmsg(&msg);
if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
struct rte_eth_dev *dev;
const struct ipc_queues *request_param =
(const struct ipc_queues *)request->param;
- int fd_iterator;
- int queue;
struct pmd_process_private *process_private;
dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
request_param->port_name);
return -1;
}
+
process_private = dev->process_private;
- fd_iterator = 0;
- TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
- request_param->txq_count);
- for (queue = 0; queue < request_param->txq_count; queue++)
- process_private->txq_fds[queue] = request->fds[fd_iterator++];
- for (queue = 0; queue < request_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+ TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+ for (int q = 0; q < request_param->q_count; q++)
+ process_private->fds[q] = request->fds[q];
+
return 0;
}
@@ -1121,7 +1113,6 @@ tap_dev_close(struct rte_eth_dev *dev)
int i;
struct pmd_internals *internals = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
- struct rx_queue *rxq;
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
rte_free(dev->process_private);
@@ -1141,19 +1132,18 @@ tap_dev_close(struct rte_eth_dev *dev)
}
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- if (process_private->rxq_fds[i] != -1) {
- rxq = &internals->rxq[i];
- close(process_private->rxq_fds[i]);
- process_private->rxq_fds[i] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
- if (process_private->txq_fds[i] != -1) {
- close(process_private->txq_fds[i]);
- process_private->txq_fds[i] = -1;
- }
+ struct rx_queue *rxq = &internals->rxq[i];
+
+ if (process_private->fds[i] == -1)
+ continue;
+
+ close(process_private->fds[i]);
+ process_private->fds[i] = -1;
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
}
if (internals->remote_if_index) {
@@ -1198,6 +1188,15 @@ tap_dev_close(struct rte_eth_dev *dev)
return 0;
}
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+ if (process_private->fds[qid] != -1) {
+ close(process_private->fds[qid]);
+ process_private->fds[qid] = -1;
+ }
+}
+
static void
tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
{
@@ -1206,15 +1205,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!rxq)
return;
+
process_private = rte_eth_devices[rxq->in_port].process_private;
- if (process_private->rxq_fds[rxq->queue_id] != -1) {
- close(process_private->rxq_fds[rxq->queue_id]);
- process_private->rxq_fds[rxq->queue_id] = -1;
- tap_rxq_pool_free(rxq->pool);
- rte_free(rxq->iovecs);
- rxq->pool = NULL;
- rxq->iovecs = NULL;
- }
+
+ tap_rxq_pool_free(rxq->pool);
+ rte_free(rxq->iovecs);
+ rxq->pool = NULL;
+ rxq->iovecs = NULL;
+
+ if (dev->data->tx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static void
@@ -1225,12 +1225,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
if (!txq)
return;
- process_private = rte_eth_devices[txq->out_port].process_private;
- if (process_private->txq_fds[txq->queue_id] != -1) {
- close(process_private->txq_fds[txq->queue_id]);
- process_private->txq_fds[txq->queue_id] = -1;
- }
+ process_private = rte_eth_devices[txq->out_port].process_private;
+ if (dev->data->rx_queues[qid] == NULL)
+ tap_queue_close(process_private, qid);
}
static int
@@ -1482,52 +1480,34 @@ tap_setup_queue(struct rte_eth_dev *dev,
uint16_t qid,
int is_rx)
{
- int ret;
- int *fd;
- int *other_fd;
- const char *dir;
+ int fd, ret;
struct pmd_internals *pmd = dev->data->dev_private;
struct pmd_process_private *process_private = dev->process_private;
struct rx_queue *rx = &internals->rxq[qid];
struct tx_queue *tx = &internals->txq[qid];
- struct rte_gso_ctx *gso_ctx;
+ struct rte_gso_ctx *gso_ctx = NULL;
+ const char *dir = is_rx ? "rx" : "tx";
- if (is_rx) {
- fd = &process_private->rxq_fds[qid];
- other_fd = &process_private->txq_fds[qid];
- dir = "rx";
- gso_ctx = NULL;
- } else {
- fd = &process_private->txq_fds[qid];
- other_fd = &process_private->rxq_fds[qid];
- dir = "tx";
+ if (is_rx)
gso_ctx = &tx->gso_ctx;
- }
- if (*fd != -1) {
+
+ fd = process_private->fds[qid];
+ if (fd != -1) {
/* fd for this queue already exists */
TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
- pmd->name, *fd, dir, qid);
+ pmd->name, fd, dir, qid);
gso_ctx = NULL;
- } else if (*other_fd != -1) {
- /* Only other_fd exists. dup it */
- *fd = dup(*other_fd);
- if (*fd < 0) {
- *fd = -1;
- TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
- return -1;
- }
- TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
- pmd->name, *other_fd, dir, qid, *fd);
} else {
- /* Both RX and TX fds do not exist (equal -1). Create fd */
- *fd = tun_alloc(pmd, 0, 0);
- if (*fd < 0) {
- *fd = -1; /* restore original value */
+ fd = tun_alloc(pmd, 0, 0);
+ if (fd < 0) {
TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
return -1;
}
+
TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
- pmd->name, dir, qid, *fd);
+ pmd->name, dir, qid, fd);
+
+ process_private->fds[qid] = fd;
}
tx->mtu = &dev->data->mtu;
@@ -1540,7 +1520,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
tx->type = pmd->type;
- return *fd;
+ return fd;
}
static int
@@ -1620,7 +1600,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG, " RX TUNTAP device name %s, qid %d on fd %d",
internals->name, rx_queue_id,
- process_private->rxq_fds[rx_queue_id]);
+ process_private->fds[rx_queue_id]);
return 0;
@@ -1664,7 +1644,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
TAP_LOG(DEBUG,
" TX TUNTAP device name %s, qid %d on fd %d csum %s",
internals->name, tx_queue_id,
- process_private->txq_fds[tx_queue_id],
+ process_private->fds[tx_queue_id],
txq->csum ? "on" : "off");
return 0;
@@ -2001,10 +1981,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
dev->intr_handle = pmd->intr_handle;
/* Presetup the fds to -1 as being not valid */
- for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
- process_private->rxq_fds[i] = -1;
- process_private->txq_fds[i] = -1;
- }
+ for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+ process_private->fds[i] = -1;
+
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2311,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
struct ipc_queues *request_param = (struct ipc_queues *)request.param;
struct ipc_queues *reply_param;
struct pmd_process_private *process_private = dev->process_private;
- int queue, fd_iterator;
/* Prepare the request */
memset(&request, 0, sizeof(request));
@@ -2352,18 +2330,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
/* Attach the queues from received file descriptors */
- if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+ if (reply_param->q_count != reply->num_fds) {
TAP_LOG(ERR, "Unexpected number of fds received");
return -1;
}
- dev->data->nb_rx_queues = reply_param->rxq_count;
- dev->data->nb_tx_queues = reply_param->txq_count;
- fd_iterator = 0;
- for (queue = 0; queue < reply_param->rxq_count; queue++)
- process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
- for (queue = 0; queue < reply_param->txq_count; queue++)
- process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+ dev->data->nb_rx_queues = reply_param->q_count;
+ dev->data->nb_tx_queues = reply_param->q_count;
+
+ for (int q = 0; q < reply_param->q_count; q++)
+ process_private->fds[q] = reply->fds[q];
+
free(reply);
return 0;
}
@@ -2393,25 +2370,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
/* Fill file descriptors for all queues */
reply.num_fds = 0;
- reply_param->rxq_count = 0;
- if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
- RTE_MP_MAX_FD_NUM){
- TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+ reply_param->q_count = 0;
+
+ RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+ if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+ TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+ dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
return -1;
}
for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
- reply_param->rxq_count++;
- }
- RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
- reply_param->txq_count = 0;
- for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
- reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
- reply_param->txq_count++;
+ reply.fds[reply.num_fds++] = process_private->fds[queue];
+ reply_param->q_count++;
}
- RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
/* Send reply */
strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e961..dc8201020b5f 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
};
struct pmd_process_private {
- int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
- int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+ int fds[RTE_PMD_TAP_MAX_QUEUES];
};
/* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index fa50fe45d7b7..a78fd50cd494 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
* If netdevice is there, setup appropriate flow rules immediately.
* Otherwise it will be set when bringing up the netdevice (tun_alloc).
*/
- if (!process_private->rxq_fds[0])
+ if (process_private->fds[0] == -1)
return 0;
+
if (set) {
struct rte_flow *remote_flow;
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a32..bc953791635e 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,20 +68,22 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
}
for (i = 0; i < n; i++) {
struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+ int fd = process_private->fds[i];
/* Skip queues that cannot request interrupts. */
- if (!rxq || process_private->rxq_fds[i] == -1) {
+ if (!rxq || fd == -1) {
/* Use invalid intr_vec[] index to disable entry. */
if (rte_intr_vec_list_index_set(intr_handle, i,
- RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
+ RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
return -rte_errno;
continue;
}
+
if (rte_intr_vec_list_index_set(intr_handle, i,
- RTE_INTR_VEC_RXTX_OFFSET + count))
+ RTE_INTR_VEC_RXTX_OFFSET + count))
return -rte_errno;
- if (rte_intr_efds_index_set(intr_handle, count,
- process_private->rxq_fds[i]))
+
+ if (rte_intr_efds_index_set(intr_handle, count, fd))
return -rte_errno;
count++;
}
--
2.43.0
^ permalink raw reply [relevance 2%]
* [RFC v2] eal: increase passed max multi-process file descriptors
2024-03-08 18:54 2% [RFC] eal: increase the number of availble file descriptors for MP Stephen Hemminger
@ 2024-03-08 20:36 8% ` Stephen Hemminger
2024-03-14 15:23 0% ` Stephen Hemminger
2024-03-09 18:12 2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
2024-03-14 14:40 4% ` [RFC] eal: increase the number of availble file descriptors for MP David Marchand
2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-08 20:36 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Anatoly Burakov, Jianfeng Tan
Both XDP and TAP device are limited in the number of queues
because of limitations on the number of file descriptors that
are allowed. The original choice of 8 was too low; the allowed
maximum is 253 according to unix(7) man page.
This may look like a serious ABI breakage but it is not.
It is simpler for everyone if the limit is increased rather than
building a parallel set of calls.
The case that matters is older application registering MP support
with the newer version of EAL. In this case, since the old application
will always send the more compact structure (less possible fd's)
it is OK.
Request (for up to 8 fds) sent to EAL.
- EAL only references up to num_fds.
- The area past the old fd array is not accessed.
Reply callback:
- EAL will pass pointer to the new (larger structure),
the old callback will only look at the first part of
the fd array (num_fds <= 8).
- Since primary and secondary must both be from the same DPDK version
there is no normal way that a reply with more fd's could be possible.
The only case is the same as above, where application requested
something that would break in old version and now succeeds.
The one possible incompatibility is that if application passed
a larger number of fd's (32?) and expected an error. Now it will
succeed and get passed through.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 - show the simpler way to address with some minor ABI issue
doc/guides/rel_notes/release_24_03.rst | 4 ++++
lib/eal/include/rte_eal.h | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 932688ca4d82..1d33cfa15dfb 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -225,6 +225,10 @@ API Changes
* ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
``rte_flow_field_data`` for more generic usage.
+* eal: The maximum number of file descriptors allowed to be passed in
+ multi-process requests is increased from 8 to the maximum possible on
+ Linux unix domain sockets 253. This allows for more queues on XDP and
+ TAP device.
ABI Changes
-----------
diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
index c2256f832e51..cd84fcdd1bdb 100644
--- a/lib/eal/include/rte_eal.h
+++ b/lib/eal/include/rte_eal.h
@@ -155,7 +155,7 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
*/
bool rte_mp_disable(void);
-#define RTE_MP_MAX_FD_NUM 8 /* The max amount of fds */
+#define RTE_MP_MAX_FD_NUM 253 /* The max amount of fds */
#define RTE_MP_MAX_NAME_LEN 64 /* The max length of action name */
#define RTE_MP_MAX_PARAM_LEN 256 /* The max length of param */
struct rte_mp_msg {
--
2.43.0
^ permalink raw reply [relevance 8%]
* [RFC] eal: increase the number of availble file descriptors for MP
@ 2024-03-08 18:54 2% Stephen Hemminger
2024-03-08 20:36 8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Stephen Hemminger @ 2024-03-08 18:54 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Anatoly Burakov
The current limit of file descriptors is too low, it should have
been set to the maximum possible to send across a unix domain
socket.
This is an attempt to allow increasing it without breaking ABI.
But in the process it exposes what is broken about how symbol
versions are checked in check-symbol-maps.sh. That script is
broken in that it won't allow adding a backwards compatible
version hook like this.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/eal/common/eal_common_proc.c | 118 ++++++++++++++++++++++++++-----
lib/eal/common/meson.build | 2 +
lib/eal/include/rte_eal.h | 13 +++-
lib/eal/version.map | 9 +++
4 files changed, 125 insertions(+), 17 deletions(-)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index d24093937c1d..c08113a8d9e0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -27,6 +27,7 @@
#include <rte_lcore.h>
#include <rte_log.h>
#include <rte_thread.h>
+#include <rte_function_versioning.h>
#include "eal_memcfg.h"
#include "eal_private.h"
@@ -796,7 +797,7 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
}
static int
-check_input(const struct rte_mp_msg *msg)
+check_input(const struct rte_mp_msg *msg, int max_fd)
{
if (msg == NULL) {
EAL_LOG(ERR, "Msg cannot be NULL");
@@ -825,9 +826,8 @@ check_input(const struct rte_mp_msg *msg)
return -1;
}
- if (msg->num_fds > RTE_MP_MAX_FD_NUM) {
- EAL_LOG(ERR, "Cannot send more than %d FDs",
- RTE_MP_MAX_FD_NUM);
+ if (msg->num_fds > max_fd) {
+ EAL_LOG(ERR, "Cannot send more than %d FDs", max_fd);
rte_errno = E2BIG;
return -1;
}
@@ -835,13 +835,13 @@ check_input(const struct rte_mp_msg *msg)
return 0;
}
-int
-rte_mp_sendmsg(struct rte_mp_msg *msg)
+static int
+mp_sendmsg(struct rte_mp_msg *msg, int max_fd)
{
const struct internal_config *internal_conf =
eal_get_internal_configuration();
- if (check_input(msg) != 0)
+ if (check_input(msg, max_fd) != 0)
return -1;
if (internal_conf->no_shconf) {
@@ -854,6 +854,24 @@ rte_mp_sendmsg(struct rte_mp_msg *msg)
return mp_send(msg, NULL, MP_MSG);
}
+int rte_mp_sendmsg_V23(struct rte_mp_old_msg *msg);
+int rte_mp_sendmsg_V24(struct rte_mp_msg *msg);
+
+int
+rte_mp_sendmsg_V23(struct rte_mp_old_msg *omsg)
+{
+ return mp_sendmsg((struct rte_mp_msg *)omsg, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_sendmsg, _V23, 23);
+
+int
+rte_mp_sendmsg_V24(struct rte_mp_msg *msg)
+{
+ return mp_sendmsg(msg, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_sendmsg, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_sendmsg(struct rte_mp_msg *msg), rte_mp_sendmsg_V24);
+
static int
mp_request_async(const char *dst, struct rte_mp_msg *req,
struct async_request_param *param, const struct timespec *ts)
@@ -988,9 +1006,9 @@ mp_request_sync(const char *dst, struct rte_mp_msg *req,
return 0;
}
-int
-rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
- const struct timespec *ts)
+static int
+__rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+ const struct timespec *ts, int max_fd)
{
int dir_fd, ret = -1;
DIR *mp_dir;
@@ -1005,7 +1023,7 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
reply->nb_received = 0;
reply->msgs = NULL;
- if (check_input(req) != 0)
+ if (check_input(req, max_fd) != 0)
goto end;
if (internal_conf->no_shconf) {
@@ -1085,9 +1103,34 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
return ret;
}
+int rte_mp_request_sync_V23(struct rte_mp_old_msg *req, struct rte_mp_reply *reply,
+ const struct timespec *ts);
+int rte_mp_request_sync_V24(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+ const struct timespec *ts);
+
+
+int
+rte_mp_request_sync_V23(struct rte_mp_old_msg *req, struct rte_mp_reply *reply,
+ const struct timespec *ts)
+{
+ return __rte_mp_request_sync((struct rte_mp_msg *)req, reply, ts, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_request_sync, _V23, 23);
+
int
-rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
- rte_mp_async_reply_t clb)
+rte_mp_request_sync_V24(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+ const struct timespec *ts)
+{
+ return __rte_mp_request_sync(req, reply, ts, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_request_sync, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_request_sync(struct rte_mp_msg *req, \
+ struct rte_mp_reply *reply, \
+ const struct timespec *ts), rte_mp_request_sync_V24);
+
+static int
+__rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
+ rte_mp_async_reply_t clb, int max_fd)
{
struct rte_mp_msg *copy;
struct pending_request *dummy;
@@ -1104,7 +1147,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
EAL_LOG(DEBUG, "request: %s", req->name);
- if (check_input(req) != 0)
+ if (check_input(req, max_fd) != 0)
return -1;
if (internal_conf->no_shconf) {
@@ -1237,14 +1280,38 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
return -1;
}
+int rte_mp_request_async_V23(struct rte_mp_old_msg *req, const struct timespec *ts,
+ rte_mp_async_reply_t clb);
+int rte_mp_request_async_V24(struct rte_mp_msg *req, const struct timespec *ts,
+ rte_mp_async_reply_t clb);
+
int
-rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
+rte_mp_request_async_V23(struct rte_mp_old_msg *req, const struct timespec *ts,
+ rte_mp_async_reply_t clb)
+{
+ return __rte_mp_request_async((struct rte_mp_msg *)req, ts, clb, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_request_async, _V23, 23);
+
+int
+rte_mp_request_async_V24(struct rte_mp_msg *req, const struct timespec *ts,
+ rte_mp_async_reply_t clb)
+{
+ return __rte_mp_request_async(req, ts, clb, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_request_async, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_request_async(struct rte_mp_msg *req, \
+ const struct timespec *ts, \
+ rte_mp_async_reply_t clb), rte_mp_request_async_V24);
+
+static int
+mp_reply(struct rte_mp_msg *msg, const char *peer, int max_fd)
{
EAL_LOG(DEBUG, "reply: %s", msg->name);
const struct internal_config *internal_conf =
eal_get_internal_configuration();
- if (check_input(msg) != 0)
+ if (check_input(msg, max_fd) != 0)
return -1;
if (peer == NULL) {
@@ -1261,6 +1328,25 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
return mp_send(msg, peer, MP_REP);
}
+int rte_mp_reply_V23(struct rte_mp_old_msg *msg, const char *peer);
+int rte_mp_reply_V24(struct rte_mp_msg *msg, const char *peer);
+
+int
+rte_mp_reply_V23(struct rte_mp_old_msg *msg, const char *peer)
+{
+ return mp_reply((struct rte_mp_msg *)msg, peer, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_reply, _V23, 23);
+
+int
+rte_mp_reply_V24(struct rte_mp_msg *msg, const char *peer)
+{
+ return mp_reply(msg, peer, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_reply, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_reply(struct rte_mp_msg *msg, const char *peer), rte_mp_reply_V24);
+
+
/* Internally, the status of the mp feature is represented as a three-state:
* - "unknown" as long as no secondary process attached to a primary process
* and there was no call to rte_mp_disable yet,
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 22a626ba6fc7..3faf0c20e798 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -3,6 +3,8 @@
includes += include_directories('.')
+use_function_versioning = true
+
cflags += [ '-DABI_VERSION="@0@"'.format(abi_version) ]
sources += files(
diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
index c2256f832e51..0d0761c50409 100644
--- a/lib/eal/include/rte_eal.h
+++ b/lib/eal/include/rte_eal.h
@@ -155,9 +155,10 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
*/
bool rte_mp_disable(void);
-#define RTE_MP_MAX_FD_NUM 8 /* The max amount of fds */
+#define RTE_MP_MAX_FD_NUM 253 /* The max number of fds (SCM_MAX_FD) */
#define RTE_MP_MAX_NAME_LEN 64 /* The max length of action name */
#define RTE_MP_MAX_PARAM_LEN 256 /* The max length of param */
+
struct rte_mp_msg {
char name[RTE_MP_MAX_NAME_LEN];
int len_param;
@@ -166,6 +167,16 @@ struct rte_mp_msg {
int fds[RTE_MP_MAX_FD_NUM];
};
+/* Legacy API version */
+#define RTE_MP_MAX_OLD_FD_NUM 8 /* The legacy limit on fds */
+struct rte_mp_old_msg {
+ char name[RTE_MP_MAX_NAME_LEN];
+ int len_param;
+ int num_fds;
+ uint8_t param[RTE_MP_MAX_PARAM_LEN];
+ int fds[RTE_MP_MAX_OLD_FD_NUM];
+};
+
struct rte_mp_reply {
int nb_sent;
int nb_received;
diff --git a/lib/eal/version.map b/lib/eal/version.map
index c06ceaad5097..264ff2d0818b 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -344,6 +344,15 @@ DPDK_24 {
local: *;
};
+DPDK_23 {
+ global:
+
+ rte_mp_reply;
+ rte_mp_request_async;
+ rte_mp_request_sync;
+ rte_mp_sendmsg;
+} DPDK_24;
+
EXPERIMENTAL {
global:
--
2.43.0
^ permalink raw reply [relevance 2%]
* Re: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
@ 2024-03-08 17:24 3% ` Ferruh Yigit
2024-03-10 14:29 0% ` Brandes, Shai
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-03-08 17:24 UTC (permalink / raw)
To: shaibran; +Cc: dev
On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
>
> The driver will set the LLQ header size according to the
> recommendation from the device.
> Replaced `enable_llq` and `large_llq_hdr` devargs with
> a new devarg `llq_policy` that accepts the following values:
> 0 - Disable LLQ.
> Use with extreme caution as it leads to a huge performance
> degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
> Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
> Required for packets with headers that exceed 96 bytes on
> AWS instances prior to 5th generation.
>
We had similar discussion before, although dev_args is not part of the
ABI, it is a user interface, and changes in the devargs will impact
users directly.
What would you think of either keeping backward compatibility in the
devargs (like not removing the old one but adding a new one), or doing
this change in the 24.11 release?
* Re: [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port
@ 2024-03-08 10:38 0% ` lihuisong (C)
2024-04-23 11:17 0% ` lihuisong (C)
1 sibling, 0 replies; 200+ results
From: lihuisong (C) @ 2024-03-08 10:38 UTC (permalink / raw)
To: thomas, ferruh.yigit
Cc: andrew.rybchenko, fengchengwen, liudongdong3, liuyonglong, dev
Hi Ferruh and Thomas,
Kindly ping for review.
On 2024/1/30 14:36, Huisong Li wrote:
> This patchset fix some bugs and support attaching and detaching port
> in primary and secondary.
>
> ---
> -v7: fix conflicts
> -v6: adjust rte_eth_dev_is_used position based on alphabetical order
> in version.map
> -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
> -v4: fix a misspelling.
> -v3:
> #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
> for other bus type.
> #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
> the problem in patch 2/5.
> -v2: resend due to CI unexplained failure.
>
> Huisong Li (5):
> drivers/bus: restore driver assignment at front of probing
> ethdev: fix skip valid port in probing callback
> app/testpmd: check the validity of the port
> app/testpmd: add attach and detach port for multiple process
> app/testpmd: stop forwarding in new or destroy event
>
> app/test-pmd/testpmd.c | 47 +++++++++++++++---------
> app/test-pmd/testpmd.h | 1 -
> drivers/bus/auxiliary/auxiliary_common.c | 9 ++++-
> drivers/bus/dpaa/dpaa_bus.c | 9 ++++-
> drivers/bus/fslmc/fslmc_bus.c | 8 +++-
> drivers/bus/ifpga/ifpga_bus.c | 12 ++++--
> drivers/bus/pci/pci_common.c | 9 ++++-
> drivers/bus/vdev/vdev.c | 10 ++++-
> drivers/bus/vmbus/vmbus_common.c | 9 ++++-
> drivers/net/bnxt/bnxt_ethdev.c | 3 +-
> drivers/net/bonding/bonding_testpmd.c | 1 -
> drivers/net/mlx5/mlx5.c | 2 +-
> lib/ethdev/ethdev_driver.c | 13 +++++--
> lib/ethdev/ethdev_driver.h | 12 ++++++
> lib/ethdev/ethdev_pci.h | 2 +-
> lib/ethdev/rte_class_eth.c | 2 +-
> lib/ethdev/rte_ethdev.c | 4 +-
> lib/ethdev/rte_ethdev.h | 4 +-
> lib/ethdev/version.map | 1 +
> 19 files changed, 114 insertions(+), 44 deletions(-)
>
* Re: [PATCH] net/tap: allow more than 4 queues
2024-03-07 10:25 0% ` Ferruh Yigit
@ 2024-03-07 16:53 0% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-07 16:53 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Thu, 7 Mar 2024 10:25:48 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> > I got 4 queues setup, but looks like they are trash in secondary.
> > Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
> > This is better, but does take some ABI issue handling.
> >
>
> We can increase RTE_MP_MAX_FD_NUM but still there will be a limit.
>
> Can't it be possible to update 'rte_mp_sendmsg()' to support multiple
> 'rte_mp_sendmsg()' calls in this patch?
>
> Also need to check if fds size is less than 'RTE_PMD_TAP_MAX_QUEUES'
> with multiple 'rte_mp_sendmsg()' call support.
Kernel allows up to 253 fd's to be passed.
So for tap that would limit it to 126 queues; because TAP dups the
fd's for rx and tx but that could be fixable.
Tap should have a static assert about max queues and this as well.
Increasing RTE_MP_MAX_FD_NUM would also fix similar issues in af_xdp PMD
and when af_packet gets MP support.
* Re: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
@ 2024-03-07 13:41 0% ` fengchengwen
0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2024-03-07 13:41 UTC (permalink / raw)
To: Amit Prakash Shukla, Cheng Jiang, Gowrishankar Muthukrishnan
Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
Pavan Nikhilesh Bhagavatula
Hi Amit,
On 2024/3/1 18:59, Amit Prakash Shukla wrote:
> Hi Chengwen,
>
> Please find my reply in-line.
>
> Thanks,
> Amit Shukla
>
>> Hi Amit,
>>
>> On 2024/3/1 16:31, Amit Prakash Shukla wrote:
>>> Hi Chengwen,
>>>
>>> If I'm not wrong, your concern was about config file additions and not
>>> about the test as such. If the config file is getting complicated and
>>> there are better alternatives, we can minimize the config file changes
>>> with this patch and just provide minimum functionality as required and
>>> leave it open for future changes. For now, I can document the existing
>>> behavior in documentation as "Constraints". Similar approach is
>>> followed in other application such as ipsec-secgw
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__doc.dpdk.org_guid
>>> es_sample-5Fapp-5Fug_ipsec-5Fsecgw.html-
>> 23constraints&d=DwICaQ&c=nKjWe
>>>
>> c2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR69VnJLdSnADun7zLaXG1p5Rs7pXihE
>> &m=UXlZ
>>>
>> 1CWj8uotMMmYQ4e7wtBXj4geBwcMUirlqFw0pZzSlOIIAVjWaPgcaXtni370&
>> s=haaehrX
>>> QSEG6EFRW8w2sHKUTU75aJX7ML8vM-0mJsAI&e=
>>
>> Yes, I prefer enable different test just by modify configuration file, and then
>> limit the number of entries at the same time.
>>
>> This commit is bi-direction transfer, it is fixed, maybe later we should test 3/4
>> for mem2dev while 1/4 for dev2mem.
>
> Agreed. We will add this later after the base functionality is merged. I will send next version with constraints listed. Can I assume next version is good for merge?
I suggest do it all at once in [1].
[1] https://patches.dpdk.org/project/dpdk/cover/cover.1709210551.git.gmuthukrishn@marvell.com/
Thanks
>
>>
>> sometime we may need evaluate performance of one dma channel for
>> mem2mem, while another channel for mem2dev, we can't do this in current
>> implement (because vchan_dev is for all DMA channel).
>
> We are okay with extending it later. As you said, we are still deciding how the configuration file should look like.
>
>>
>> So I prefer restrict DMA non-mem2mem's config (include
>> dir/type/coreid/pfid/vfid/raddr) as the dma device's private configuration.
>>
>> Thanks
>>
>>>
>>> Constraints:
>>> 1. vchan_dev config will be same for all the configured DMA devices.
>>> 2. Alternate DMA device will do dev2mem and mem2dev implicitly.
>>> Example:
>>> xfer_mode=1
>>> vchan_dev=raddr=0x200000000,coreid=1,pfid=2,vfid=3
>>> lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3,
>>> lcore12@0000:00:04.4, lcore13@0000:00:04.5
>>>
>>> lcore10@0000:00:04.2, lcore12@0000:00:04.4 will do dev2mem and
>> lcore11@0000:00:04.3, lcore13@0000:00:04.5 will do mem2dev.
>>>
>>> Thanks,
>>> Amit Shukla
>>>
>>>> -----Original Message-----
>>>> From: fengchengwen <fengchengwen@huawei.com>
>>>> Sent: Friday, March 1, 2024 7:16 AM
>>>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
>>>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>>>> <gmuthukrishn@marvell.com>
>>>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>>>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>>>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>>>> <pbhagavatula@marvell.com>
>>>> Subject: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support
>>>> bi- directional transfer
>>>>
>>>> Prioritize security for external emails: Confirm sender and content
>>>> safety before clicking links or opening attachments
>>>>
>>>> ---------------------------------------------------------------------
>>>> -
>>>> Hi Amit,
>>>>
>>>> I think this commit will complicate the test, plus in future we may add
>>>> more tests (e.g. fill)
>>>>
>>>> I agree Bruce's advise in the [1], let also support "lcore_dma0/1/2",
>>>>
>>>> User could provide dma info by two way:
>>>> 1) lcore_dma=, which separates each dma with ", ", but a maximum of a
>>>> certain number is allowed.
>>>> 2) lcore_dma0/1/2/..., each dma device take one line
>>>>
>>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-
>>>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>>>> 2D1-2Dvipin.varghese-
>>>>
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>>>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>>>>
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>>>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e=
>>>>
>>>> Thanks
>>>>
>>>> On 2024/2/29 22:03, Amit Prakash Shukla wrote:
>>>>> Hi Chengwen,
>>>>>
>>>>> I liked your suggestion and tried making changes, but encountered
>>>>> parsing
>>>> issue for CFG files with line greater than CFG_VALUE_LEN=256(current
>>>> value set).
>>>>>
>>>>> There is a discussion on the similar lines in another patch set:
>>>> https://urldefense.proofpoint.com/v2/url?u=https-
>>>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>>>> 2D1-2Dvipin.varghese-
>>>>
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>>>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>>>>
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>>>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e= .
>>>>>
>>>>> I believe this patch can be taken as-is and we can come up with the
>>>>> solution
>>>> when we can increase the CFG_VALUE_LEN as changing CFG_VALUE_LEN in
>>>> this release is causing ABI breakage.
>>>>>
>>>>> Thanks,
>>>>> Amit Shukla
>>>>>
>
> <snip>
>
* Re: [PATCH] hash: make gfni stubs inline
@ 2024-03-07 10:32 0% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-07 10:32 UTC (permalink / raw)
To: Tyler Retzlaff
Cc: Stephen Hemminger, dev, Yipeng Wang, Sameh Gobriel,
Bruce Richardson, Vladimir Medvedkin
On Tue, Mar 5, 2024 at 6:53 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Tue, Mar 05, 2024 at 11:14:45AM +0100, David Marchand wrote:
> > On Mon, Mar 4, 2024 at 7:45 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > This reverts commit 07d836e5929d18ad6640ebae90dd2f81a2cafb71.
> > >
> > > Tyler found build issues with MSVC and the thash gfni stubs.
> > > The problem would be link errors from missing symbols.
> >
> > Trying to understand this link error.
> > Does it come from the fact that rte_thash_gfni/rte_thash_gfni_bulk
> > declarations are hidden under RTE_THASH_GFNI_DEFINED in
> > rte_thash_gfni.h?
> >
> > If so, why not always expose those two symbols unconditionnally and
> > link with the stub only when ! RTE_THASH_GFNI_DEFINED.
>
> So I don't have a lot of background of this lib.
>
> I think we understand that we can't conditionally expose symbols. That's
> what windows was picking up because it seems none of our CI's ever end
> up with RTE_THASH_GFNI_DEFINED but my local test system did and failed.
> (my experiments showed that Linux would complain too if it was defined)
I can't reproduce a problem when I build (gcc/clang) for a target that
has GFNI/AVX512F.
binutils ld seems to just ignore unknown symbols in the map.
With current main:
[dmarchan@dmarchan main]$ nm build/lib/librte_hash.so.24.1 | grep rte_thash_gfni
00000000000088b0 T rte_thash_gfni_supported
[dmarchan@dmarchan main]$ nm build-nogfni/lib/librte_hash.so.24.1 |
grep rte_thash_gfni
00000000000102c0 T rte_thash_gfni
00000000000102d0 T rte_thash_gfni_bulk
000000000000294e t rte_thash_gfni_bulk.cold
0000000000002918 t rte_thash_gfni.cold
000000000000d3c0 T rte_thash_gfni_supported
>
> If we always expose the symbols then as you point out we have to
> conditionally link with the stub otherwise the inline (non-stub) will be
> duplicate and build / link will fail.
>
> I guess the part I don't understand with your suggestion is how we would
> conditionally link with just the stub? We have to link with rte_hash to
> get the rest of hash and the stub. I've probably missed something here.
No we can't, Stephen's suggestion is a full solution.
>
> Since we never had a release exposing the new symbols introduced by
> Stephen in question my suggestion was that we just revert for 24.03 so
> we don't end up with an ABI break later if we choose to solve the
> problem without exports.
>
> I don't know what else to do, but I think we need to decide for 24.03.
I am fully aware that we must fix this for 24.03.
I would like to be sure Stephen's fix (see v3) works for you, so have a
look because I am not able to reproduce an issue and validate the fix
myself.
--
David Marchand
* Re: [PATCH] net/tap: allow more than 4 queues
2024-03-06 20:21 3% ` Stephen Hemminger
@ 2024-03-07 10:25 0% ` Ferruh Yigit
2024-03-07 16:53 0% ` Stephen Hemminger
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-03-07 10:25 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 3/6/2024 8:21 PM, Stephen Hemminger wrote:
> On Wed, 6 Mar 2024 16:14:51 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
>> On 2/29/2024 5:56 PM, Stephen Hemminger wrote:
>>> The tap device needs to exchange file descriptors for tx and rx.
>>> But the EAL MP layer has limit of 8 file descriptors per message.
>>> The ideal resolution would be to increase the number of file
>>> descriptors allowed for rte_mp_sendmsg(), but this would break
>>> the ABI. Workaround the constraint by breaking into multiple messages.
>>>
>>> Do not hide errors about MP message failures.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>> drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
>>> 1 file changed, 33 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
>>> index 69d9da695bed..df18c328f498 100644
>>> --- a/drivers/net/tap/rte_eth_tap.c
>>> +++ b/drivers/net/tap/rte_eth_tap.c
>>> @@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
>>> msg.fds[fd_iterator++] = process_private->txq_fds[i];
>>> msg.num_fds++;
>>> request_param->txq_count++;
>>> +
>>> + /* Need to break request into chunks */
>>> + if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
>>> + err = rte_mp_sendmsg(&msg);
>>> + if (err < 0)
>>> + goto fail;
>>> +
>>> + fd_iterator = 0;
>>> + msg.num_fds = 0;
>>> + request_param->txq_count = 0;
>>> + }
>>> }
>>> for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>> msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>>> msg.num_fds++;
>>> request_param->rxq_count++;
>>> +
>>> + if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
>>> + err = rte_mp_sendmsg(&msg);
>>> + if (err < 0)
>>> + goto fail;
>>> +
>>> + fd_iterator = 0;
>>> + msg.num_fds = 0;
>>> + request_param->rxq_count = 0;
>>> + }
>>> }
>>
>> Hi Stephen,
>>
>> Were you able to verify with more than 4 queues?
>>
>> As far as I can see, in the secondary counterpart of the
>> 'rte_mp_sendmsg()', each time secondary index starts from 0, and
>> subsequent calls overwrites the fds in secondary.
>> So practically still only 4 queues works.
>
> I got 4 queues setup, but looks like they are trash in secondary.
> Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
> This is better, but does take some ABI issue handling.
>
We can increase RTE_MP_MAX_FD_NUM but still there will be a limit.
Can't it be possible to update 'rte_mp_sendmsg()' to support multiple
'rte_mp_sendmsg()' calls in this patch?
Also need to check if fds size is less than 'RTE_PMD_TAP_MAX_QUEUES'
with multiple 'rte_mp_sendmsg()' call support.
* [PATCH v5 1/7] ethdev: support report register names and filter
@ 2024-03-07 3:02 8% ` Jie Hai
0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-03-07 3:02 UTC (permalink / raw)
To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Cc: lihuisong, fengchengwen, haijie1
This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their names.
The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by names. And the original API
rte_eth_dev_get_reg_info() does not use the names and filter fields.
A local variable is used in rte_eth_dev_get_reg_info for
compatibility. If the driver does not report names, they are set
to "offset_XXX".
Signed-off-by: Jie Hai <haijie1@huawei.com>
---
doc/guides/rel_notes/release_24_03.rst | 9 ++++++
lib/ethdev/rte_dev_info.h | 11 ++++++++
lib/ethdev/rte_ethdev.c | 38 ++++++++++++++++++++++++++
lib/ethdev/rte_ethdev.h | 29 ++++++++++++++++++++
lib/ethdev/version.map | 1 +
5 files changed, 88 insertions(+)
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 78590c047b2e..e491579ca984 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -161,6 +161,12 @@ New Features
* Added power-saving during polling within the ``rte_event_dequeue_burst()`` API.
* Added support for DMA adapter.
+* **Added support for dumping registers with names and filter.**
+
+ * Added new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
+ registers by their names and to get register information (names,
+ values and other attributes).
+
Removed Items
-------------
@@ -228,6 +234,9 @@ ABI Changes
* No ABI change that would break compatibility with 23.11.
+* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+ structure for reporting names of registers and filtering them by names.
+
Known Issues
------------
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..0badb92432ae 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
#include <stdint.h>
+#define RTE_ETH_REG_NAME_SIZE 64
+struct rte_eth_reg_name {
+ char name[RTE_ETH_REG_NAME_SIZE];
+};
+
/*
* Placeholder for accessing device registers
*/
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
uint32_t length; /**< Number of registers to fetch */
uint32_t width; /**< Size of device register */
uint32_t version; /**< Device version */
+ /**
+ * Filter for target subset of registers.
+ * This field may affect register selection for data/length/names.
+ */
+ const char *filter;
+ struct rte_eth_reg_name *names; /**< Registers name saver */
};
/*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..82d228790692 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
int
rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+ struct rte_dev_reg_info reg_info = { 0 };
+ int ret;
+
+ if (info == NULL) {
+ RTE_ETHDEV_LOG_LINE(ERR,
+ "Cannot get ethdev port %u register info to NULL",
+ port_id);
+ return -EINVAL;
+ }
+
+ reg_info.length = info->length;
+ reg_info.data = info->data;
+
+ ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+ if (ret != 0)
+ return ret;
+
+ info->length = reg_info.length;
+ info->width = reg_info.width;
+ info->version = reg_info.version;
+ info->offset = reg_info.offset;
+
+ return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
{
struct rte_eth_dev *dev;
+ uint32_t i;
int ret;
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6402,12 +6431,21 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
return -EINVAL;
}
+ if (info->names != NULL && info->length != 0)
+ memset(info->names, 0,
+ sizeof(struct rte_eth_reg_name) * info->length);
+
if (*dev->dev_ops->get_reg == NULL)
return -ENOTSUP;
ret = eth_err(port_id, (*dev->dev_ops->get_reg)(dev, info));
rte_ethdev_trace_get_reg_info(port_id, info, ret);
+ /* Report default names if the driver did not provide them. */
+ if (info->names != NULL && strlen(info->names[0].name) == 0)
+ for (i = 0; i < info->length; i++)
+ snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+ "offset_%u", info->offset + i * info->width);
return ret;
}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ed27360447a3..cd95a0d51038 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5066,6 +5066,35 @@ __rte_experimental
int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
struct rte_power_monitor_cond *pmc);
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param info
+ * Pointer to rte_dev_reg_info structure to fill in.
+ * - If info->filter is NULL, return info for all registers (seen as filter
+ * none).
+ * - If info->filter is not NULL, return error if the driver does not support
+ * names or filter.
+ * - If info->data is NULL, the function fills in the width and length fields.
+ * - If info->data is not NULL, ethdev considers there are enough spaces to
+ * store the registers, and the values of registers whose name contains the
+ * filter string are put into the buffer pointed at by info->data.
+ * - If info->names is not NULL, drivers should fill it or the ethdev fills it
+ * with default names.
+ * @return
+ * - (0) if successful.
+ * - (-ENOTSUP) if hardware doesn't support.
+ * - (-EINVAL) if bad parameter.
+ * - (-ENODEV) if *port_id* invalid.
+ * - (-EIO) if device is removed.
+ * - others depends on the specific operations implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
/**
* Retrieve device registers and register attributes (number of registers and
* register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b5c..e5ec2a2a9741 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
# added in 24.03
__rte_eth_trace_tx_queue_count;
+ rte_eth_dev_get_reg_info_ext;
rte_eth_find_rss_algo;
rte_flow_async_update_resized;
rte_flow_calc_encap_hash;
--
2.30.0
* Re: [PATCH] net/tap: allow more than 4 queues
@ 2024-03-06 20:21 3% ` Stephen Hemminger
2024-03-07 10:25 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-06 20:21 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev
On Wed, 6 Mar 2024 16:14:51 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> On 2/29/2024 5:56 PM, Stephen Hemminger wrote:
> > The tap device needs to exchange file descriptors for tx and rx.
> > But the EAL MP layer has limit of 8 file descriptors per message.
> > The ideal resolution would be to increase the number of file
> > descriptors allowed for rte_mp_sendmsg(), but this would break
> > the ABI. Workaround the constraint by breaking into multiple messages.
> >
> > Do not hide errors about MP message failures.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
> > 1 file changed, 33 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> > index 69d9da695bed..df18c328f498 100644
> > --- a/drivers/net/tap/rte_eth_tap.c
> > +++ b/drivers/net/tap/rte_eth_tap.c
> > @@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
> > msg.fds[fd_iterator++] = process_private->txq_fds[i];
> > msg.num_fds++;
> > request_param->txq_count++;
> > +
> > + /* Need to break request into chunks */
> > + if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> > + err = rte_mp_sendmsg(&msg);
> > + if (err < 0)
> > + goto fail;
> > +
> > + fd_iterator = 0;
> > + msg.num_fds = 0;
> > + request_param->txq_count = 0;
> > + }
> > }
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> > msg.num_fds++;
> > request_param->rxq_count++;
> > +
> > + if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> > + err = rte_mp_sendmsg(&msg);
> > + if (err < 0)
> > + goto fail;
> > +
> > + fd_iterator = 0;
> > + msg.num_fds = 0;
> > + request_param->rxq_count = 0;
> > + }
> > }
>
> Hi Stephen,
>
> Were you able to verify with more than 4 queues?
>
> As far as I can see, in the secondary counterpart of the
> 'rte_mp_sendmsg()', each time secondary index starts from 0, and
> subsequent calls overwrites the fds in secondary.
> So practically still only 4 queues works.
I got 4 queues setup, but looks like they are trash in secondary.
Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
This is better, but does take some ABI issue handling.
* [PATCH v5 4/6] pipeline: replace zero length array with flex array
@ 2024-03-06 20:13 4% ` Tyler Retzlaff
0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-06 20:13 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
Tyler Retzlaff
Zero-length arrays are a GNU extension. Replace with
a standard flex array.
Add a temporary suppression for rte_pipeline_table_entry
libabigail bug:
Bugzilla ID: https://sourceware.org/bugzilla/show_bug.cgi?id=31377
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/libabigail.abignore | 2 ++
lib/pipeline/rte_pipeline.h | 2 +-
lib/pipeline/rte_port_in_action.c | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 25c73a5..5292b63 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,8 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Temporary exceptions till next major ABI version ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+ name = rte_pipeline_table_entry
[suppress_type]
name = rte_rcu_qsbr
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index ec51b9b..0c7994b 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -220,7 +220,7 @@ struct rte_pipeline_table_entry {
uint32_t table_id;
};
/** Start of table entry area for user defined actions and meta-data */
- __extension__ uint8_t action_data[0];
+ uint8_t action_data[];
};
/**
diff --git a/lib/pipeline/rte_port_in_action.c b/lib/pipeline/rte_port_in_action.c
index bbacaff..4127bd2 100644
--- a/lib/pipeline/rte_port_in_action.c
+++ b/lib/pipeline/rte_port_in_action.c
@@ -283,7 +283,7 @@ struct rte_port_in_action_profile *
struct rte_port_in_action {
struct ap_config cfg;
struct ap_data data;
- alignas(RTE_CACHE_LINE_SIZE) uint8_t memory[0];
+ alignas(RTE_CACHE_LINE_SIZE) uint8_t memory[];
};
static __rte_always_inline void *
--
1.8.3.1
2023-07-03 9:31 [PATCH v4] bitmap: add scan from offset function Volodymyr Fialko
2023-07-03 12:39 ` [PATCH v5] " Volodymyr Fialko
2024-07-03 12:50 ` Thomas Monjalon
2024-07-03 13:42 0% ` Volodymyr Fialko
2023-10-20 16:51 [PATCH v2 0/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-04-30 16:27 ` [PATCH v9 " Yoan Picchi
2024-04-30 16:27 ` [PATCH v9 4/4] " Yoan Picchi
2024-06-14 13:42 4% ` David Marchand
2024-07-03 17:13 ` [PATCH v10 0/4] " Yoan Picchi
2024-07-03 17:13 ` [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-07-04 20:31 3% ` David Marchand
2024-07-05 17:45 3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
2024-07-05 17:45 3% ` [PATCH v11 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-08 12:14 3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
2024-07-08 12:14 3% ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-09 4:48 0% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
2023-11-15 13:36 [PATCH v1 0/2] dts: api docs generation Juraj Linkeš
2024-04-12 10:14 ` [PATCH v4 0/3] dts: API " Juraj Linkeš
2024-04-12 10:14 2% ` [PATCH v4 3/3] dts: add API doc generation Juraj Linkeš
2024-06-24 13:26 ` [PATCH v5 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 13:26 2% ` [PATCH v5 4/4] dts: add API doc generation Juraj Linkeš
2024-06-24 13:45 ` [PATCH v6 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 13:46 2% ` [PATCH v6 4/4] dts: add API doc generation Juraj Linkeš
2024-06-24 14:08 0% ` Juraj Linkeš
2024-06-24 14:25 ` [PATCH v7 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 14:25 2% ` [PATCH v7 4/4] dts: add API doc generation Juraj Linkeš
2024-07-12 8:57 ` [PATCH v8 0/5] dts: API docs generation Juraj Linkeš
2024-07-12 8:57 3% ` [PATCH v8 5/5] dts: add API doc generation Juraj Linkeš
2024-07-30 13:51 0% ` Thomas Monjalon
2023-12-12 15:36 [PATCH v1] crypto/ipsec_mb: unified IPsec MB interface Brian Dooley
2024-03-12 13:50 ` [PATCH v6 1/5] ci: replace IPsec-mb package install Brian Dooley
2024-03-12 13:54 ` David Marchand
2024-03-12 15:26 3% ` Power, Ciara
2024-03-12 16:13 3% ` David Marchand
2024-03-12 17:07 0% ` Power, Ciara
2023-12-14 1:56 [PATCH] ethdev: add dump regs for telemetry Jie Hai
2024-03-07 3:02 ` [PATCH v5 0/7] support dump reigser names and filter them Jie Hai
2024-03-07 3:02 8% ` [PATCH v5 1/7] ethdev: support report register names and filter Jie Hai
2024-07-22 6:58 ` [PATCH v6 0/8] support dump reigser " Jie Hai
2024-07-22 6:58 8% ` [PATCH v6 1/8] ethdev: support report register " Jie Hai
2024-01-08 1:59 Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
2024-01-08 10:41 ` Morten Brørup
2024-04-03 11:43 0% ` Ferruh Yigit
2024-01-08 15:13 ` Konstantin Ananyev
2024-04-03 0:14 4% ` Stephen Hemminger
2024-04-03 11:42 0% ` Ferruh Yigit
2024-01-08 8:27 [PATCH] app/dma-perf: support bi-directional transfer Amit Prakash Shukla
2024-02-27 19:26 ` [PATCH v2] " Amit Prakash Shukla
2024-02-28 7:03 ` fengchengwen
2024-02-28 9:38 ` [EXT] " Amit Prakash Shukla
2024-02-29 14:03 ` Amit Prakash Shukla
2024-03-01 1:46 ` fengchengwen
2024-03-01 8:31 ` [EXTERNAL] " Amit Prakash Shukla
2024-03-01 9:30 ` fengchengwen
2024-03-01 10:59 ` Amit Prakash Shukla
2024-03-07 13:41 0% ` fengchengwen
2024-01-24 22:17 [PATCH 0/2] more replacement of zero length array Tyler Retzlaff
2024-03-06 20:13 ` [PATCH v5 0/6] " Tyler Retzlaff
2024-03-06 20:13 4% ` [PATCH v5 4/6] pipeline: replace zero length array with flex array Tyler Retzlaff
2024-01-30 3:46 [RFC 0/2] net/tap RSS BPF rewrite Stephen Hemminger
2024-04-02 17:12 ` [PATCH v5 0/8] net/tap: cleanups and fix BPF flow Stephen Hemminger
2024-04-02 17:12 2% ` [PATCH v5 6/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-05 21:14 ` [PATCH v6 0/8] net/tap: cleanup and fix BPF flow support Stephen Hemminger
2024-04-05 21:14 2% ` [PATCH v6 6/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-08 21:18 ` [PATCH v7 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-08 21:18 2% ` [PATCH v7 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-09 3:40 ` [PATCH v8 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-09 3:40 2% ` [PATCH v8 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-26 15:48 ` [PATCH v9 0/9] net/tap: fix RSS (BPF) support Stephen Hemminger
2024-04-26 15:48 2% ` [PATCH v9 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-01 16:11 ` [PATCH v10 0/9] net/tap: fix RSS (BPF) flow support Stephen Hemminger
2024-05-01 16:12 2% ` [PATCH v10 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 2:49 ` [PATCH v11 0/9] net/tap fix RSS (BPF) flow support Stephen Hemminger
2024-05-02 2:49 2% ` [PATCH v11 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 21:31 ` [PATCH v12 00/12] net/tap: RSS and other fixes Stephen Hemminger
2024-05-02 21:31 ` [PATCH v12 02/12] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-20 17:46 ` Ferruh Yigit
2024-05-20 18:16 3% ` Stephen Hemminger
2024-05-02 21:31 2% ` [PATCH v12 06/12] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 21:31 ` [PATCH v12 07/12] net/tap: use libbpf to load new " Stephen Hemminger
2024-05-20 17:49 ` Ferruh Yigit
2024-05-20 18:18 ` Stephen Hemminger
2024-05-20 21:42 3% ` Luca Boccassi
2024-05-20 22:08 0% ` Ferruh Yigit
2024-05-20 22:25 0% ` Luca Boccassi
2024-05-20 23:20 0% ` Stephen Hemminger
2024-05-21 2:47 ` [PATCH v13 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21 2:47 2% ` [PATCH v13 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21 2:47 2% ` [PATCH v13 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-21 17:06 ` [PATCH v14 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21 17:06 2% ` [PATCH v14 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21 17:06 2% ` [PATCH v14 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-21 20:12 ` [PATCH v15 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21 20:12 2% ` [PATCH v15 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21 20:12 2% ` [PATCH v15 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-01-30 23:26 [PATCH] replace GCC marker extension with C11 anonymous unions Tyler Retzlaff
2024-02-27 5:41 ` [PATCH v6 00/23] stop and remove RTE_MARKER typedefs Tyler Retzlaff
2024-02-27 5:41 ` [PATCH v6 02/23] mbuf: consolidate driver asserts for mbuf struct Tyler Retzlaff
2024-03-14 16:51 4% ` Tyler Retzlaff
2024-04-02 20:08 3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-02 20:08 2% ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-02 20:45 0% ` Stephen Hemminger
2024-04-02 20:51 0% ` Tyler Retzlaff
2024-04-03 17:53 3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-03 17:53 2% ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-03 19:32 0% ` Morten Brørup
2024-04-03 22:45 0% ` Tyler Retzlaff
2024-04-03 21:49 0% ` Stephen Hemminger
2024-04-04 17:51 3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-04 17:51 2% ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-06-19 15:01 3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
2024-06-19 15:01 6% ` [PATCH v12 2/4] mbuf: remove marker fields David Marchand
2024-02-02 5:13 [PATCH 1/1] eal: add C++ include guard in generic/rte_vect.h Ashish Sadanandan
2024-02-02 9:18 ` Thomas Monjalon
2024-02-02 9:40 ` Bruce Richardson
2024-02-02 20:58 ` Ashish Sadanandan
2024-03-13 23:45 ` Stephen Hemminger
2024-03-14 3:45 3% ` Tyler Retzlaff
2024-02-29 17:56 [PATCH] net/tap: allow more that 4 queues Stephen Hemminger
2024-03-06 16:14 ` Ferruh Yigit
2024-03-06 20:21 3% ` Stephen Hemminger
2024-03-07 10:25 0% ` Ferruh Yigit
2024-03-07 16:53 0% ` Stephen Hemminger
2024-03-04 18:45 [PATCH] hash: make gfni stubs inline Stephen Hemminger
2024-03-05 10:14 ` David Marchand
2024-03-05 17:53 ` Tyler Retzlaff
2024-03-07 10:32 0% ` David Marchand
2024-03-06 12:24 [PATCH v3 00/33] net/ena: v2.9.0 driver release shaibran
2024-03-06 12:24 ` [PATCH v3 07/33] net/ena: restructure the llq policy setting process shaibran
2024-03-08 17:24 3% ` Ferruh Yigit
2024-03-10 14:29 0% ` Brandes, Shai
2024-03-13 11:21 0% ` Ferruh Yigit
[not found] <20220825024425.10534-1-lihuisong@huawei.com>
2024-01-30 6:36 ` [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port Huisong Li
2024-03-08 10:38 0% ` lihuisong (C)
2024-04-23 11:17 0% ` lihuisong (C)
2024-03-08 18:54 2% [RFC] eal: increase the number of availble file descriptors for MP Stephen Hemminger
2024-03-08 20:36 8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
2024-03-14 15:23 0% ` Stephen Hemminger
2024-03-14 15:38 3% ` David Marchand
2024-03-09 18:12 2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
2024-03-14 14:40 4% ` [RFC] eal: increase the number of availble file descriptors for MP David Marchand
2024-03-11 22:36 2% Community CI Meeting Minutes - March 7, 2024 Patrick Robb
2024-03-12 7:52 [PATCH 0/3] support setting lanes Dengdui Huang
2024-03-12 7:52 5% ` [PATCH 1/3] ethdev: " Dengdui Huang
2024-03-12 7:52 [PATCH 3/3] app/testpmd: " Dengdui Huang
2024-03-22 7:09 ` [PATCH v2 0/6] " Dengdui Huang
2024-03-22 7:09 5% ` [PATCH v2 1/6] ethdev: " Dengdui Huang
2024-03-20 10:55 [PATCH 0/2] introduce PM QoS interface Huisong Li
2024-06-13 11:20 4% ` [PATCH v2 0/2] power: " Huisong Li
2024-06-13 11:20 5% ` [PATCH v2 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19 6:31 4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
2024-06-19 6:31 5% ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19 15:32 0% ` Thomas Monjalon
2024-06-20 2:32 0% ` lihuisong (C)
2024-06-19 6:59 0% ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
2024-06-27 6:00 4% ` [PATCH v4 " Huisong Li
2024-06-27 6:00 5% ` [PATCH v4 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-02 3:50 4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
2024-07-02 3:50 5% ` [PATCH v5 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 2:29 4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09 2:29 5% ` [PATCH v6 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 6:31 4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09 6:31 5% ` [PATCH v7 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09 7:25 4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09 7:25 5% ` [PATCH v8 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-03-20 20:50 [PATCH 00/46] use stdatomic API Tyler Retzlaff
2024-03-20 20:51 ` [PATCH 15/46] net/sfc: use rte " Tyler Retzlaff
2024-03-21 18:11 3% ` Aaron Conole
2024-03-21 18:15 0% ` Tyler Retzlaff
2024-03-20 21:05 [PATCH 00/15] fix packing of structs when building with MSVC Tyler Retzlaff
2024-03-20 21:05 ` [PATCH 02/15] eal: pack structures " Tyler Retzlaff
2024-03-21 16:02 3% ` Bruce Richardson
2024-03-21 14:49 3% DPDK Release Status Meeting 2024-03-21 Mcnamara, John
2024-03-25 10:05 [PATCH v3] graph: expose node context as pointers Robin Jarry
2024-03-27 9:14 4% ` [PATCH v5] " Robin Jarry
2024-05-29 17:54 0% ` Nithin Dabilpuram
2024-06-18 12:33 4% ` David Marchand
2024-06-25 15:22 0% ` Robin Jarry
2024-06-26 11:30 0% ` Jerin Jacob
2024-03-26 23:59 [PATCH] igc/ixgbe: add get/set link settings interface Marek Pazdan
2024-04-03 13:40 ` [PATCH] lib: " Marek Pazdan
2024-04-03 16:49 3% ` Tyler Retzlaff
2024-04-04 7:09 4% ` David Marchand
2024-04-05 0:55 0% ` Tyler Retzlaff
2024-04-05 0:56 0% ` Tyler Retzlaff
2024-04-05 8:58 0% ` David Marchand
2024-03-28 2:19 3% Minutes of Technical Board meeting 20-March-2024 Stephen Hemminger
2024-03-28 21:46 3% DPDK 24.03 released Thomas Monjalon
2024-03-28 23:33 [PATCH] vhost: optimize mbuf allocation in virtio Tx packed path Andrey Ignatov
2024-03-28 23:44 ` Stephen Hemminger
2024-03-29 0:10 ` Andrey Ignatov
2024-03-29 2:53 ` Stephen Hemminger
2024-03-29 13:04 ` Maxime Coquelin
2024-03-29 13:42 3% ` The effect of inlining Morten Brørup
2024-03-29 20:26 0% ` Tyler Retzlaff
2024-04-01 15:20 3% ` Mattias Rönnblom
2024-04-03 16:01 3% ` Morten Brørup
2024-03-30 17:54 18% [PATCH] version: 24.07-rc0 David Marchand
2024-04-02 8:52 0% ` Thomas Monjalon
2024-04-02 9:25 0% ` David Marchand
2024-04-04 16:29 3% Community CI Meeting Minutes - April 4, 2024 Patrick Robb
2024-04-04 21:04 [PATCH v1 0/3] Additional queue stats Nicolas Chautru
2024-04-04 21:04 ` [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
2024-04-05 0:46 3% ` Stephen Hemminger
2024-04-05 15:15 3% ` Stephen Hemminger
2024-04-05 18:17 3% ` Chautru, Nicolas
2024-04-10 9:33 Strict aliasing problem with rte_eth_linkstatus_set() fengchengwen
2024-04-10 15:27 ` Stephen Hemminger
2024-04-10 15:58 3% ` Ferruh Yigit
2024-04-10 17:54 ` Morten Brørup
2024-04-10 19:58 3% ` Tyler Retzlaff
2024-04-11 3:20 0% ` fengchengwen
2024-04-11 8:22 3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
2024-04-12 11:57 3% ` [PATCH v2 " Akhil Goyal
2024-05-29 10:43 0% ` Anoob Joseph
2024-05-30 9:19 0% ` Akhil Goyal
2024-04-17 7:23 [RFC 0/2] ethdev: update GENEVE option item structure Michael Baum
2024-06-11 17:07 4% ` Ferruh Yigit
2024-04-18 17:49 3% Community CI Meeting Minutes - April 18, 2024 Patrick Robb
2024-04-24 4:08 3% getting rid of type argument to rte_malloc() Stephen Hemminger
2024-04-24 10:29 0% ` Ferruh Yigit
2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 16:23 0% ` Stephen Hemminger
2024-04-24 17:09 0% ` Morten Brørup
2024-04-24 19:05 0% ` Stephen Hemminger
2024-04-24 19:06 0% ` Stephen Hemminger
2024-04-24 15:24 3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
2024-04-24 17:25 3% ` Morten Brørup
2024-04-24 19:10 0% ` Thomas Monjalon
2024-04-25 17:46 [RFC] net/af_packet: make stats reset reliable Ferruh Yigit
2024-04-26 14:38 ` [RFC v2] " Ferruh Yigit
2024-04-28 15:11 ` Mattias Rönnblom
2024-05-07 7:23 ` Mattias Rönnblom
2024-05-07 13:49 ` Ferruh Yigit
2024-05-07 14:51 ` Stephen Hemminger
2024-05-07 16:00 3% ` Morten Brørup
2024-05-07 16:54 0% ` Ferruh Yigit
2024-05-07 18:47 0% ` Stephen Hemminger
2024-05-08 7:48 0% ` Mattias Rönnblom
2024-04-30 15:39 [PATCH] net/af_packet: fix statistics Stephen Hemminger
2024-05-01 16:25 ` Ferruh Yigit
2024-05-01 16:44 3% ` Stephen Hemminger
2024-05-01 18:18 0% ` Morten Brørup
2024-05-02 13:47 0% ` Ferruh Yigit
2024-05-02 13:55 [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD Tom Jones
2024-05-03 9:46 ` Tom Jones
2024-05-03 13:03 ` Bruce Richardson
2024-05-03 13:12 ` Tom Jones
2024-05-03 13:24 3% ` Bruce Richardson
2024-05-12 5:55 [PATCH] bpf: don't verify classic bpfs Yoav Winstein
2024-05-12 16:03 ` Stephen Hemminger
2024-05-16 9:36 ` Konstantin Ananyev
2024-06-27 15:36 3% ` Thomas Monjalon
2024-06-27 18:14 0% ` Konstantin Ananyev
2024-05-13 15:59 [PATCH 0/9] reowrd in prog guide Nandini Persad
2024-05-13 15:59 6% ` [PATCH 1/9] doc: reword design section in contributors guidelines Nandini Persad
2024-06-21 2:32 ` [PATCH v2 1/9] doc: reword pmd section in prog guide Nandini Persad
2024-06-21 2:32 6% ` [PATCH v2 3/9] doc: reword design section in contributors guidelines Nandini Persad
2024-05-29 23:33 [PATCH v10 00/20] Remove use of noninclusive term sanity Stephen Hemminger
2024-05-29 23:33 2% ` [PATCH v10 01/20] mbuf: replace term sanity check Stephen Hemminger
2024-05-30 12:44 [PATCH v4 1/2] eventdev/dma: reorganize event DMA ops pbhagavatula
2024-06-07 10:36 3% ` [PATCH v5 " pbhagavatula
2024-06-08 6:16 9% ` Jerin Jacob
2024-06-04 4:44 [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-06-04 4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-06-04 15:41 3% ` Stephen Hemminger
2024-06-06 13:32 [PATCH 0/1] net/ena: devargs api change shaibran
2024-06-06 13:33 3% ` [PATCH 1/1] net/ena: restructure the llq policy user setting shaibran
2024-07-05 17:32 4% ` Ferruh Yigit
2024-07-06 4:59 4% ` Brandes, Shai
2024-06-07 13:25 [PATCH v7 2/3] ethdev: add VXLAN last reserved field Thomas Monjalon
2024-06-07 14:02 ` [PATCH v8 0/3] support VXLAN last reserved byte modification Rongwei Liu
2024-06-07 14:02 ` [PATCH v8 2/3] ethdev: add VXLAN last reserved field Rongwei Liu
2024-06-11 14:52 3% ` Ferruh Yigit
2024-06-12 1:25 0% ` rongwei liu
2024-06-25 14:46 0% ` Thomas Monjalon
2024-06-20 17:57 [PATCH v4 01/13] net/i40e: add missing vector API header include Mattias Rönnblom
2024-07-24 7:53 ` [PATCH v5 0/6] Optionally have rte_memcpy delegate to compiler memcpy Mattias Rönnblom
2024-07-24 7:53 5% ` [PATCH v5 5/6] ci: test " Mattias Rönnblom
2024-06-21 18:33 3% [DPDK/core Bug 1471] rte_pktmbuf_free_bulk does not respect RTE_LIBRTE_MBUF_DEBUG bugzilla
2024-06-24 11:04 [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev Vladimir Ratnikov
2024-06-24 15:13 3% ` Stephen Hemminger
2024-06-25 12:01 3% ` David Marchand
2024-06-27 20:52 2% Community CI Meeting Minutes - June 27, 2024 Patrick Robb
2024-07-05 14:52 4% [PATCH v6] graph: expose node context as pointers Robin Jarry
2024-07-12 11:39 0% ` [EXTERNAL] " Kiran Kumar Kokkilagadda
2024-07-07 9:57 4% [PATCH] net/mlx5: fix compilation warning in GCC-9.1 Gregory Etelson
2024-07-18 7:24 4% ` Raslan Darawsheh
2024-07-08 20:35 [PATCH v3] ethdev: Add link_speed lanes support Damodharam Ammepalli
2024-07-08 23:22 ` [PATCH v4] " Damodharam Ammepalli
2024-07-09 11:10 4% ` Ferruh Yigit
2024-07-09 21:20 0% ` Damodharam Ammepalli
2024-07-12 18:25 release candidate 24.07-rc2 David Marchand
2024-07-23 2:14 4% ` Xu, HailinX
2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
2024-07-17 2:27 3% ` Stephen Hemminger
2024-07-18 18:48 0% ` Wathsala Wathawana Vithanage
2024-07-20 3:05 3% ` Honnappa Nagarahalli
2024-07-18 3:42 10% [DPDK/eventdev Bug 1497] [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hang when do ABI testing bugzilla
2024-07-18 15:03 IPv6 APIs rework Robin Jarry
2024-07-18 20:27 ` Morten Brørup
2024-07-18 21:25 ` Vladimir Medvedkin
2024-07-18 21:34 3% ` Robin Jarry
2024-07-19 8:25 0% ` Konstantin Ananyev
2024-07-19 9:12 0% ` Morten Brørup
2024-07-19 10:41 0% ` Medvedkin, Vladimir
2024-07-22 14:53 8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
2024-07-24 5:07 0% ` Anoob Joseph
2024-07-24 6:46 0% ` [EXTERNAL] " Akhil Goyal
2024-07-25 15:01 0% ` Kusztal, ArkadiuszX
2024-07-31 12:57 3% ` Thomas Monjalon
2024-07-22 14:55 5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
2024-07-24 6:49 0% ` [EXTERNAL] " Akhil Goyal
2024-07-25 9:48 0% ` Kusztal, ArkadiuszX
2024-07-25 15:53 0% ` Gowrishankar Muthukrishnan
2024-07-30 14:39 0% ` Gowrishankar Muthukrishnan
2024-07-31 12:51 0% ` Thomas Monjalon
2024-07-25 16:00 0% ` Gowrishankar Muthukrishnan
2024-07-22 14:56 8% [PATCH] doc: announce vhost changes to support asymmetric operation Gowrishankar Muthukrishnan
2024-07-23 18:30 4% ` Jerin Jacob
2024-07-25 9:29 4% ` [EXTERNAL] " Gowrishankar Muthukrishnan
2024-07-29 12:49 [PATCH] doc: announce dmadev new capability addition Vamsi Attunuru
2024-07-29 15:20 3% ` Jerin Jacob
2024-07-29 17:17 0% ` Morten Brørup
2024-07-31 10:24 0% ` Thomas Monjalon