ABI - search results

DPDK patches and discussions

* [PATCH v2 0/2] ABI check updates
  @ 2023-03-23 17:15  9% ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-23 17:15 UTC (permalink / raw)
  To: dev

This series moves ABI exceptions into a single configuration file and
simplifies the ABI check so that no artefact depending on the libabigail
version is stored in the CI.

-- 
David Marchand

Changes since v1:
- rebased after the ABI check parallelisation rework.


David Marchand (2):
  devtools: unify configuration for ABI check
  devtools: stop depending on libabigail xml format

 .ci/linux-build.sh            |  4 ----
 .github/workflows/build.yml   |  2 +-
 MAINTAINERS                   |  1 -
 devtools/check-abi.sh         | 24 +++++++++++++++---------
 devtools/gen-abi.sh           | 27 ---------------------------
 devtools/libabigail.abignore  | 12 +++++++++---
 devtools/test-meson-builds.sh |  5 -----
 7 files changed, 25 insertions(+), 50 deletions(-)
 delete mode 100755 devtools/gen-abi.sh

-- 
2.39.2


^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [RFC] ethdev: improve link speed to string
  2023-02-10 14:41  3%               ` Ferruh Yigit
@ 2023-03-23 14:40  3%                 ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-03-23 14:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Min Hu (Connor), Andrew Rybchenko, thomas, dev

On 2/10/2023 2:41 PM, Ferruh Yigit wrote:
> On 1/19/2023 4:45 PM, Stephen Hemminger wrote:
>> On Thu, 19 Jan 2023 11:41:12 +0000
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>>>>>> Nothing good will happen if you try to use the function to
>>>>>>> print two different link speeds in one log message.  
>>>>>> You are right.
>>>>>> And use malloc for "name" will result in memory leakage, which is also
>>>>>> not a good option.
>>>>>>
>>>>>> BTW, do you think if we need to modify the function
>>>>>> "rte_eth_link_speed_to_str"?  
>>>>>
>>>>> IMHO it would be more pain than gain in this case.
>>>>>
>>>>> .
>>>>>  
>>>> Agree with you. Thanks Andrew
>>>>  
>>>
>>> It can be option to update the API as following in next ABI break release:
>>>
>>> const char *
>>> rte_eth_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size);
>>>
>>> For this a deprecation notice needs to be sent and approved, not sure
>>> though if it worth.
>>>
>>>
>>> Meanwhile, what do you think to update string 'Invalid' to something
>>> like 'Irregular' or 'Erratic', does this help to convey the right message?
>>
>>
>> API versioning is possible here.
> 
> 
> Agree, ABI versioning can be used here.
> 
> @Connor, what do you think?

Updating the patch status to rejected. If you still want to pursue the feature,
please send a separate patch that updates the API via ABI versioning.
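
For illustration only, a minimal sketch of the caller-provided-buffer
signature quoted above. The name `sketch_link_speed_to_str` and its
formatting rules are assumptions made for this example, not the DPDK
implementation; only the parameter list follows the proposal.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch (not DPDK code): format a link speed given in Mbps
 * into a buffer owned by the caller, so no static storage is shared
 * between calls. Returns buf, or NULL if no usable buffer was passed.
 */
static const char *
sketch_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size)
{
	if (buf == NULL || buf_size == 0)
		return NULL;

	if (link_speed == 0)
		snprintf(buf, buf_size, "None");
	else if (link_speed % 1000 == 0)
		snprintf(buf, buf_size, "%" PRIu32 " Gbps", link_speed / 1000);
	else
		snprintf(buf, buf_size, "%" PRIu32 " Mbps", link_speed);

	return buf;
}
```

Because each call writes into its own buffer, two different speeds can be
formatted for the same log message without one overwriting the other.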

Thanks,
ferruh

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-23 11:58  3%             ` fengchengwen
@ 2023-03-23 12:51  3%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-23 12:51 UTC (permalink / raw)
  To: Olivier Matz, Ferruh Yigit, fengchengwen; +Cc: dev, David Marchand

23/03/2023 12:58, fengchengwen:
> On 2023/3/22 21:49, Thomas Monjalon wrote:
> > 22/03/2023 09:53, Ferruh Yigit:
> >> On 3/22/2023 1:15 AM, fengchengwen wrote:
> >>> On 2023/3/21 21:50, Ferruh Yigit wrote:
> >>>> On 3/17/2023 2:43 AM, fengchengwen wrote:
> >>>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
> >>>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
> >>>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
> >>>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
> >>>>>>> parameter 'value' is NULL when parsed 'only keys'.
> >>>>>>>
> >>>>>>> It may leads to segment fault when parse args with 'only key', this 
> >>>>>>> patchset fixes rest of them.
> >>>>>>>
> >>>>>>> Chengwen Feng (5):
> >>>>>>>   app/pdump: fix segment fault when parse args
> >>>>>>>   net/memif: fix segment fault when parse devargs
> >>>>>>>   net/pcap: fix segment fault when parse devargs
> >>>>>>>   net/ring: fix segment fault when parse devargs
> >>>>>>>   net/sfc: fix segment fault when parse devargs
> >>>>>>
> >>>>>> Hi Chengwen,
> >>>>>>
> >>>>>> Did you scan all `rte_kvargs_process()` instances?
> >>>>>
> >>>>> No, I was just looking at the modules I was concerned about.
> >>>>> I looked at it briefly, and some modules had the same problem.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> And if there would be a way to tell kvargs that a value is expected (or
> >>>>>> not) this checks could be done in kvargs layer, I think this also can be
> >>>>>> to look at.
> >>>>>
> >>>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> >>>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> >>>>> But it also break the API's behavior.
> >>>>>
> >>>>
> >>>> What about having a new API, like `rte_kvargs_process_extended()`,
> >>>>
> >>>> That gets an additional flag as parameter, which may have values like
> >>>> following to indicate if key expects a value or not:
> >>>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> >>>> ARG_WITH_VALUE      --> "key=value"
> >>>> ARG_NO_VALUE        --> 'key'
> >>>>
> >>>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> >>>> `rte_kvargs_process()`.
> >>>>
> >>>> This way instead of adding checks, relevant usage can be replaced by
> >>>> `rte_kvargs_process_extended()`, this requires similar amount of change
> >>>> but code will be more clean I think.
> >>>>
> >>>> Do you think does this work?
> >>>
> >>> Yes, it can work.
> >>>
> >>> But I think the introduction of new API adds some complexity.
> >>> And a good API definition could more simpler.
> >>>
> >>
> >> Other option is changing existing API, but that may be widely used and
> >> changing it impacts applications, I don't think it worth.
> > 
> > I've planned a change in kvargs API 5 years ago and never did it:
> > > From doc/guides/rel_notes/deprecation.rst:
> > "
> > * kvargs: The function ``rte_kvargs_process`` will get a new parameter
> >   for returning key match count. It will ease handling of no-match case.
> > "
> 
> I think it's okay to add extra parameter for rte_kvargs_process. But it will
> break ABI.
> Also I notice patchset was deferred in patchwork.
> 
> Does it mean that the new version can't accept until the 23.11 release cycle ?

It is a bit too late to make a decision in the 23.03 cycle.
Let's continue this discussion.
We can either have some fixes in 23.07 or have an ABI-breaking change in 23.11.



^ permalink raw reply	[relevance 3%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22 13:49  0%           ` Thomas Monjalon
@ 2023-03-23 11:58  3%             ` fengchengwen
  2023-03-23 12:51  3%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-23 11:58 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier Matz, Ferruh Yigit; +Cc: dev, David Marchand

On 2023/3/22 21:49, Thomas Monjalon wrote:
> 22/03/2023 09:53, Ferruh Yigit:
>> On 3/22/2023 1:15 AM, fengchengwen wrote:
>>> On 2023/3/21 21:50, Ferruh Yigit wrote:
>>>> On 3/17/2023 2:43 AM, fengchengwen wrote:
>>>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>>>>
>>>>>>> It may leads to segment fault when parse args with 'only key', this 
>>>>>>> patchset fixes rest of them.
>>>>>>>
>>>>>>> Chengwen Feng (5):
>>>>>>>   app/pdump: fix segment fault when parse args
>>>>>>>   net/memif: fix segment fault when parse devargs
>>>>>>>   net/pcap: fix segment fault when parse devargs
>>>>>>>   net/ring: fix segment fault when parse devargs
>>>>>>>   net/sfc: fix segment fault when parse devargs
>>>>>>
>>>>>> Hi Chengwen,
>>>>>>
>>>>>> Did you scan all `rte_kvargs_process()` instances?
>>>>>
>>>>> No, I was just looking at the modules I was concerned about.
>>>>> I looked at it briefly, and some modules had the same problem.
>>>>>
>>>>>>
>>>>>>
>>>>>> And if there would be a way to tell kvargs that a value is expected (or
>>>>>> not) this checks could be done in kvargs layer, I think this also can be
>>>>>> to look at.
>>>>>
>>>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>>>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>>>>> But it also break the API's behavior.
>>>>>
>>>>
>>>> What about having a new API, like `rte_kvargs_process_extended()`,
>>>>
>>>> That gets an additional flag as parameter, which may have values like
>>>> following to indicate if key expects a value or not:
>>>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
>>>> ARG_WITH_VALUE      --> "key=value"
>>>> ARG_NO_VALUE        --> 'key'
>>>>
>>>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
>>>> `rte_kvargs_process()`.
>>>>
>>>> This way instead of adding checks, relevant usage can be replaced by
>>>> `rte_kvargs_process_extended()`, this requires similar amount of change
>>>> but code will be more clean I think.
>>>>
>>>> Do you think does this work?
>>>
>>> Yes, it can work.
>>>
>>> But I think the introduction of new API adds some complexity.
>>> And a good API definition could more simpler.
>>>
>>
>> Other option is changing existing API, but that may be widely used and
>> changing it impacts applications, I don't think it worth.
> 
> I've planned a change in kvargs API 5 years ago and never did it:
> From doc/guides/rel_notes/deprecation.rst:
> "
> * kvargs: The function ``rte_kvargs_process`` will get a new parameter
>   for returning key match count. It will ease handling of no-match case.
> "

I think it's okay to add an extra parameter to rte_kvargs_process, but it will
break the ABI.
I also noticed that the patchset was deferred in patchwork.

Does that mean the new version can't be accepted until the 23.11 release cycle?

> 
>> Of course we can live with as it is and add checks to the callback
>> functions, although I still believe a new 'process()' API is better idea.
> 
> 
> 
> .
> 

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22  8:53  0%         ` Ferruh Yigit
@ 2023-03-22 13:49  0%           ` Thomas Monjalon
  2023-03-23 11:58  3%             ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-22 13:49 UTC (permalink / raw)
  To: fengchengwen, Olivier Matz, Ferruh Yigit; +Cc: dev, David Marchand

22/03/2023 09:53, Ferruh Yigit:
> On 3/22/2023 1:15 AM, fengchengwen wrote:
> > On 2023/3/21 21:50, Ferruh Yigit wrote:
> >> On 3/17/2023 2:43 AM, fengchengwen wrote:
> >>> On 2023/3/17 2:18, Ferruh Yigit wrote:
> >>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
> >>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
> >>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
> >>>>> parameter 'value' is NULL when parsed 'only keys'.
> >>>>>
> >>>>> It may leads to segment fault when parse args with 'only key', this 
> >>>>> patchset fixes rest of them.
> >>>>>
> >>>>> Chengwen Feng (5):
> >>>>>   app/pdump: fix segment fault when parse args
> >>>>>   net/memif: fix segment fault when parse devargs
> >>>>>   net/pcap: fix segment fault when parse devargs
> >>>>>   net/ring: fix segment fault when parse devargs
> >>>>>   net/sfc: fix segment fault when parse devargs
> >>>>
> >>>> Hi Chengwen,
> >>>>
> >>>> Did you scan all `rte_kvargs_process()` instances?
> >>>
> >>> No, I was just looking at the modules I was concerned about.
> >>> I looked at it briefly, and some modules had the same problem.
> >>>
> >>>>
> >>>>
> >>>> And if there would be a way to tell kvargs that a value is expected (or
> >>>> not) this checks could be done in kvargs layer, I think this also can be
> >>>> to look at.
> >>>
> >>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> >>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> >>> But it also break the API's behavior.
> >>>
> >>
> >> What about having a new API, like `rte_kvargs_process_extended()`,
> >>
> >> That gets an additional flag as parameter, which may have values like
> >> following to indicate if key expects a value or not:
> >> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> >> ARG_WITH_VALUE      --> "key=value"
> >> ARG_NO_VALUE        --> 'key'
> >>
> >> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> >> `rte_kvargs_process()`.
> >>
> >> This way instead of adding checks, relevant usage can be replaced by
> >> `rte_kvargs_process_extended()`, this requires similar amount of change
> >> but code will be more clean I think.
> >>
> >> Do you think does this work?
> > 
> > Yes, it can work.
> > 
> > But I think the introduction of new API adds some complexity.
> > And a good API definition could more simpler.
> > 
> 
> Other option is changing existing API, but that may be widely used and
> changing it impacts applications, I don't think it worth.

I planned a change to the kvargs API 5 years ago and never did it:
From doc/guides/rel_notes/deprecation.rst:
"
* kvargs: The function ``rte_kvargs_process`` will get a new parameter
  for returning key match count. It will ease handling of no-match case.
"
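
As a rough sketch of what "returning key match count" could look like,
assuming a simplified pair array in place of the real kvargs structures.
Every `sketch_` name below is hypothetical and not part of DPDK; the extra
`match_count` output parameter is an assumption based on the notice above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed key/value list. */
struct sketch_kv {
	const char *key;
	const char *value; /* NULL for a key-only argument */
};

typedef int (*sketch_cb_t)(const char *key, const char *value, void *opaque);

/*
 * Call cb for every pair whose key equals key_match and report how many
 * pairs matched through match_count (which may be NULL).
 * Returns 0 on success, -1 if a callback fails.
 */
static int
sketch_process_counted(const struct sketch_kv *pairs, size_t count,
		const char *key_match, sketch_cb_t cb, void *opaque,
		unsigned int *match_count)
{
	unsigned int matches = 0;
	int ret = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(pairs[i].key, key_match) != 0)
			continue;
		matches++;
		if (cb(pairs[i].key, pairs[i].value, opaque) < 0) {
			ret = -1;
			break;
		}
	}

	if (match_count != NULL)
		*match_count = matches;
	return ret;
}

/* Trivial callback that accepts every pair. */
static int
sketch_accept(const char *key, const char *value, void *opaque)
{
	(void)key;
	(void)value;
	(void)opaque;
	return 0;
}
```

With a count available, a caller can tell "key absent" (return 0, count 0)
apart from "key present and handled" without a separate lookup.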

> Of course we can live with as it is and add checks to the callback
> functions, although I still believe a new 'process()' API is better idea.




^ permalink raw reply	[relevance 0%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22  1:15  0%       ` fengchengwen
@ 2023-03-22  8:53  0%         ` Ferruh Yigit
  2023-03-22 13:49  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-03-22  8:53 UTC (permalink / raw)
  To: fengchengwen, thomas, Olivier Matz; +Cc: dev, David Marchand

On 3/22/2023 1:15 AM, fengchengwen wrote:
> On 2023/3/21 21:50, Ferruh Yigit wrote:
>> On 3/17/2023 2:43 AM, fengchengwen wrote:
>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>>
>>>>> It may leads to segment fault when parse args with 'only key', this 
>>>>> patchset fixes rest of them.
>>>>>
>>>>> Chengwen Feng (5):
>>>>>   app/pdump: fix segment fault when parse args
>>>>>   net/memif: fix segment fault when parse devargs
>>>>>   net/pcap: fix segment fault when parse devargs
>>>>>   net/ring: fix segment fault when parse devargs
>>>>>   net/sfc: fix segment fault when parse devargs
>>>>
>>>> Hi Chengwen,
>>>>
>>>> Did you scan all `rte_kvargs_process()` instances?
>>>
>>> No, I was just looking at the modules I was concerned about.
>>> I looked at it briefly, and some modules had the same problem.
>>>
>>>>
>>>>
>>>> And if there would be a way to tell kvargs that a value is expected (or
>>>> not) this checks could be done in kvargs layer, I think this also can be
>>>> to look at.
>>>
>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>>> But it also break the API's behavior.
>>>
>>
>> What about having a new API, like `rte_kvargs_process_extended()`,
>>
>> That gets an additional flag as parameter, which may have values like
>> following to indicate if key expects a value or not:
>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
>> ARG_WITH_VALUE      --> "key=value"
>> ARG_NO_VALUE        --> 'key'
>>
>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
>> `rte_kvargs_process()`.
>>
>> This way instead of adding checks, relevant usage can be replaced by
>> `rte_kvargs_process_extended()`, this requires similar amount of change
>> but code will be more clean I think.
>>
>> Do you think does this work?
> 
> Yes, it can work.
> 
> But I think the introduction of new API adds some complexity.
> And a good API definition could more simpler.
> 

The other option is changing the existing API, but it may be widely used and
changing it impacts applications, so I don't think it is worth it.

Of course we can live with it as it is and add checks to the callback
functions, although I still believe a new 'process()' API is a better idea.

>>
>>
>>>
>>> Or continue fix the exist code (about 10+ place more),
>>> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
>>> they'll take the initiative to prevent this.
>>>
>>>
>>> Hope for more advise for the next.
>>>
>>>> .
>>>>
>>
>> .
>>


^ permalink raw reply	[relevance 0%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-21 13:50  0%     ` Ferruh Yigit
@ 2023-03-22  1:15  0%       ` fengchengwen
  2023-03-22  8:53  0%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-22  1:15 UTC (permalink / raw)
  To: Ferruh Yigit, thomas, Olivier Matz; +Cc: dev, David Marchand

On 2023/3/21 21:50, Ferruh Yigit wrote:
> On 3/17/2023 2:43 AM, fengchengwen wrote:
>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>
>>>> It may leads to segment fault when parse args with 'only key', this 
>>>> patchset fixes rest of them.
>>>>
>>>> Chengwen Feng (5):
>>>>   app/pdump: fix segment fault when parse args
>>>>   net/memif: fix segment fault when parse devargs
>>>>   net/pcap: fix segment fault when parse devargs
>>>>   net/ring: fix segment fault when parse devargs
>>>>   net/sfc: fix segment fault when parse devargs
>>>
>>> Hi Chengwen,
>>>
>>> Did you scan all `rte_kvargs_process()` instances?
>>
>> No, I was just looking at the modules I was concerned about.
>> I looked at it briefly, and some modules had the same problem.
>>
>>>
>>>
>>> And if there would be a way to tell kvargs that a value is expected (or
>>> not) this checks could be done in kvargs layer, I think this also can be
>>> to look at.
>>
>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>> But it also break the API's behavior.
>>
> 
> What about having a new API, like `rte_kvargs_process_extended()`,
> 
> That gets an additional flag as parameter, which may have values like
> following to indicate if key expects a value or not:
> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> ARG_WITH_VALUE      --> "key=value"
> ARG_NO_VALUE        --> 'key'
> 
> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> `rte_kvargs_process()`.
> 
> This way instead of adding checks, relevant usage can be replaced by
> `rte_kvargs_process_extended()`, this requires similar amount of change
> but code will be more clean I think.
> 
> Do you think does this work?

Yes, it can work.

But I think introducing a new API adds some complexity,
and a good API definition could be simpler.

> 
> 
>>
>> Or continue fix the exist code (about 10+ place more),
>> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
>> they'll take the initiative to prevent this.
>>
>>
>> Hope for more advise for the next.
>>
>>> .
>>>
> 
> .
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-17  2:43  3%   ` fengchengwen
@ 2023-03-21 13:50  0%     ` Ferruh Yigit
  2023-03-22  1:15  0%       ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-03-21 13:50 UTC (permalink / raw)
  To: fengchengwen, thomas, Olivier Matz; +Cc: dev, David Marchand

On 3/17/2023 2:43 AM, fengchengwen wrote:
> On 2023/3/17 2:18, Ferruh Yigit wrote:
>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>> parameter 'value' is NULL when parsed 'only keys'.
>>>
>>> It may leads to segment fault when parse args with 'only key', this 
>>> patchset fixes rest of them.
>>>
>>> Chengwen Feng (5):
>>>   app/pdump: fix segment fault when parse args
>>>   net/memif: fix segment fault when parse devargs
>>>   net/pcap: fix segment fault when parse devargs
>>>   net/ring: fix segment fault when parse devargs
>>>   net/sfc: fix segment fault when parse devargs
>>
>> Hi Chengwen,
>>
>> Did you scan all `rte_kvargs_process()` instances?
> 
> No, I was just looking at the modules I was concerned about.
> I looked at it briefly, and some modules had the same problem.
> 
>>
>>
>> And if there would be a way to tell kvargs that a value is expected (or
>> not) this checks could be done in kvargs layer, I think this also can be
>> to look at.
> 
> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> But it also break the API's behavior.
> 

What about having a new API, like `rte_kvargs_process_extended()`?

It would take an additional flag parameter, with values like the
following to indicate whether the key expects a value:
ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
ARG_WITH_VALUE      --> "key=value"
ARG_NO_VALUE        --> 'key'

The default flag can be 'ARG_MAY_HAVE_VALUE', which makes it behave the same as
`rte_kvargs_process()`.

This way, instead of adding checks, the relevant usages can be replaced by
`rte_kvargs_process_extended()`. This requires a similar amount of change,
but I think the code will be cleaner.

Do you think this would work?
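
A minimal sketch of the flag idea, assuming a simplified pair array rather
than the real kvargs structures. All `sketch_`/`SKETCH_` names are
hypothetical; no such function exists in DPDK, and the error convention
(-1 on a flag violation) is an assumption for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags mirroring the three cases listed above. */
enum sketch_arg_flag {
	SKETCH_ARG_MAY_HAVE_VALUE, /* "key=value" or "key" */
	SKETCH_ARG_WITH_VALUE,     /* "key=value" only */
	SKETCH_ARG_NO_VALUE,       /* "key" only */
};

struct sketch_pair {
	const char *key;
	const char *value; /* NULL for a key-only argument */
};

typedef int (*sketch_handler_t)(const char *key, const char *value,
		void *opaque);

/*
 * Check each pair matching key_match against flag before calling handler.
 * Returns 0 on success, -1 on a flag violation or a handler error.
 */
static int
sketch_process_extended(const struct sketch_pair *pairs, size_t count,
		const char *key_match, enum sketch_arg_flag flag,
		sketch_handler_t handler, void *opaque)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(pairs[i].key, key_match) != 0)
			continue;
		if (flag == SKETCH_ARG_WITH_VALUE && pairs[i].value == NULL)
			return -1;
		if (flag == SKETCH_ARG_NO_VALUE && pairs[i].value != NULL)
			return -1;
		if (handler(pairs[i].key, pairs[i].value, opaque) < 0)
			return -1;
	}
	return 0;
}

/* Handler that relies on value being non-NULL (SKETCH_ARG_WITH_VALUE). */
static int
sketch_count_chars(const char *key, const char *value, void *opaque)
{
	(void)key;
	*(size_t *)opaque += strlen(value);
	return 0;
}
```

With SKETCH_ARG_WITH_VALUE the NULL check happens once in the processing
layer, so a handler such as `sketch_count_chars` never sees a NULL value.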


> 
> Or continue fix the exist code (about 10+ place more),
> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
> they'll take the initiative to prevent this.
> 
> 
> Hope for more advise for the next.
> 
>> .
>>


^ permalink raw reply	[relevance 0%]

* [PATCH v2 2/2] ci: test compilation with debug in GHA
  @ 2023-03-20 12:18 19%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-20 12:18 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, Michael Santana

We often miss compilation issues with -O0 -g.
Switch to debug in GHA for the gcc job.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- rather than introduce a new job, updated the ABI check job
  to build with debug,

---
 .ci/linux-build.sh          | 8 +++++++-
 .github/workflows/build.yml | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index ab0994388a..150b38bd7a 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -65,6 +65,12 @@ if [ "$RISCV64" = "true" ]; then
     cross_file=config/riscv/riscv64_linux_gcc
 fi
 
+buildtype=debugoptimized
+
+if [ "$BUILD_DEBUG" = "true" ]; then
+    buildtype=debug
+fi
+
 if [ "$BUILD_DOCS" = "true" ]; then
     OPTS="$OPTS -Denable_docs=true"
 fi
@@ -85,7 +91,7 @@ fi
 
 OPTS="$OPTS -Dplatform=generic"
 OPTS="$OPTS -Ddefault_library=$DEF_LIB"
-OPTS="$OPTS -Dbuildtype=debugoptimized"
+OPTS="$OPTS -Dbuildtype=$buildtype"
 OPTS="$OPTS -Dcheck_includes=true"
 if [ "$MINI" = "true" ]; then
     OPTS="$OPTS -Denable_drivers=net/null"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 154be70cc1..bbcb535afb 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -18,6 +18,7 @@ jobs:
       ABI_CHECKS: ${{ contains(matrix.config.checks, 'abi') }}
       ASAN: ${{ contains(matrix.config.checks, 'asan') }}
       BUILD_32BIT: ${{ matrix.config.cross == 'i386' }}
+      BUILD_DEBUG: ${{ contains(matrix.config.checks, 'debug') }}
       BUILD_DOCS: ${{ contains(matrix.config.checks, 'doc') }}
       CC: ccache ${{ matrix.config.compiler }}
       DEF_LIB: ${{ matrix.config.library }}
@@ -39,7 +40,7 @@ jobs:
             mini: mini
           - os: ubuntu-20.04
             compiler: gcc
-            checks: abi+doc+tests
+            checks: abi+debug+doc+tests
           - os: ubuntu-20.04
             compiler: clang
             checks: asan+doc+tests
-- 
2.39.2


^ permalink raw reply	[relevance 19%]

* [PATCH 2/2] ci: test compilation with debug
  @ 2023-03-20 10:26  5% ` David Marchand
    1 sibling, 0 replies; 200+ results
From: David Marchand @ 2023-03-20 10:26 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, Michael Santana

We often miss compilation issues with -O0 -g.
Add a test in GHA.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .ci/linux-build.sh          | 8 +++++++-
 .github/workflows/build.yml | 4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index ab0994388a..150b38bd7a 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -65,6 +65,12 @@ if [ "$RISCV64" = "true" ]; then
     cross_file=config/riscv/riscv64_linux_gcc
 fi
 
+buildtype=debugoptimized
+
+if [ "$BUILD_DEBUG" = "true" ]; then
+    buildtype=debug
+fi
+
 if [ "$BUILD_DOCS" = "true" ]; then
     OPTS="$OPTS -Denable_docs=true"
 fi
@@ -85,7 +91,7 @@ fi
 
 OPTS="$OPTS -Dplatform=generic"
 OPTS="$OPTS -Ddefault_library=$DEF_LIB"
-OPTS="$OPTS -Dbuildtype=debugoptimized"
+OPTS="$OPTS -Dbuildtype=$buildtype"
 OPTS="$OPTS -Dcheck_includes=true"
 if [ "$MINI" = "true" ]; then
     OPTS="$OPTS -Denable_drivers=net/null"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 154be70cc1..d90ecfc6f0 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -18,6 +18,7 @@ jobs:
       ABI_CHECKS: ${{ contains(matrix.config.checks, 'abi') }}
       ASAN: ${{ contains(matrix.config.checks, 'asan') }}
       BUILD_32BIT: ${{ matrix.config.cross == 'i386' }}
+      BUILD_DEBUG: ${{ contains(matrix.config.checks, 'debug') }}
       BUILD_DOCS: ${{ contains(matrix.config.checks, 'doc') }}
       CC: ccache ${{ matrix.config.compiler }}
       DEF_LIB: ${{ matrix.config.library }}
@@ -37,6 +38,9 @@ jobs:
           - os: ubuntu-20.04
             compiler: gcc
             mini: mini
+          - os: ubuntu-20.04
+            compiler: gcc
+            checks: debug
           - os: ubuntu-20.04
             compiler: gcc
             checks: abi+doc+tests
-- 
2.39.2


^ permalink raw reply	[relevance 5%]

* Re: [PATCH 0/5] fix segment fault when parse args
  @ 2023-03-17  2:43  3%   ` fengchengwen
  2023-03-21 13:50  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-17  2:43 UTC (permalink / raw)
  To: Ferruh Yigit, thomas; +Cc: dev, David Marchand

On 2023/3/17 2:18, Ferruh Yigit wrote:
> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>> parameter 'value' is NULL when parsed 'only keys'.
>>
>> It may leads to segment fault when parse args with 'only key', this 
>> patchset fixes rest of them.
>>
>> Chengwen Feng (5):
>>   app/pdump: fix segment fault when parse args
>>   net/memif: fix segment fault when parse devargs
>>   net/pcap: fix segment fault when parse devargs
>>   net/ring: fix segment fault when parse devargs
>>   net/sfc: fix segment fault when parse devargs
> 
> Hi Chengwen,
> 
> Did you scan all `rte_kvargs_process()` instances?

No, I only looked at the modules I was concerned about.
I took a brief look, and some other modules have the same problem.

> 
> 
> And if there would be a way to tell kvargs that a value is expected (or
> not) this checks could be done in kvargs layer, I think this also can be
> to look at.

Yes, adding a way to tell kvargs may lead to a lot of modifications and also break the ABI.
I also thought about just setting value = "" when only the key exists. It would neatly solve the segment fault above,
but it also breaks the API's behavior.

Or we could continue fixing the existing code (about 10+ more places).
For new callers, because 'arg_handler_t' is already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
they should take the initiative to prevent this.

Hope for more advice on the next step.
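
For reference, a minimal sketch of the per-callback check discussed here.
The handler has the same (key, value, opaque) shape as 'arg_handler_t';
the name `sketch_parse_u16` and the choice of -EINVAL as the error code are
assumptions for this example, not code taken from the patchset.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical handler: parse a decimal value into a uint16_t.
 * value is NULL when the argument was passed as a bare key, so it is
 * rejected before any dereference.
 */
static int
sketch_parse_u16(const char *key, const char *value, void *opaque)
{
	char *end = NULL;
	unsigned long v;

	(void)key;
	if (value == NULL || opaque == NULL)
		return -EINVAL;

	errno = 0;
	v = strtoul(value, &end, 10);
	/* Reject overflow, empty input, trailing garbage and out-of-range. */
	if (errno != 0 || end == value || *end != '\0' || v > UINT16_MAX)
		return -EINVAL;

	*(uint16_t *)opaque = (uint16_t)v;
	return 0;
}
```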

> .
> 

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-16 13:10  3%     ` Dongdong Liu
@ 2023-03-16 14:31  0%       ` Ivan Malov
  0 siblings, 0 replies; 200+ results
From: Ivan Malov @ 2023-03-16 14:31 UTC (permalink / raw)
  To: Dongdong Liu
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi,

Thanks for responding and PSB.

On Thu, 16 Mar 2023, Dongdong Liu wrote:

> Hi Ivan
>
> Many thanks for your review.
>
> On 2023/3/15 19:28, Ivan Malov wrote:
>> Hi,
>> 
>> On Wed, 15 Mar 2023, Dongdong Liu wrote:
>> 
>>> From: Jie Hai <haijie1@huawei.com>
>>> 
>>> Currently, rte_eth_rss_conf supports configuring rss hash
>>> functions, rss key and it's length, but not rss hash algorithm.
>>> 
>>> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
>>> "func". This represents the RSS algorithms to apply. The following
>>> API is affected:
>>>     - rte_eth_dev_configure
>>>     - rte_eth_dev_rss_hash_update
>>>     - rte_eth_dev_rss_hash_conf_get
>>> 
>>> To prevent configuration failures caused by incorrect func input, check
>>> this parameter in advance. If it's incorrect, a warning is generated
>>> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
>>> and rte_eth_dev_configure.
>>> 
>>> To check whether the drivers report the func field, it is set to default
>>> value before querying.
>>> 
>>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>>> ---
>>> doc/guides/rel_notes/release_23_03.rst |  4 ++--
>>> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
>>> lib/ethdev/rte_ethdev.h                |  5 +++++
>>> 3 files changed, 25 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/doc/guides/rel_notes/release_23_03.rst
>>> b/doc/guides/rel_notes/release_23_03.rst
>>> index af6f37389c..7879567427 100644
>>> --- a/doc/guides/rel_notes/release_23_03.rst
>>> +++ b/doc/guides/rel_notes/release_23_03.rst
>>> @@ -284,8 +284,8 @@ ABI Changes
>>>    Also, make sure to start the actual text at the margin.
>>>    =======================================================
>>> 
>>> -* No ABI change that would break compatibility with 22.11.
>>> -
>>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for
>>> RSS hash
>>> +  algorithm.
>>> 
>>> Known Issues
>>> ------------
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>> index 4d03255683..db561026bd 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>         goto rollback;
>>>     }
>>> 
>>> +    if (dev_conf->rx_adv_conf.rss_conf.func >=
>>> RTE_ETH_HASH_FUNCTION_MAX) {
>>> +        RTE_ETHDEV_LOG(WARNING,
>>> +            "Ethdev port_id=%u invalid rss hash function (%u),
>>> modified to default value (%u)\n",
>>> +            port_id, dev_conf->rx_adv_conf.rss_conf.func,
>>> +            RTE_ETH_HASH_FUNCTION_DEFAULT);
>>> +        dev->data->dev_conf.rx_adv_conf.rss_conf.func =
>>> +            RTE_ETH_HASH_FUNCTION_DEFAULT;
>> 
>> I have no strong opinion, but, to me, this behaviour conceals
>> programming errors. For example, if an application intends
>> to enable hash algorithm A but, due to a programming error,
>> passes a gibberish value here, chances are the error will
>> end up unnoticed. Especially in case the application
>> sets the log level to such that warnings are omitted.
> Good point, will fix.
>> 
>> Why not just return the error the standard way?
>
> Aha, the original intention was not to break the ABI,
> but I think it could not achieve that.
>> 
>>> +    }
>>> +
>>>     /* Check if Rx RSS distribution is disabled but RSS hash is
>>> enabled. */
>>>     if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
>>>         (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
>>> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
>>>         return -ENOTSUP;
>>>     }
>>> 
>>> +    if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
>>> +        RTE_ETHDEV_LOG(NOTICE,
>>> +            "Ethdev port_id=%u invalid rss hash function (%u),
>>> modified to default value (%u)\n",
>>> +            port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
>>> +        rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>>> +    }
>>> +
>>>     if (*dev->dev_ops->rss_hash_update == NULL)
>>>         return -ENOTSUP;
>>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
>>> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
>>>         return -EINVAL;
>>>     }
>>> 
>>> +    rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>>> +
>>>     if (*dev->dev_ops->rss_hash_conf_get == NULL)
>>>         return -ENOTSUP;
>>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>> index 99fe9e238b..5abe2cb36d 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -174,6 +174,7 @@ extern "C" {
>>> 
>>> #include "rte_ethdev_trace_fp.h"
>>> #include "rte_dev_info.h"
>>> +#include "rte_flow.h"
>>> 
>>> extern int rte_eth_dev_logtype;
>>> 
>>> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>>>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>>>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>>>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
>>> + *
>>> + * The *func* field of the *rss_conf* structure indicates the different
>>> + * types of hash algorithms applied by the RSS hashing.
>> 
>> Consider:
>> 
>> The *func* field of the *rss_conf* structure indicates the algorithm to
>> use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
>> the PMD to use its best-effort algorithm rather than a specific one.
>
> Looking at some PMD drivers (i40e, hns3, etc.), it seems that
> RTE_ETH_HASH_FUNCTION_DEFAULT is treated as no RSS algorithm being set.

This does not seem to contradict the suggested description.

If they, however, treat this as "no RSS at all", then
perhaps it is a mistake, because if the user requests
Rx MQ mode "RSS" and selects algorithm DEFAULT, this
is clearly not the same as "no RSS". Not by a long
shot. Because for "no RSS" the user would have
passed MQ mode choice "NONE", I take it.

>
> Thanks,
> Dongdong
>>
>>>  */
>>> struct rte_eth_rss_conf {
>>>     uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>>>     uint8_t rss_key_len; /**< hash key length in bytes. */
>>>     uint64_t rss_hf;     /**< Hash functions to apply - see below. */
>>> +    enum rte_eth_hash_function func;    /**< Hash algorithm to apply. */
>>> };
>>> 
>>> /*
>>> --
>>> 2.22.0
>>> 
>>> 
>> 
>> Thank you.
>> 
>> .
>> 
>

Thank you.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 13:43  3%   ` Thomas Monjalon
@ 2023-03-16 13:16  3%     ` Dongdong Liu
  0 siblings, 0 replies; 200+ results
From: Dongdong Liu @ 2023-03-16 13:16 UTC (permalink / raw)
  To: Thomas Monjalon, Jie Hai
  Cc: dev, ferruh.yigit, andrew.rybchenko, reshma.pattan, stable,
	yisen.zhuang, david.marchand

Hi Thomas
On 2023/3/15 21:43, Thomas Monjalon wrote:
> 15/03/2023 12:00, Dongdong Liu:
>> From: Jie Hai <haijie1@huawei.com>
>> --- a/doc/guides/rel_notes/release_23_03.rst
>> +++ b/doc/guides/rel_notes/release_23_03.rst
>> -* No ABI change that would break compatibility with 22.11.
>> -
>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
>> +  algorithm.
>
> We cannot break ABI compatibility until 23.11.
Got it. Thank you for the reminder.

[PATCH 3/5] and [PATCH 4/5] are not related to this ABI compatibility issue.
I will send them separately.

Thanks,
Dongdong
>
>
>
> .
>

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:28  0%   ` Ivan Malov
@ 2023-03-16 13:10  3%     ` Dongdong Liu
  2023-03-16 14:31  0%       ` Ivan Malov
  0 siblings, 1 reply; 200+ results
From: Dongdong Liu @ 2023-03-16 13:10 UTC (permalink / raw)
  To: Ivan Malov
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi Ivan

Many thanks for your review.

On 2023/3/15 19:28, Ivan Malov wrote:
> Hi,
>
> On Wed, 15 Mar 2023, Dongdong Liu wrote:
>
>> From: Jie Hai <haijie1@huawei.com>
>>
>> Currently, rte_eth_rss_conf supports configuring rss hash
>> functions, rss key and its length, but not rss hash algorithm.
>>
>> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
>> "func". This represents the RSS algorithms to apply. The following
>> API is affected:
>>     - rte_eth_dev_configure
>>     - rte_eth_dev_rss_hash_update
>>     - rte_eth_dev_rss_hash_conf_get
>>
>> To prevent configuration failures caused by incorrect func input, check
>> this parameter in advance. If it's incorrect, a warning is generated
>> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
>> and rte_eth_dev_configure.
>>
>> To check whether the drivers report the func field, it is set to default
>> value before querying.
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>> ---
>> doc/guides/rel_notes/release_23_03.rst |  4 ++--
>> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
>> lib/ethdev/rte_ethdev.h                |  5 +++++
>> 3 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_23_03.rst
>> b/doc/guides/rel_notes/release_23_03.rst
>> index af6f37389c..7879567427 100644
>> --- a/doc/guides/rel_notes/release_23_03.rst
>> +++ b/doc/guides/rel_notes/release_23_03.rst
>> @@ -284,8 +284,8 @@ ABI Changes
>>    Also, make sure to start the actual text at the margin.
>>    =======================================================
>>
>> -* No ABI change that would break compatibility with 22.11.
>> -
>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for
>> RSS hash
>> +  algorithm.
>>
>> Known Issues
>> ------------
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index 4d03255683..db561026bd 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id,
>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>         goto rollback;
>>     }
>>
>> +    if (dev_conf->rx_adv_conf.rss_conf.func >=
>> RTE_ETH_HASH_FUNCTION_MAX) {
>> +        RTE_ETHDEV_LOG(WARNING,
>> +            "Ethdev port_id=%u invalid rss hash function (%u),
>> modified to default value (%u)\n",
>> +            port_id, dev_conf->rx_adv_conf.rss_conf.func,
>> +            RTE_ETH_HASH_FUNCTION_DEFAULT);
>> +        dev->data->dev_conf.rx_adv_conf.rss_conf.func =
>> +            RTE_ETH_HASH_FUNCTION_DEFAULT;
>
> I have no strong opinion, but, to me, this behaviour conceals
> programming errors. For example, if an application intends
> to enable hash algorithm A but, due to a programming error,
> passes a gibberish value here, chances are the error will
> end up unnoticed. Especially in case the application
> sets the log level to such that warnings are omitted.
Good point, will fix.
>
> Why not just return the error the standard way?

Aha, the original intention was not to break the ABI,
but I think it could not achieve that.
>
>> +    }
>> +
>>     /* Check if Rx RSS distribution is disabled but RSS hash is
>> enabled. */
>>     if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
>>         (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
>> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
>>         return -ENOTSUP;
>>     }
>>
>> +    if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
>> +        RTE_ETHDEV_LOG(NOTICE,
>> +            "Ethdev port_id=%u invalid rss hash function (%u),
>> modified to default value (%u)\n",
>> +            port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
>> +        rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>> +    }
>> +
>>     if (*dev->dev_ops->rss_hash_update == NULL)
>>         return -ENOTSUP;
>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
>> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
>>         return -EINVAL;
>>     }
>>
>> +    rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>> +
>>     if (*dev->dev_ops->rss_hash_conf_get == NULL)
>>         return -ENOTSUP;
>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 99fe9e238b..5abe2cb36d 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -174,6 +174,7 @@ extern "C" {
>>
>> #include "rte_ethdev_trace_fp.h"
>> #include "rte_dev_info.h"
>> +#include "rte_flow.h"
>>
>> extern int rte_eth_dev_logtype;
>>
>> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
>> + *
>> + * The *func* field of the *rss_conf* structure indicates the different
>> + * types of hash algorithms applied by the RSS hashing.
>
> Consider:
>
> The *func* field of the *rss_conf* structure indicates the algorithm to
> use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
> the PMD to use its best-effort algorithm rather than a specific one.

Looking at some PMD drivers (i40e, hns3, etc.), it seems that
RTE_ETH_HASH_FUNCTION_DEFAULT is treated as no RSS algorithm being set.

Thanks,
Dongdong
>
>>  */
>> struct rte_eth_rss_conf {
>>     uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>>     uint8_t rss_key_len; /**< hash key length in bytes. */
>>     uint64_t rss_hf;     /**< Hash functions to apply - see below. */
>> +    enum rte_eth_hash_function func;    /**< Hash algorithm to apply. */
>> };
>>
>> /*
>> --
>> 2.22.0
>>
>>
>
> Thank you.
>
> .
>

^ permalink raw reply	[relevance 3%]

* [RFC v2 0/2] Add high-performance timer facility
  2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
  2023-02-28 16:01  0% ` Morten Brørup
@ 2023-03-15 17:03  3% ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2023-03-15 17:03 UTC (permalink / raw)
  To: dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark,
	Stefan Sundkvist, Stephen Hemminger, Morten Brørup,
	Tyler Retzlaff, Mattias Rönnblom

This patchset is an attempt to introduce a high-performance, highly
scalable timer facility into DPDK.

More specifically, the goals for the htimer library are:

* Efficient handling of a handful up to hundreds of thousands of
  concurrent timers.
* Make adding and canceling timers low-overhead, constant-time
  operations.
* Provide a service functionally equivalent to that of
  <rte_timer.h>. API/ABI backward compatibility is secondary.

In the author's opinion, there are two main shortcomings with the
current DPDK timer library (i.e., rte_timer.[ch]).

One is the synchronization overhead, where heavy-weight full-barrier
type synchronization is used. rte_timer.c uses per-EAL/lcore skip
lists, but any thread may add or cancel (or otherwise access) timers
managed by another lcore (and thus residing in that lcore's timer skip list).

The other is an algorithmic shortcoming, with rte_timer.c's reliance
on a skip list, which is less efficient than certain alternatives.

This patchset implements a hierarchical timer wheel (HWT, in
rte_htw.c), as per the Varghese and Lauck paper "Hashed and
Hierarchical Timing Wheels: Data Structures for the Efficient
Implementation of a Timer Facility". A HWT is a data structure
purposely designed for this task, and used by many operating system
kernel timer facilities.
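
As a rough illustration of the data structure (not the code in
rte_htw.c), a hierarchical timer wheel can map an expiry time to a wheel
level and a slot as sketched below. The names htw_locate, HTW_SLOT_BITS
and HTW_LEVELS, as well as the choice of 256 slots per wheel and four
levels, are assumptions made for this example only.

```c
#include <stdint.h>

/* Hypothetical layout: 4 wheels of 256 slots each. */
#define HTW_SLOT_BITS 8
#define HTW_SLOTS (1U << HTW_SLOT_BITS)
#define HTW_LEVELS 4

struct htw_pos {
	unsigned int level; /* wheel index, 0 is the finest-grained wheel */
	unsigned int slot;  /* slot within that wheel */
};

/*
 * Pick the wheel whose span covers the remaining time, then index the
 * slot by the expiry time at that wheel's granularity. Timeouts beyond
 * the span of the top wheel are simply placed in the top wheel here.
 */
static struct htw_pos
htw_locate(uint64_t now, uint64_t expiry)
{
	uint64_t delta = expiry > now ? expiry - now : 0;
	unsigned int level = 0;
	struct htw_pos pos;

	while (level < HTW_LEVELS - 1 &&
	       (delta >> (HTW_SLOT_BITS * (level + 1))) != 0)
		level++;

	pos.level = level;
	pos.slot = (unsigned int)((expiry >> (HTW_SLOT_BITS * level)) &
				  (HTW_SLOTS - 1));
	return pos;
}
```

Adding a timer then amounts to linking it into the list at the computed
slot, which is what makes the add operation constant-time.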

To further improve the solution described by Varghese and Lauck, a
bitset is placed in front of each of the timer wheels in the HWT,
reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
and expiry processing).

Cycle-efficient scanning and manipulation of these bitsets are crucial
for the HWT's performance.
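
To make the idea concrete, here is a minimal sketch of how a per-wheel
bitset lets the manage function skip empty slots. It is not the
<rte_bitset.h> API from this patchset; the slot_bitset_* names and the
256-slot wheel size are hypothetical, and __builtin_ctzll is a GCC/Clang
builtin.

```c
#include <stdint.h>

#define WHEEL_SLOTS 256
#define WORD_BITS 64
#define WHEEL_WORDS (WHEEL_SLOTS / WORD_BITS)

/* One bit per wheel slot; a set bit means the slot holds timers. */
struct slot_bitset {
	uint64_t words[WHEEL_WORDS];
};

static inline void
slot_bitset_set(struct slot_bitset *b, unsigned int slot)
{
	b->words[slot / WORD_BITS] |= UINT64_C(1) << (slot % WORD_BITS);
}

static inline void
slot_bitset_clear(struct slot_bitset *b, unsigned int slot)
{
	b->words[slot / WORD_BITS] &= ~(UINT64_C(1) << (slot % WORD_BITS));
}

/* Return the first non-empty slot at or after 'start', or -1 if none. */
static inline int
slot_bitset_find_first_from(const struct slot_bitset *b, unsigned int start)
{
	unsigned int word_idx;
	uint64_t word;

	if (start >= WHEEL_SLOTS)
		return -1;

	word_idx = start / WORD_BITS;
	/* Mask off the bits below 'start' in the first word. */
	word = b->words[word_idx] & (~UINT64_C(0) << (start % WORD_BITS));

	for (;;) {
		if (word != 0)
			return (int)(word_idx * WORD_BITS +
				     (unsigned int)__builtin_ctzll(word));
		if (++word_idx == WHEEL_WORDS)
			return -1;
		word = b->words[word_idx];
	}
}
```

With this, advancing time across a mostly empty wheel costs a few word
loads and count-trailing-zeros operations rather than one list check per
slot.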

The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
instance, much like rte_timer.c keeps a per-lcore skip list.

To avoid expensive synchronization overhead for thread-local timer
management, the HWTs are accessed only from the "owning" thread.  Any
interaction any other thread does with a particular lcore's timer
wheel goes over a set of DPDK rings. A side-effect of this design is
that all operations working toward a "remote" HWT must be
asynchronous.

The <rte_htimer.h> API is available only to EAL threads and registered
non-EAL threads.

The htimer API allows the application to supply the current time,
useful in case it already has retrieved this for other purposes,
saving the cost of a rdtsc instruction (or its equivalent).

A relative htimer does not retrieve a new time, but reuses the current
time (as known via/at-the-time of the manage-call), again to shave off
some cycles of overhead.

A semantic improvement compared to the <rte_timer.h> API is that the
htimer library can give a definite answer to the question of whether the
timer expiry callback was called after a timer has been canceled.

The patchset includes a performance test case
'timer_htimer_htw_perf_autotest', which compares rte_timer, rte_htimer
and rte_htw timers in the same scenario.

'timer_htimer_htw_perf_autotest' suggests that rte_htimer is ~3-5x
faster than rte_timer for timer/timeout-heavy applications, in a
scenario where the timer always fires. For a scenario with a mix of
canceled and expired timers, the performance difference is greater.

In scenarios with few timeouts, rte_timer has lower overhead than
htimer, but both variants consume very little CPU time.

In certain scenarios, rte_timer does not suffer from
non-constant-time add and cancel operations. One such case is when the
timer added is always last in the list, where htimer is only ~2-3x
faster.

The bitset implementation which the HWT implementation depends upon
seemed generic-enough and potentially useful outside the world of
HWTs, to justify being located in the EAL.

This patchset is very much an RFC, and the author is yet to form an
opinion on many important issues.

* If deemed a suitable replacement, should the htimer replace the
  current DPDK timer library in some particular (ABI-breaking)
  release, or should it live side-by-side with the then-legacy
  <rte_timer.h> API? A lot of things in and outside DPDK depend on
  <rte_timer.h>, so coexistence may be required to facilitate a smooth
  transition.

* Should the htimer and htw-related files be colocated with rte_timer.c
  in the timer library?

* Would it be useful for applications using asynchronous cancel to
  have the option of having the timer callback run not only in case of
  timer expiration, but also cancellation (on the target lcore)? The
  timer cb signature would need to include an additional parameter in
  that case.

* Should the rte_htimer be a nested struct, so the htw parts be separated
  from the htimer parts?

* <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
  <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
  be so?

* rte_htimer struct is only supposed to be used by the application to
  give an indication of how much memory it needs to allocate, and
  its members are not supposed to be directly accessed (w/ the possible
  exception of the owner_lcore_id field). Should there be a dummy
  struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
  function instead, serving the same purpose? Better encapsulation,
  but more inconvenient for applications. Run-time dynamic sizing
  would force application-level dynamic allocations.

* Asynchronous cancellation is a little tricky to use for the
  application (primarily due to timer memory reclamation/race
  issues). Should this functionality be removed?

* Should rte_htimer_mgr_init() also retrieve the current time? If so,
  there should be a variant which allows the user to specify the
  time (to match rte_htimer_mgr_manage_time()). One pitfall with the
  current proposed API is an application calling rte_htimer_mgr_init()
  and then immediately adding a timer with a relative timeout, in
  which case the current absolute time used is 0, which might be a
  surprise.

* Would the event timer adapter be best off using <rte_htw.h>
  directly, or <rte_htimer.h>? In the latter case, there needs to be a
  way to instantiate more HWTs (similar to the "alt" functions of
  <rte_timer.h>)?

* Should the PERIODICAL flag (and the complexity it brings) be
  removed? And leave the application with only single-shot timers, and
  the option to re-add them in the timer callback.

* Should the async result codes and the sync cancel error codes be merged
  into one set of result codes?

* Should rte_htimer_mgr_async_add() have a flag which allows
  buffering add request messages until rte_htimer_mgr_process() is
  called? Or any manage function. Would reduce ring signaling overhead
  (i.e., burst enqueue operations instead of single-element
  enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
  solving the same "problem" a different way. (The signature of such
  a function would not be pretty.)

* Does the functionality provided by the rte_htimer_mgr_process()
  function match its use cases? Should there be a clearer
  separation between expiry processing and asynchronous operation
  processing?

* Should the patchset be split into more commits? If so, how?

Thanks to Erik Carrillo for his assistance.

Mattias Rönnblom (2):
  eal: add bitset type
  eal: add high-performance timer facility

 app/test/meson.build                  |  12 +-
 app/test/test_bitset.c                | 645 +++++++++++++++++++
 app/test/test_htimer_mgr.c            | 674 ++++++++++++++++++++
 app/test/test_htimer_mgr_perf.c       | 322 ++++++++++
 app/test/test_htw.c                   | 478 ++++++++++++++
 app/test/test_htw_perf.c              | 181 ++++++
 app/test/test_timer_htimer_htw_perf.c | 693 ++++++++++++++++++++
 doc/api/doxy-api-index.md             |   5 +-
 doc/api/doxy-api.conf.in              |   1 +
 lib/eal/common/meson.build            |   1 +
 lib/eal/common/rte_bitset.c           |  29 +
 lib/eal/include/meson.build           |   1 +
 lib/eal/include/rte_bitset.h          | 879 ++++++++++++++++++++++++++
 lib/eal/version.map                   |   3 +
 lib/htimer/meson.build                |   7 +
 lib/htimer/rte_htimer.h               |  68 ++
 lib/htimer/rte_htimer_mgr.c           | 547 ++++++++++++++++
 lib/htimer/rte_htimer_mgr.h           | 516 +++++++++++++++
 lib/htimer/rte_htimer_msg.h           |  44 ++
 lib/htimer/rte_htimer_msg_ring.c      |  18 +
 lib/htimer/rte_htimer_msg_ring.h      |  55 ++
 lib/htimer/rte_htw.c                  | 445 +++++++++++++
 lib/htimer/rte_htw.h                  |  49 ++
 lib/htimer/version.map                |  17 +
 lib/meson.build                       |   1 +
 25 files changed, 5689 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_bitset.c
 create mode 100644 app/test/test_htimer_mgr.c
 create mode 100644 app/test/test_htimer_mgr_perf.c
 create mode 100644 app/test/test_htw.c
 create mode 100644 app/test/test_htw_perf.c
 create mode 100644 app/test/test_timer_htimer_htw_perf.c
 create mode 100644 lib/eal/common/rte_bitset.c
 create mode 100644 lib/eal/include/rte_bitset.h
 create mode 100644 lib/htimer/meson.build
 create mode 100644 lib/htimer/rte_htimer.h
 create mode 100644 lib/htimer/rte_htimer_mgr.c
 create mode 100644 lib/htimer/rte_htimer_mgr.h
 create mode 100644 lib/htimer/rte_htimer_msg.h
 create mode 100644 lib/htimer/rte_htimer_msg_ring.c
 create mode 100644 lib/htimer/rte_htimer_msg_ring.h
 create mode 100644 lib/htimer/rte_htw.c
 create mode 100644 lib/htimer/rte_htw.h
 create mode 100644 lib/htimer/version.map

-- 
2.34.1

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
  2023-03-15 11:28  0%   ` Ivan Malov
@ 2023-03-15 13:43  3%   ` Thomas Monjalon
  2023-03-16 13:16  3%     ` Dongdong Liu
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-15 13:43 UTC (permalink / raw)
  To: Dongdong Liu, Jie Hai
  Cc: dev, ferruh.yigit, andrew.rybchenko, reshma.pattan, stable,
	yisen.zhuang, david.marchand

15/03/2023 12:00, Dongdong Liu:
> From: Jie Hai <haijie1@huawei.com>
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> -* No ABI change that would break compatibility with 22.11.
> -
> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
> +  algorithm.

We cannot break ABI compatibility until 23.11.




^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
@ 2023-03-15 11:28  0%   ` Ivan Malov
  2023-03-16 13:10  3%     ` Dongdong Liu
  2023-03-15 13:43  3%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Ivan Malov @ 2023-03-15 11:28 UTC (permalink / raw)
  To: Dongdong Liu
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi,

On Wed, 15 Mar 2023, Dongdong Liu wrote:

> From: Jie Hai <haijie1@huawei.com>
>
> Currently, rte_eth_rss_conf supports configuring rss hash
> functions, rss key and its length, but not rss hash algorithm.
>
> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
> "func". This represents the RSS algorithms to apply. The following
> API is affected:
> 	- rte_eth_dev_configure
> 	- rte_eth_dev_rss_hash_update
> 	- rte_eth_dev_rss_hash_conf_get
>
> To prevent configuration failures caused by incorrect func input, check
> this parameter in advance. If it's incorrect, a warning is generated
> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
> and rte_eth_dev_configure.
>
> To check whether the drivers report the func field, it is set to default
> value before querying.
>
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
> ---
> doc/guides/rel_notes/release_23_03.rst |  4 ++--
> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
> lib/ethdev/rte_ethdev.h                |  5 +++++
> 3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index af6f37389c..7879567427 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -284,8 +284,8 @@ ABI Changes
>    Also, make sure to start the actual text at the margin.
>    =======================================================
>
> -* No ABI change that would break compatibility with 22.11.
> -
> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
> +  algorithm.
>
> Known Issues
> ------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 4d03255683..db561026bd 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
> 		goto rollback;
> 	}
>
> +	if (dev_conf->rx_adv_conf.rss_conf.func >= RTE_ETH_HASH_FUNCTION_MAX) {
> +		RTE_ETHDEV_LOG(WARNING,
> +			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
> +			port_id, dev_conf->rx_adv_conf.rss_conf.func,
> +			RTE_ETH_HASH_FUNCTION_DEFAULT);
> +		dev->data->dev_conf.rx_adv_conf.rss_conf.func =
> +			RTE_ETH_HASH_FUNCTION_DEFAULT;

I have no strong opinion, but, to me, this behaviour conceals
programming errors. For example, if an application intends
to enable hash algorithm A but, due to a programming error,
passes a gibberish value here, chances are the error will
end up unnoticed. Especially in case the application
sets the log level to such that warnings are omitted.

Why not just return the error the standard way?
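
A minimal sketch of what I mean, using stand-in types rather than the
real ethdev definitions (enum hash_function, struct rss_conf and
rss_conf_validate are hypothetical names for illustration only):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real ethdev types; values are illustrative. */
enum hash_function {
	HASH_FUNCTION_DEFAULT = 0,
	HASH_FUNCTION_TOEPLITZ,
	HASH_FUNCTION_SIMPLE_XOR,
	HASH_FUNCTION_MAX,
};

struct rss_conf {
	uint64_t rss_hf;
	enum hash_function func;
};

/*
 * Reject an out-of-range algorithm instead of silently replacing it
 * with the default, so the caller sees the programming error.
 */
static int
rss_conf_validate(const struct rss_conf *conf)
{
	if (conf == NULL)
		return -EINVAL;
	if ((unsigned int)conf->func >= (unsigned int)HASH_FUNCTION_MAX)
		return -EINVAL;
	return 0;
}
```

The caller would log the invalid value and propagate the negative errno,
the same way the other parameter checks in these functions do.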

> +	}
> +
> 	/* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
> 	if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
> 	    (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
> 		return -ENOTSUP;
> 	}
>
> +	if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
> +		RTE_ETHDEV_LOG(NOTICE,
> +			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
> +			port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
> +		rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
> +	}
> +
> 	if (*dev->dev_ops->rss_hash_update == NULL)
> 		return -ENOTSUP;
> 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
> 		return -EINVAL;
> 	}
>
> +	rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
> +
> 	if (*dev->dev_ops->rss_hash_conf_get == NULL)
> 		return -ENOTSUP;
> 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 99fe9e238b..5abe2cb36d 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -174,6 +174,7 @@ extern "C" {
>
> #include "rte_ethdev_trace_fp.h"
> #include "rte_dev_info.h"
> +#include "rte_flow.h"
>
> extern int rte_eth_dev_logtype;
>
> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
> + *
> + * The *func* field of the *rss_conf* structure indicates the different
> + * types of hash algorithms applied by the RSS hashing.

Consider:

The *func* field of the *rss_conf* structure indicates the algorithm to
use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
the PMD to use its best-effort algorithm rather than a specific one.

>  */
> struct rte_eth_rss_conf {
> 	uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
> 	uint8_t rss_key_len; /**< hash key length in bytes. */
> 	uint64_t rss_hf;     /**< Hash functions to apply - see below. */
> +	enum rte_eth_hash_function func;	/**< Hash algorithm to apply. */
> };
>
> /*
> -- 
> 2.22.0
>
>

Thank you.

^ permalink raw reply	[relevance 0%]

* [PATCH 1/5] ethdev: support setting and querying rss algorithm
  @ 2023-03-15 11:00 10% ` Dongdong Liu
  2023-03-15 11:28  0%   ` Ivan Malov
  2023-03-15 13:43  3%   ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Dongdong Liu @ 2023-03-15 11:00 UTC (permalink / raw)
  To: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan
  Cc: stable, yisen.zhuang, liudongdong3, Jie Hai

From: Jie Hai <haijie1@huawei.com>

Currently, rte_eth_rss_conf supports configuring rss hash
functions, rss key and its length, but not rss hash algorithm.

The structure ``rte_eth_rss_conf`` is extended by adding a new field,
"func". This represents the RSS algorithms to apply. The following
API is affected:
	- rte_eth_dev_configure
	- rte_eth_dev_rss_hash_update
	- rte_eth_dev_rss_hash_conf_get

To prevent configuration failures caused by incorrect func input, check
this parameter in advance. If it's incorrect, a warning is generated
and the default value is set. Do the same for rte_eth_dev_rss_hash_update
and rte_eth_dev_configure.

To check whether the drivers report the func field, it is set to default
value before querying.

Signed-off-by: Jie Hai <haijie1@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
---
 doc/guides/rel_notes/release_23_03.rst |  4 ++--
 lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
 lib/ethdev/rte_ethdev.h                |  5 +++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index af6f37389c..7879567427 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -284,8 +284,8 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
-* No ABI change that would break compatibility with 22.11.
-
+* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
+  algorithm.
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 4d03255683..db561026bd 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 		goto rollback;
 	}
 
+	if (dev_conf->rx_adv_conf.rss_conf.func >= RTE_ETH_HASH_FUNCTION_MAX) {
+		RTE_ETHDEV_LOG(WARNING,
+			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
+			port_id, dev_conf->rx_adv_conf.rss_conf.func,
+			RTE_ETH_HASH_FUNCTION_DEFAULT);
+		dev->data->dev_conf.rx_adv_conf.rss_conf.func =
+			RTE_ETH_HASH_FUNCTION_DEFAULT;
+	}
+
 	/* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
 	if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
 	    (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
@@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
 		return -ENOTSUP;
 	}
 
+	if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
+		RTE_ETHDEV_LOG(NOTICE,
+			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
+			port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
+		rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
+	}
+
 	if (*dev->dev_ops->rss_hash_update == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
@@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
 		return -EINVAL;
 	}
 
+	rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
+
 	if (*dev->dev_ops->rss_hash_conf_get == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 99fe9e238b..5abe2cb36d 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -174,6 +174,7 @@ extern "C" {
 
 #include "rte_ethdev_trace_fp.h"
 #include "rte_dev_info.h"
+#include "rte_flow.h"
 
 extern int rte_eth_dev_logtype;
 
@@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
  * The *rss_hf* field of the *rss_conf* structure indicates the different
  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
  * Supplying an *rss_hf* equal to zero disables the RSS feature.
+ *
+ * The *func* field of the *rss_conf* structure selects the hash algorithm
+ * applied by the RSS hashing.
  */
 struct rte_eth_rss_conf {
 	uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
 	uint8_t rss_key_len; /**< hash key length in bytes. */
 	uint64_t rss_hf;     /**< Hash functions to apply - see below. */
+	enum rte_eth_hash_function func;	/**< Hash algorithm to apply. */
 };
 
 /*
-- 
2.22.0
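
For readers following the range check above, here is a minimal standalone
sketch of the same fallback pattern. The enum and the helper name
sanitize_hash_function() are hypothetical stand-ins chosen for illustration;
they are not part of the ethdev API, and logging goes to stderr instead of
RTE_ETHDEV_LOG.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the hash function enum used in the patch. */
enum hash_function {
	HASH_FUNCTION_DEFAULT = 0,
	HASH_FUNCTION_TOEPLITZ,
	HASH_FUNCTION_SIMPLE_XOR,
	HASH_FUNCTION_MAX,
};

/*
 * Return the hash function to use for a port. Any value at or beyond
 * HASH_FUNCTION_MAX is reported and replaced by the default.
 */
static enum hash_function
sanitize_hash_function(uint16_t port_id, unsigned int func)
{
	if (func >= HASH_FUNCTION_MAX) {
		fprintf(stderr,
			"port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
			(unsigned int)port_id, func,
			(unsigned int)HASH_FUNCTION_DEFAULT);
		return HASH_FUNCTION_DEFAULT;
	}
	return (enum hash_function)func;
}
```

In this sketch an out-of-range value is silently corrected rather than
rejected, which matches the behaviour in the patch; returning -EINVAL instead
would be the stricter alternative.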


^ permalink raw reply	[relevance 10%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 13:10  0%             ` Bruce Richardson
@ 2023-03-13 15:51  0%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-13 15:51 UTC (permalink / raw)
  To: fengchengwen, Bruce Richardson
  Cc: dev, dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 14:10, Bruce Richardson:
> On Thu, Mar 09, 2023 at 01:12:51PM +0100, Thomas Monjalon wrote:
> > 09/03/2023 12:23, fengchengwen:
> > > On 2023/3/9 15:29, Thomas Monjalon wrote:
> > > > 09/03/2023 02:43, fengchengwen:
> > > >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > > >>> --- a/doc/guides/rel_notes/release_22_11.rst
> > > >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> > > >>> @@ -504,7 +504,7 @@ ABI Changes
> > > >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> > > >>>  
> > > >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > > >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > > >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> > > >>
> > > >> Should add to release 23.03 rst.
> > > > 
> > > > Yes we could add a note in API changes.
> > > > 
> > > >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > > > 
> > > > Yes it was not a good idea to rename in the release notes.
> > > > 
> > > >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > > >>> -    build = false
> > > >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> > > >>> +if not get_option('enable_iova_as_pa')
> > > >>>      subdir_done()
> > > >>>  endif
> > > >>
> > > >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> > > >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> > > >>      subdir_done()
> > > >> endif
> > > > 
> > > > Why testing the C macro in Meson?
> > > > It looks simpler to check the Meson option in Meson.
> > > 
> > > The macro was create in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> > > It can be regarded as alias of enable_iova_as_pa.
> > 
> > It is not strictly an alias, because it can be overriden via CFLAGS.
> > 
> > > This commit was mainly used to improve comprehensibility. so we should limit the 'enable_iova_as_pa' usage scope.
> > > and the 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is more comprehensibility than 'if not get_option('enable_iova_as_pa')'
> > 
> > To me, using Meson option in Meson files is more obvious.
> > 
> > Bruce, what do you think?
> > 
> 
> I'm not sure it matters much! However, I think of the two, using the
> reference to IOVA_IN_MBUF is clearer. It also allows the same terminology
> to be used in meson and C files. If we don't want to do a dpdk_conf lookup,
> we can always assign the option to a meson variable called iova_in_mbuf.

OK, I'll query the C macro in the Meson files.



^ permalink raw reply	[relevance 0%]

* Re: [PATCH] lib/hash: new feature adding existing key
  @ 2023-03-13 15:48  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-03-13 15:48 UTC (permalink / raw)
  To: Abdullah Ömer Yamaç; +Cc: dev, Yipeng Wang

On Mon, 13 Mar 2023 07:35:48 +0000
Abdullah Ömer Yamaç <omer.yamac@ceng.metu.edu.tr> wrote:

> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index eb2644f74b..e8b7283ec2 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -193,6 +193,8 @@ struct rte_hash {
>  	/**< If read-write concurrency support is enabled */
>  	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	uint8_t no_free_on_del;
> +	/**< If update is prohibited on adding same key */
> +	uint8_t no_update_data;
>  	/**< If key index should be freed on calling rte_hash_del_xxx APIs.
>  	 * If this is set, rte_hash_free_key_with_position must be called to
>  	 * free the key index associated with the deleted entry.
> diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h

This ends up being an ABI change, so it needs to wait for the 23.11 release.
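
As a rough illustration of why inserting a member in the middle of a
structure changes its layout, consider the cut-down sketch below. The two
structs are hypothetical and much smaller than the real struct rte_hash;
only the position of the new member mirrors the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the patch. */
struct hash_before {
	uint8_t ext_table_support;
	uint8_t no_free_on_del;
	uint8_t writer_takes_lock;
};

/* Hypothetical layout after the patch: one member inserted mid-struct. */
struct hash_after {
	uint8_t ext_table_support;
	uint8_t no_free_on_del;
	uint8_t no_update_data;
	uint8_t writer_takes_lock;
};

/* Offset of the member that follows the insertion point, before the change. */
static size_t
offset_before(void)
{
	return offsetof(struct hash_before, writer_takes_lock);
}

/* Offset of the same member after the change. */
static size_t
offset_after(void)
{
	return offsetof(struct hash_after, writer_takes_lock);
}
```

Code compiled against the old layout would keep reading the old offset, so
every member after the insertion point is affected.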

^ permalink raw reply	[relevance 3%]

* Re: [PATCH] reorder: fix registration of dynamic field in mbuf
  @ 2023-03-13 10:19  3% ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-13 10:19 UTC (permalink / raw)
  To: Volodymyr Fialko, Reshma Pattan
  Cc: dev, Andrew Rybchenko, jerinj, anoobj, Thomas Monjalon

Hello,

On Mon, Mar 13, 2023 at 10:35 AM Volodymyr Fialko <vfialko@marvell.com> wrote:
>
> It's possible to initialize reorder buffer with user allocated memory via
> rte_reorder_init() function. In such case rte_reorder_create() not required
> and reorder dynamic field in rte_mbuf will be not registered.

Good catch.


>
> Fixes: 01f3496695b5 ("reorder: switch sequence number to dynamic mbuf field")

It seems worth backporting.
Cc: stable@dpdk.org

>
> Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
> ---
>  lib/reorder/rte_reorder.c | 40 ++++++++++++++++++++++++++++++---------
>  1 file changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/lib/reorder/rte_reorder.c b/lib/reorder/rte_reorder.c
> index 6e029c9e02..a759a9c434 100644
> --- a/lib/reorder/rte_reorder.c
> +++ b/lib/reorder/rte_reorder.c
> @@ -54,6 +54,28 @@ struct rte_reorder_buffer {
>  static void
>  rte_reorder_free_mbufs(struct rte_reorder_buffer *b);
>
> +static int
> +rte_reorder_dynf_register(void)
> +{
> +       int ret;
> +
> +       static const struct rte_mbuf_dynfield reorder_seqn_dynfield_desc = {
> +               .name = RTE_REORDER_SEQN_DYNFIELD_NAME,
> +               .size = sizeof(rte_reorder_seqn_t),
> +               .align = __alignof__(rte_reorder_seqn_t),
> +       };
> +
> +       if (rte_reorder_seqn_dynfield_offset > 0)
> +               return 0;
> +
> +       ret = rte_mbuf_dynfield_register(&reorder_seqn_dynfield_desc);
> +       if (ret < 0)
> +               return ret;
> +       rte_reorder_seqn_dynfield_offset = ret;
> +
> +       return 0;
> +}

We don't need this helper (see my comment below, for
rte_reorder_create), you can simply move this block to
rte_reorder_init().


> +
>  struct rte_reorder_buffer *
>  rte_reorder_init(struct rte_reorder_buffer *b, unsigned int bufsize,
>                 const char *name, unsigned int size)
> @@ -85,6 +107,12 @@ rte_reorder_init(struct rte_reorder_buffer *b, unsigned int bufsize,
>                 rte_errno = EINVAL;
>                 return NULL;
>         }
> +       if (rte_reorder_dynf_register()) {
> +               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence"
> +                                     " number\n");
> +               rte_errno = ENOMEM;

I think returning this new errno code is fine from an ABI point of view.
An application would have to check for NULL return code in any case
and can't act differently based on rte_errno value.

However, this is a small change to the rte_reorder_init API, so it
needs some update, see:

 * @return
 *   The initialized reorder buffer instance, or NULL on error
 *   On error case, rte_errno will be set appropriately:
 *    - EINVAL - invalid parameters



> +               return NULL;
> +       }
>
>         memset(b, 0, bufsize);
>         strlcpy(b->name, name, sizeof(b->name));
> @@ -106,11 +134,6 @@ rte_reorder_create(const char *name, unsigned socket_id, unsigned int size)
>         struct rte_reorder_list *reorder_list;
>         const unsigned int bufsize = sizeof(struct rte_reorder_buffer) +
>                                         (2 * size * sizeof(struct rte_mbuf *));
> -       static const struct rte_mbuf_dynfield reorder_seqn_dynfield_desc = {
> -               .name = RTE_REORDER_SEQN_DYNFIELD_NAME,
> -               .size = sizeof(rte_reorder_seqn_t),
> -               .align = __alignof__(rte_reorder_seqn_t),
> -       };
>
>         reorder_list = RTE_TAILQ_CAST(rte_reorder_tailq.head, rte_reorder_list);
>
> @@ -128,10 +151,9 @@ rte_reorder_create(const char *name, unsigned socket_id, unsigned int size)
>                 return NULL;
>         }
>
> -       rte_reorder_seqn_dynfield_offset =
> -               rte_mbuf_dynfield_register(&reorder_seqn_dynfield_desc);
> -       if (rte_reorder_seqn_dynfield_offset < 0) {
> -               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence number\n");
> +       if (rte_reorder_dynf_register()) {
> +               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence"
> +                                     " number\n");

All rte_reorder_buffer objects need to go through rte_reorder_init().
You can check rte_reorder_init() return code.
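
Roughly, the shape I have in mind is sketched below. This is untested and
uses hypothetical stand-ins (struct reorder_buffer, dynfield_register_stub(),
sketch_errno) in place of the real rte_reorder, rte_mbuf_dyn and rte_errno
symbols, so that it compiles on its own.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_reorder_buffer. */
struct reorder_buffer {
	char name[32];
	unsigned int size;
};

static int seqn_dynfield_offset = -1; /* negative until registered */
static int register_calls;            /* counts stub invocations */
static int sketch_errno;              /* stand-in for rte_errno */

/* Stub for the dynfield registration: an offset, or negative on failure. */
static int
dynfield_register_stub(void)
{
	register_calls++;
	return 64;
}

static struct reorder_buffer *
reorder_init(struct reorder_buffer *b, const char *name, unsigned int size)
{
	if (b == NULL || name == NULL) {
		sketch_errno = EINVAL;
		return NULL;
	}
	/* Registration lives in init, the path every buffer goes through. */
	if (seqn_dynfield_offset < 0) {
		int ret = dynfield_register_stub();

		if (ret < 0) {
			sketch_errno = ENOMEM;
			return NULL;
		}
		seqn_dynfield_offset = ret;
	}
	memset(b, 0, sizeof(*b));
	strncpy(b->name, name, sizeof(b->name) - 1);
	b->size = size;
	return b;
}

static struct reorder_buffer *
reorder_create(const char *name, unsigned int size)
{
	struct reorder_buffer *b = malloc(sizeof(*b));

	if (b == NULL) {
		sketch_errno = ENOMEM;
		return NULL;
	}
	/* No separate registration here: just check the init return code. */
	if (reorder_init(b, name, size) == NULL) {
		free(b);
		return NULL;
	}
	return b;
}
```

With this layout a buffer set up on user memory through the init path and a
buffer allocated through the create path both end up with the field
registered, and the registration runs only once.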


>                 rte_errno = ENOMEM;
>                 return NULL;
>         }
> --
> 2.34.1
>


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 12:12  0%           ` Thomas Monjalon
@ 2023-03-09 13:10  0%             ` Bruce Richardson
  2023-03-13 15:51  0%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-03-09 13:10 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: fengchengwen, dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

On Thu, Mar 09, 2023 at 01:12:51PM +0100, Thomas Monjalon wrote:
> 09/03/2023 12:23, fengchengwen:
> > On 2023/3/9 15:29, Thomas Monjalon wrote:
> > > 09/03/2023 02:43, fengchengwen:
> > >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > >>> --- a/doc/guides/rel_notes/release_22_11.rst
> > >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> > >>> @@ -504,7 +504,7 @@ ABI Changes
> > >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> > >>>  
> > >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> > >>
> > >> Should add to release 23.03 rst.
> > > 
> > > Yes we could add a note in API changes.
> > > 
> > >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > > 
> > > Yes it was not a good idea to rename in the release notes.
> > > 
> > >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > >>> -    build = false
> > >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> > >>> +if not get_option('enable_iova_as_pa')
> > >>>      subdir_done()
> > >>>  endif
> > >>
> > >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> > >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> > >>      subdir_done()
> > >> endif
> > > 
> > > Why testing the C macro in Meson?
> > > It looks simpler to check the Meson option in Meson.
> > 
> > The macro was create in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> > It can be regarded as alias of enable_iova_as_pa.
> 
> It is not strictly an alias, because it can be overriden via CFLAGS.
> 
> > This commit was mainly used to improve comprehensibility. so we should limit the 'enable_iova_as_pa' usage scope.
> > and the 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is more comprehensibility than 'if not get_option('enable_iova_as_pa')'
> 
> To me, using Meson option in Meson files is more obvious.
> 
> Bruce, what do you think?
> 

I'm not sure it matters much! However, I think of the two, using the
reference to IOVA_IN_MBUF is clearer. It also allows the same terminology
to be used in meson and C files. If we don't want to do a dpdk_conf lookup,
we can always assign the option to a meson variable called iova_in_mbuf.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 11:23  0%         ` fengchengwen
@ 2023-03-09 12:12  0%           ` Thomas Monjalon
  2023-03-09 13:10  0%             ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-09 12:12 UTC (permalink / raw)
  To: Bruce Richardson, fengchengwen
  Cc: dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 12:23, fengchengwen:
> On 2023/3/9 15:29, Thomas Monjalon wrote:
> > 09/03/2023 02:43, fengchengwen:
> >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -504,7 +504,7 @@ ABI Changes
> >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> >>>  
> >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> >>
> >> Should add to release 23.03 rst.
> > 
> > Yes we could add a note in API changes.
> > 
> >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > 
> > Yes it was not a good idea to rename in the release notes.
> > 
> >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> >>> -    build = false
> >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> >>> +if not get_option('enable_iova_as_pa')
> >>>      subdir_done()
> >>>  endif
> >>
> >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> >>      subdir_done()
> >> endif
> > 
> > Why testing the C macro in Meson?
> > It looks simpler to check the Meson option in Meson.
> 
> The macro was create in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> It can be regarded as alias of enable_iova_as_pa.

It is not strictly an alias, because it can be overridden via CFLAGS.

> This commit was mainly used to improve comprehensibility. so we should limit the 'enable_iova_as_pa' usage scope.
> and the 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is more comprehensibility than 'if not get_option('enable_iova_as_pa')'

To me, using the Meson option in Meson files is more obvious.

Bruce, what do you think?

> >> Meson build 0.63.0 already support deprecated a option by a new option.
> >> When update to the new meson verion, the drivers' meson.build will not be modified.
> > 
> > I don't understand this comment.
> 
> I mean: the option "enable_iova_as_pa" need deprecated future.

Why deprecate this option?

> Based on this, I think we should limit 'enable_iova_as_pa' usage scope, this allows us to
> reduce the amount of change effort when it's about to deprecated.

I don't plan to deprecate this option.
And in general, we should avoid deprecating a compilation option.



^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09  7:29  0%       ` Thomas Monjalon
@ 2023-03-09 11:23  0%         ` fengchengwen
  2023-03-09 12:12  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-09 11:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, David Marchand, Bruce Richardson, Qi Zhang,
	Morten Brørup, Shijith Thotton, Olivier Matz, Ruifeng Wang,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Jingjing Wu, Beilei Xing, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal



On 2023/3/9 15:29, Thomas Monjalon wrote:
> 09/03/2023 02:43, fengchengwen:
>> On 2023/3/7 0:13, Thomas Monjalon wrote:
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -504,7 +504,7 @@ ABI Changes
>>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
>>>  
>>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
>>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
>>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
>>
>> Should add to release 23.03 rst.
> 
> Yes we could add a note in API changes.
> 
>> The original 22.11 still have RTE_IOVA_AS_PA definition.
> 
> Yes it was not a good idea to rename in the release notes.
> 
>>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
>>> -    build = false
>>> -    reason = 'driver does not support disabling IOVA as PA mode'
>>> +if not get_option('enable_iova_as_pa')
>>>      subdir_done()
>>>  endif
>>
>> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
>> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
>>      subdir_done()
>> endif
> 
> Why testing the C macro in Meson?
> It looks simpler to check the Meson option in Meson.

The macro was created in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
It can be regarded as an alias of enable_iova_as_pa.

This commit is mainly meant to improve comprehensibility, so we should limit the usage scope of 'enable_iova_as_pa'.
Also, 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is easier to understand than 'if not get_option('enable_iova_as_pa')'.

> 
>> Meson build 0.63.0 already support deprecated a option by a new option.
>> When update to the new meson verion, the drivers' meson.build will not be modified.
> 
> I don't understand this comment.

I mean: the option "enable_iova_as_pa" will need to be deprecated in the future.

Based on this, I think we should limit the usage scope of 'enable_iova_as_pa'; this reduces
the amount of change needed when it is about to be deprecated.

> 
> 
> .
> 

^ permalink raw reply	[relevance 0%]

* [RFC 1/2] security: introduce out of place support for inline ingress
@ 2023-03-09  8:56  4% Nithin Dabilpuram
  0 siblings, 0 replies; 200+ results
From: Nithin Dabilpuram @ 2023-03-09  8:56 UTC (permalink / raw)
  To: Thomas Monjalon, Akhil Goyal; +Cc: jerinj, dev, Nithin Dabilpuram

Similar to the out-of-place (OOP) processing support that exists for
lookaside crypto/security sessions, inline ingress security
sessions may also need out-of-place processing in use cases
where the original encrypted packet needs to be retained for
post-processing. So for NICs which have such HW support,
a new SA option is provided to indicate whether OOP needs to
be enabled on that inline ingress security session or not.

Since for inline ingress sessions the packet is not received by
the CPU until the processing is done, we can only have a per-SA
option and not a per-packet option as for lookaside sessions.

In order to return the original encrypted packet mbuf,
this patch adds a new mbuf dynamic field of 8B size
containing a pointer to the original mbuf, which will be populated
for packets associated with an inline SA that has OOP enabled.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 devtools/libabigail.abignore       |  4 +++
 lib/security/rte_security.c        | 17 +++++++++++++
 lib/security/rte_security.h        | 39 +++++++++++++++++++++++++++++-
 lib/security/rte_security_driver.h |  8 ++++++
 lib/security/version.map           |  2 ++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..9f52ffbf2e 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -34,3 +34,7 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Ignore change to reserved opts for new SA option
+[suppress_type]
+       name = rte_security_ipsec_sa_options
diff --git a/lib/security/rte_security.c b/lib/security/rte_security.c
index e102c55e55..c2199dd8db 100644
--- a/lib/security/rte_security.c
+++ b/lib/security/rte_security.c
@@ -27,7 +27,10 @@
 } while (0)
 
 #define RTE_SECURITY_DYNFIELD_NAME "rte_security_dynfield_metadata"
+#define RTE_SECURITY_OOP_DYNFIELD_NAME "rte_security_oop_dynfield_metadata"
+
 int rte_security_dynfield_offset = -1;
+int rte_security_oop_dynfield_offset = -1;
 
 int
 rte_security_dynfield_register(void)
@@ -42,6 +45,20 @@ rte_security_dynfield_register(void)
 	return rte_security_dynfield_offset;
 }
 
+int
+rte_security_oop_dynfield_register(void)
+{
+	static const struct rte_mbuf_dynfield dynfield_desc = {
+		.name = RTE_SECURITY_OOP_DYNFIELD_NAME,
+		.size = sizeof(rte_security_oop_dynfield_t),
+		.align = __alignof__(rte_security_oop_dynfield_t),
+	};
+
+	rte_security_oop_dynfield_offset =
+		rte_mbuf_dynfield_register(&dynfield_desc);
+	return rte_security_oop_dynfield_offset;
+}
+
 void *
 rte_security_session_create(struct rte_security_ctx *instance,
 			    struct rte_security_session_conf *conf,
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 4bacf9fcd9..866cd4e8ee 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
 	 */
 	uint32_t ip_reassembly_en : 1;
 
+	/** Enable out of place processing on inline inbound packets.
+	 *
+	 * * 1: Enable the driver to perform out-of-place (OOP) processing for this
+	 *      inline inbound SA, if supported by the driver. The PMD needs to
+	 *      register the mbuf dynamic field using
+	 *      rte_security_oop_dynfield_register(); security session creation
+	 *      fails if the dynfield is not registered successfully.
+	 * * 0: Disable OOP processing for this session (default).
+	 */
+	uint32_t ingress_oop : 1;
+
 	/** Reserved bit fields for future extension
 	 *
 	 * User should ensure reserved_opts is cleared as it may change in
@@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
 	 *
 	 * Note: Reduce number of bits in reserved_opts for every new option.
 	 */
-	uint32_t reserved_opts : 17;
+	uint32_t reserved_opts : 16;
 };
 
 /** IPSec security association direction */
@@ -812,6 +823,13 @@ typedef uint64_t rte_security_dynfield_t;
 /** Dynamic mbuf field for device-specific metadata */
 extern int rte_security_dynfield_offset;
 
+/** Out-of-Place(OOP) processing field type */
+typedef struct rte_mbuf *rte_security_oop_dynfield_t;
+/** Dynamic mbuf field for pointer to original mbuf for
+ * OOP processing session.
+ */
+extern int rte_security_oop_dynfield_offset;
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
@@ -834,6 +852,25 @@ rte_security_dynfield(struct rte_mbuf *mbuf)
 		rte_security_dynfield_t *);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get pointer to mbuf field for original mbuf pointer when
+ * Out-Of-Place(OOP) processing is enabled in security session.
+ *
+ * @param       mbuf    packet to access
+ * @return pointer to mbuf field
+ */
+__rte_experimental
+static inline rte_security_oop_dynfield_t *
+rte_security_oop_dynfield(struct rte_mbuf *mbuf)
+{
+	return RTE_MBUF_DYNFIELD(mbuf,
+			rte_security_oop_dynfield_offset,
+			rte_security_oop_dynfield_t *);
+}
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 421e6f7780..91e7786ab7 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -190,6 +190,14 @@ typedef int (*security_macsec_sa_stats_get_t)(void *device, uint16_t sa_id,
 __rte_internal
 int rte_security_dynfield_register(void);
 
+/**
+ * @internal
+ * Register mbuf dynamic field for Security inline ingress Out-of-Place(OOP)
+ * processing.
+ */
+__rte_internal
+int rte_security_oop_dynfield_register(void);
+
 /**
  * Update the mbuf with provided metadata.
  *
diff --git a/lib/security/version.map b/lib/security/version.map
index 07dcce9ffb..59a95f40bd 100644
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@ -23,10 +23,12 @@ EXPERIMENTAL {
 	rte_security_macsec_sc_stats_get;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_oop_dynfield_offset;
 };
 
 INTERNAL {
 	global:
 
 	rte_security_dynfield_register;
+	rte_security_oop_dynfield_register;
 };
-- 
2.25.1
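
To make the intended usage easier to picture, below is a minimal standalone
sketch of how a receiver could reach the original mbuf through an
offset-based dynamic field. struct mini_mbuf, oop_dynfield_register() and
oop_dynfield() are hypothetical miniatures written for this illustration;
they only imitate the offset arithmetic and are not the rte_security or
rte_mbuf_dyn API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature mbuf; the real struct rte_mbuf is far larger. */
struct mini_mbuf {
	uint32_t pkt_len;
	uint64_t dynfield[4]; /* area handed out to dynamic fields */
};

/* The OOP field stores a pointer to the original (encrypted) mbuf. */
typedef struct mini_mbuf *oop_dynfield_t;

/* Offset of the OOP field inside the mbuf; -1 until registered. */
static int oop_dynfield_offset = -1;

/* Pretend registration: hand out the start of the dynfield area. */
static int
oop_dynfield_register(void)
{
	oop_dynfield_offset = (int)offsetof(struct mini_mbuf, dynfield);
	return oop_dynfield_offset;
}

/* Return a pointer to the OOP field of the given mbuf. */
static oop_dynfield_t *
oop_dynfield(struct mini_mbuf *m)
{
	return (oop_dynfield_t *)((uintptr_t)m + (size_t)oop_dynfield_offset);
}
```

In this sketch the driver side would store the original mbuf pointer with
*oop_dynfield(decrypted) = original, and the application would read it back
the same way; the accessor must not be used before registration succeeds.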


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09  1:43  0%     ` fengchengwen
@ 2023-03-09  7:29  0%       ` Thomas Monjalon
  2023-03-09 11:23  0%         ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-09  7:29 UTC (permalink / raw)
  To: fengchengwen
  Cc: dev, David Marchand, Bruce Richardson, Qi Zhang,
	Morten Brørup, Shijith Thotton, Olivier Matz, Ruifeng Wang,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Jingjing Wu, Beilei Xing, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 02:43, fengchengwen:
> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -504,7 +504,7 @@ ABI Changes
> >    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> >  
> >  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> 
> Should add to release 23.03 rst.

Yes we could add a note in API changes.

> The original 22.11 still have RTE_IOVA_AS_PA definition.

Yes it was not a good idea to rename in the release notes.

> > -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > -    build = false
> > -    reason = 'driver does not support disabling IOVA as PA mode'
> > +if not get_option('enable_iova_as_pa')
> >      subdir_done()
> >  endif
> 
> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
>      subdir_done()
> endif

Why test the C macro in Meson?
It looks simpler to check the Meson option in Meson.

> Meson build 0.63.0 already support deprecated a option by a new option.
> When update to the new meson verion, the drivers' meson.build will not be modified.

I don't understand this comment.



^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-06 16:13  2%   ` [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf Thomas Monjalon
@ 2023-03-09  1:43  0%     ` fengchengwen
  2023-03-09  7:29  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-09  1:43 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: David Marchand, Bruce Richardson, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

On 2023/3/7 0:13, Thomas Monjalon wrote:
> The impact of the option "enable_iova_as_pa" is explained for users.
> 
> Also the code flag "RTE_IOVA_AS_PA" is renamed as "RTE_IOVA_IN_MBUF"
> in order to be more accurate (IOVA mode is decided at runtime),
> and more readable in the code.
> 
> Similarly the drivers are using the variable "require_iova_in_mbuf"
> instead of "pmd_supports_disable_iova_as_pa" with an opposite meaning.
> By default, it is assumed that drivers require the IOVA field in mbuf.
> The drivers which support removing this field have to declare themselves.
> 
> If the option "enable_iova_as_pa" is disabled, the unsupported drivers
> will be listed with the new reason text "requires IOVA in mbuf".
> 
> Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

...

>  compile_time_cpuflags = []
>  subdir(arch_subdir)
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 91414573bd..c67c2823a2 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -504,7 +504,7 @@ ABI Changes
>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
>  
>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.

This should be added to the 23.03 release notes instead.
The original 22.11 release still has the RTE_IOVA_AS_PA definition.

...

> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
> index e1a5afa2ec..743fae9db7 100644
> --- a/drivers/net/hns3/meson.build
> +++ b/drivers/net/hns3/meson.build
> @@ -13,9 +13,7 @@ if arch_subdir != 'x86' and arch_subdir != 'arm' or not dpdk_conf.get('RTE_ARCH_
>      subdir_done()
>  endif
>  
> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> -    build = false
> -    reason = 'driver does not support disabling IOVA as PA mode'
> +if not get_option('enable_iova_as_pa')
>      subdir_done()
>  endif

I suggest keeping the original check and only replacing RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
     subdir_done()
endif
Meson 0.63.0 already supports deprecating an option in favour of a new option.
When we update to that Meson version, the drivers' meson.build files will then not need to be modified.

>  
> diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
> index e69e23997f..dacb87dcb0 100644

...

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02 13:58  0%           ` Jerin Jacob
@ 2023-03-07  8:26  0%             ` Yan, Zhirun
  0 siblings, 0 replies; 200+ results
From: Yan, Zhirun @ 2023-03-07  8:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, March 2, 2023 9:58 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 27, 2023 6:23 AM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > > ndabilpuram@marvell.com; Liang, Cunming
> > > > > <cunming.liang@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> > > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker
> > > > > model APIs
> > > > >
> > > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan
> > > > > <zhirun.yan@intel.com>
> > > wrote:
> > > > > >
> > > > > > Add new get/set APIs to configure graph worker model which is
> > > > > > used to determine which model will be chosen.
> > > > > >
> > > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > > ---
> > > > > >  lib/graph/rte_graph_worker.h        | 51
> > > +++++++++++++++++++++++++++++
> > > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > > >  lib/graph/version.map               |  3 ++
> > > > > >  3 files changed, 67 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > > 100644
> > > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > > @@ -1,5 +1,56 @@
> > > > > >  #include "rte_graph_model_rtc.h"
> > > > > >
> > > > > > +static enum rte_graph_worker_model worker_model =
> > > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > > >
> > > > > This will break the multiprocess.
> > > >
> > > > Thanks. I will use TLS for per-thread local storage.
> > >
> > > If it needs to be used from secondary process, then it needs to be
> > > from memzone.
> > >
> >
> >
> > This field will be set by the primary process in the initial stage, and then
> > the lcores will only read it.
> > I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It
> > seems not necessary to allocate from memzone.
> >
> > >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +/** Graph worker models */
> > > > > > > +enum rte_graph_worker_model {
> > > > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > > >
> > > > > Why need strings?
> > > > > Also, every symbol in a public header file should start with
> > > > > RTE_ to avoid namespace conflict.
> > > >
> > > > It was used to config the model in app. I can put the string into example.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > > > > > +#define WORKER_MODEL_RTC "rtc"
> > > > > > +       RTE_GRAPH_MODEL_RTC,
> > > > >
> > > > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum
> > > > > > itself?
> > > > Yes, will do in next version.
> > > >
> > > > >
> > > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > > >
> > > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > > RTE_GRAPH_MODEL_PIPELINE
> > > >
> > > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> > >
> > > Hybrid is very overloaded term, and it will be confusing
> > > (considering there will be new models in future).
> > > Please pick a word that really express the model working.
> > >
> >
> > In this case, the path is Node0 -> Node1 -> Node2 -> Node3 And Node1
> > and Node3 are binding with one core.
> >
> > Our model offers the ability to dispatch between cores.
> >
> > Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?
> 
> Some names I can think of:
> 
> // MCORE->MULTI CORE
> 
> RTE_GRAPH_MODEL_MCORE_PIPELINE
> or
> RTE_GRAPH_MODEL_MCORE_DISPATCH
> or
> RTE_GRAPH_MODEL_MCORE_RING
> or
> RTE_GRAPH_MODEL_MULTI_CORE
> 

Thanks, I will use RTE_GRAPH_MODEL_MCORE_DISPATCH as the name.
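
For illustration only, here is a rough sketch of how the enum might look once
the review comments in this thread are applied: the name is spelled with the
RTE_GRAPH_ prefix, RTC is aliased to the default inside the enum, the
WORKER_MODEL_* strings are gone from the public header, and there is no MAX
entry. This is an assumption about the next version, not the actual patch.
The helper demo_graph_model_name() is hypothetical and only stands in for the
string handling that would move into the example application.

```c
#include <stddef.h>

/* Sketch only: enum values as discussed in the review, not the final API. */
enum rte_graph_worker_model {
	RTE_GRAPH_MODEL_DEFAULT = 0,
	/* RTC is the default model, so it shares the same value. */
	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
	/* Nodes may be dispatched between cores in this model. */
	RTE_GRAPH_MODEL_MCORE_DISPATCH,
	/* Deliberately no MAX entry, so new models can be appended later. */
};

/*
 * Hypothetical application-side helper: maps a model to the string an
 * example application could accept on its command line.
 */
static inline const char *
demo_graph_model_name(enum rte_graph_worker_model model)
{
	switch (model) {
	case RTE_GRAPH_MODEL_RTC: /* same value as RTE_GRAPH_MODEL_DEFAULT */
		return "rtc";
	case RTE_GRAPH_MODEL_MCORE_DISPATCH:
		return "mcore_dispatch";
	}
	return NULL; /* unknown model */
}
```

Keeping the strings in the application rather than in rte_graph_worker.h
avoids adding symbols without the RTE_ prefix to a public header.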

> >
> > + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> > '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> > '            '     '                          '     '            '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > '            '     '     |                    '     '      ^     '
> > + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
> >                          |                                 |
> >                          + - - - - - - - - - - - - - - - - +
> >
> >
> > > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > > +       RTE_GRAPH_MODEL_MAX,
> > > > >
> > > > > No need for MAX, it will break the ABI for future. See other
> > > > > subsystem such as cryptodev.
> > > >
> > > > Thanks, I will change it.
> > > > >
> > > > > > +};
> > > > >
> > > > > >

^ permalink raw reply	[relevance 0%]

* [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  @ 2023-03-06 16:13  2%   ` Thomas Monjalon
  2023-03-09  1:43  0%     ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-06 16:13 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Bruce Richardson, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Chengwen Feng, Kevin Laatz,
	Pavan Nikhilesh, Mattias Rönnblom, Liang Ma, Peter Mccarthy,
	Jerin Jacob, Harry van Haaren, Artem V. Andreev,
	Andrew Rybchenko, Ashwin Sekhar T K, John W. Linville,
	Ciara Loftus, Chas Williams, Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

The impact of the option "enable_iova_as_pa" is explained for users.

Also the code flag "RTE_IOVA_AS_PA" is renamed as "RTE_IOVA_IN_MBUF"
in order to be more accurate (IOVA mode is decided at runtime),
and more readable in the code.

Similarly the drivers are using the variable "require_iova_in_mbuf"
instead of "pmd_supports_disable_iova_as_pa" with an opposite meaning.
By default, it is assumed that drivers require the IOVA field in mbuf.
The drivers which support removing this field have to declare themselves.

If the option "enable_iova_as_pa" is disabled, the unsupported drivers
will be listed with the new reason text "requires IOVA in mbuf".

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
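
Note for readers (not part of the commit message): the effect of the renamed
flag can be sketched in a few lines of standalone C. The snippet below is a
reduced illustration that assumes a stand-in struct demo_mbuf and a stand-in
type demo_iova_t (both hypothetical, not DPDK definitions). It mirrors the
shape of rte_mbuf_iova_get()/rte_mbuf_iova_set() touched by this patch: when
RTE_IOVA_IN_MBUF is 0 the IOVA field does not exist and the buffer virtual
address is returned instead.

```c
#include <stdint.h>

/*
 * Stand-in for the build-time flag. In DPDK it is generated from the
 * "enable_iova_as_pa" option; the default of 1 here is an assumption
 * made only so that this sketch compiles on its own.
 */
#ifndef RTE_IOVA_IN_MBUF
#define RTE_IOVA_IN_MBUF 1
#endif

typedef uint64_t demo_iova_t; /* hypothetical stand-in for rte_iova_t */

/* Hypothetical, heavily reduced stand-in for struct rte_mbuf. */
struct demo_mbuf {
	void *buf_addr;       /* virtual address of the segment buffer */
#if RTE_IOVA_IN_MBUF
	demo_iova_t buf_iova; /* IO address, present only when the flag is 1 */
#endif
};

/* Return the IO address, falling back to the virtual address. */
static inline demo_iova_t
demo_mbuf_iova_get(const struct demo_mbuf *m)
{
#if RTE_IOVA_IN_MBUF
	return m->buf_iova;
#else
	return (demo_iova_t)(uintptr_t)m->buf_addr;
#endif
}

/* Store the IO address; a no-op when the field is compiled out. */
static inline void
demo_mbuf_iova_set(struct demo_mbuf *m, demo_iova_t iova)
{
#if RTE_IOVA_IN_MBUF
	m->buf_iova = iova;
#else
	(void)m;
	(void)iova;
#endif
}
```

Code that always goes through such accessors keeps building in both
configurations, which is the property a driver needs before it can declare
that it does not require the IOVA field in mbuf.
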
 app/test/test_mbuf.c                   |  2 +-
 config/arm/meson.build                 |  4 ++--
 config/meson.build                     |  2 +-
 doc/guides/rel_notes/release_22_11.rst |  2 +-
 drivers/common/cnxk/meson.build        |  2 +-
 drivers/common/iavf/meson.build        |  2 +-
 drivers/crypto/armv8/meson.build       |  2 +-
 drivers/crypto/cnxk/meson.build        |  2 +-
 drivers/crypto/ipsec_mb/meson.build    |  2 +-
 drivers/crypto/null/meson.build        |  2 +-
 drivers/crypto/openssl/meson.build     |  2 +-
 drivers/dma/cnxk/meson.build           |  2 +-
 drivers/dma/skeleton/meson.build       |  2 +-
 drivers/event/cnxk/meson.build         |  2 +-
 drivers/event/dsw/meson.build          |  2 +-
 drivers/event/opdl/meson.build         |  2 +-
 drivers/event/skeleton/meson.build     |  2 +-
 drivers/event/sw/meson.build           |  2 +-
 drivers/mempool/bucket/meson.build     |  2 +-
 drivers/mempool/cnxk/meson.build       |  2 +-
 drivers/mempool/ring/meson.build       |  2 +-
 drivers/mempool/stack/meson.build      |  2 +-
 drivers/meson.build                    |  6 +++---
 drivers/net/af_packet/meson.build      |  2 +-
 drivers/net/af_xdp/meson.build         |  2 +-
 drivers/net/bonding/meson.build        |  2 +-
 drivers/net/cnxk/meson.build           |  2 +-
 drivers/net/failsafe/meson.build       |  2 +-
 drivers/net/hns3/meson.build           |  4 +---
 drivers/net/ice/ice_rxtx_common_avx.h  | 12 ++++++------
 drivers/net/ice/ice_rxtx_vec_sse.c     |  4 ++--
 drivers/net/ice/meson.build            |  2 +-
 drivers/net/memif/meson.build          |  2 +-
 drivers/net/null/meson.build           |  2 +-
 drivers/net/pcap/meson.build           |  2 +-
 drivers/net/ring/meson.build           |  2 +-
 drivers/net/tap/meson.build            |  2 +-
 drivers/raw/cnxk_bphy/meson.build      |  2 +-
 drivers/raw/cnxk_gpio/meson.build      |  2 +-
 drivers/raw/skeleton/meson.build       |  2 +-
 lib/eal/linux/eal.c                    |  2 +-
 lib/mbuf/rte_mbuf.c                    |  2 +-
 lib/mbuf/rte_mbuf.h                    |  4 ++--
 lib/mbuf/rte_mbuf_core.h               |  8 ++++----
 lib/mbuf/rte_mbuf_dyn.c                |  2 +-
 lib/meson.build                        |  2 +-
 meson_options.txt                      |  2 +-
 47 files changed, 60 insertions(+), 62 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 6cbb03b0af..81a6632d11 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -1232,7 +1232,7 @@ test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 		return -1;
 	}
 
-	if (RTE_IOVA_AS_PA) {
+	if (RTE_IOVA_IN_MBUF) {
 		badbuf = *buf;
 		rte_mbuf_iova_set(&badbuf, 0);
 		if (verify_mbuf_check_panics(&badbuf)) {
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 451dbada7d..5ff66248de 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -319,7 +319,7 @@ soc_cn10k = {
         ['RTE_MAX_LCORE', 24],
         ['RTE_MAX_NUMA_NODES', 1],
         ['RTE_MEMPOOL_ALIGN', 128],
-        ['RTE_IOVA_AS_PA', 0]
+        ['RTE_IOVA_IN_MBUF', 0]
     ],
     'part_number': '0xd49',
     'extra_march_features': ['crypto'],
@@ -412,7 +412,7 @@ soc_cn9k = {
     'part_number': '0xb2',
     'numa': false,
     'flags': [
-        ['RTE_IOVA_AS_PA', 0]
+        ['RTE_IOVA_IN_MBUF', 0]
     ]
 }
 
diff --git a/config/meson.build b/config/meson.build
index fc3ac99a32..fa730a1b14 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -316,7 +316,7 @@ endif
 if get_option('mbuf_refcnt_atomic')
     dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true)
 endif
-dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
+dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
 
 compile_time_cpuflags = []
 subdir(arch_subdir)
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 91414573bd..c67c2823a2 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -504,7 +504,7 @@ ABI Changes
   ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
 
 * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
-  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
+  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
 
 * ethdev: enum ``RTE_FLOW_ITEM`` was affected by deprecation procedure.
 
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..ce71f3d70c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -87,4 +87,4 @@ sources += files('cnxk_telemetry_bphy.c',
 )
 
 deps += ['bus_pci', 'net', 'telemetry']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/common/iavf/meson.build b/drivers/common/iavf/meson.build
index af8a4983e0..af26955772 100644
--- a/drivers/common/iavf/meson.build
+++ b/drivers/common/iavf/meson.build
@@ -6,4 +6,4 @@ sources = files('iavf_adminq.c', 'iavf_common.c', 'iavf_impl.c')
 if cc.has_argument('-Wno-pointer-to-int-cast')
         cflags += '-Wno-pointer-to-int-cast'
 endif
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/armv8/meson.build b/drivers/crypto/armv8/meson.build
index 700fb80eb2..a735eb511c 100644
--- a/drivers/crypto/armv8/meson.build
+++ b/drivers/crypto/armv8/meson.build
@@ -17,4 +17,4 @@ endif
 ext_deps += dep
 deps += ['bus_vdev']
 sources = files('rte_armv8_pmd.c', 'rte_armv8_pmd_ops.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/cnxk/meson.build b/drivers/crypto/cnxk/meson.build
index a5acabab2b..3d9a0dbbf0 100644
--- a/drivers/crypto/cnxk/meson.build
+++ b/drivers/crypto/cnxk/meson.build
@@ -32,4 +32,4 @@ else
     cflags += [ '-ULA_IPSEC_DEBUG','-UCNXK_CRYPTODEV_DEBUG' ]
 endif
 
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/ipsec_mb/meson.build b/drivers/crypto/ipsec_mb/meson.build
index ec147d2110..3057e6fd10 100644
--- a/drivers/crypto/ipsec_mb/meson.build
+++ b/drivers/crypto/ipsec_mb/meson.build
@@ -41,4 +41,4 @@ sources = files(
         'pmd_zuc.c',
 )
 deps += ['bus_vdev', 'net', 'security']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/null/meson.build b/drivers/crypto/null/meson.build
index 59a7508f18..2e8b05ad28 100644
--- a/drivers/crypto/null/meson.build
+++ b/drivers/crypto/null/meson.build
@@ -9,4 +9,4 @@ endif
 
 deps += 'bus_vdev'
 sources = files('null_crypto_pmd.c', 'null_crypto_pmd_ops.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/openssl/meson.build b/drivers/crypto/openssl/meson.build
index d165c32ae8..1ec63c216d 100644
--- a/drivers/crypto/openssl/meson.build
+++ b/drivers/crypto/openssl/meson.build
@@ -15,4 +15,4 @@ endif
 deps += 'bus_vdev'
 sources = files('rte_openssl_pmd.c', 'rte_openssl_pmd_ops.c')
 ext_deps += dep
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/dma/cnxk/meson.build b/drivers/dma/cnxk/meson.build
index 252e5ff78b..b868fb14cb 100644
--- a/drivers/dma/cnxk/meson.build
+++ b/drivers/dma/cnxk/meson.build
@@ -3,4 +3,4 @@
 
 deps += ['bus_pci', 'common_cnxk', 'dmadev']
 sources = files('cnxk_dmadev.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/dma/skeleton/meson.build b/drivers/dma/skeleton/meson.build
index 2b0422ce61..77055683ad 100644
--- a/drivers/dma/skeleton/meson.build
+++ b/drivers/dma/skeleton/meson.build
@@ -5,4 +5,4 @@ deps += ['dmadev', 'kvargs', 'ring', 'bus_vdev']
 sources = files(
         'skeleton_dmadev.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/cnxk/meson.build b/drivers/event/cnxk/meson.build
index aa42ab3a90..3517e79341 100644
--- a/drivers/event/cnxk/meson.build
+++ b/drivers/event/cnxk/meson.build
@@ -479,4 +479,4 @@ foreach flag: extra_flags
 endforeach
 
 deps += ['bus_pci', 'common_cnxk', 'net_cnxk', 'crypto_cnxk']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/dsw/meson.build b/drivers/event/dsw/meson.build
index e6808c0f71..01af94165f 100644
--- a/drivers/event/dsw/meson.build
+++ b/drivers/event/dsw/meson.build
@@ -6,4 +6,4 @@ if cc.has_argument('-Wno-format-nonliteral')
     cflags += '-Wno-format-nonliteral'
 endif
 sources = files('dsw_evdev.c', 'dsw_event.c', 'dsw_xstats.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/opdl/meson.build b/drivers/event/opdl/meson.build
index 7abef44609..8613b2a746 100644
--- a/drivers/event/opdl/meson.build
+++ b/drivers/event/opdl/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'opdl_test.c',
 )
 deps += ['bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/skeleton/meson.build b/drivers/event/skeleton/meson.build
index fa6a5e0a9f..6e788cfcee 100644
--- a/drivers/event/skeleton/meson.build
+++ b/drivers/event/skeleton/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('skeleton_eventdev.c')
 deps += ['bus_pci', 'bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/sw/meson.build b/drivers/event/sw/meson.build
index 8d815dfa84..3a3ebd72a3 100644
--- a/drivers/event/sw/meson.build
+++ b/drivers/event/sw/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'sw_evdev.c',
 )
 deps += ['hash', 'bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/bucket/meson.build b/drivers/mempool/bucket/meson.build
index 94c060904b..d0ec523237 100644
--- a/drivers/mempool/bucket/meson.build
+++ b/drivers/mempool/bucket/meson.build
@@ -12,4 +12,4 @@ if is_windows
 endif
 
 sources = files('rte_mempool_bucket.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/cnxk/meson.build b/drivers/mempool/cnxk/meson.build
index d8bcc41ca0..50856ecde8 100644
--- a/drivers/mempool/cnxk/meson.build
+++ b/drivers/mempool/cnxk/meson.build
@@ -17,4 +17,4 @@ sources = files(
 )
 
 deps += ['eal', 'mbuf', 'kvargs', 'bus_pci', 'common_cnxk', 'mempool']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/ring/meson.build b/drivers/mempool/ring/meson.build
index 65d203d4b7..a25e9ebc16 100644
--- a/drivers/mempool/ring/meson.build
+++ b/drivers/mempool/ring/meson.build
@@ -2,4 +2,4 @@
 # Copyright(c) 2017 Intel Corporation
 
 sources = files('rte_mempool_ring.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/stack/meson.build b/drivers/mempool/stack/meson.build
index 961e90fc04..95f69042ae 100644
--- a/drivers/mempool/stack/meson.build
+++ b/drivers/mempool/stack/meson.build
@@ -4,4 +4,4 @@
 sources = files('rte_mempool_stack.c')
 
 deps += ['stack']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/meson.build b/drivers/meson.build
index 0618c31a69..2aefa146a7 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -109,7 +109,7 @@ foreach subpath:subdirs
         ext_deps = []
         pkgconfig_extra_libs = []
         testpmd_sources = []
-        pmd_supports_disable_iova_as_pa = false
+        require_iova_in_mbuf = true
 
         if not enable_drivers.contains(drv_path)
             build = false
@@ -127,9 +127,9 @@ foreach subpath:subdirs
             # pull in driver directory which should update all the local variables
             subdir(drv_path)
 
-            if dpdk_conf.get('RTE_IOVA_AS_PA') == 0 and not pmd_supports_disable_iova_as_pa and not always_enable.contains(drv_path)
+            if not get_option('enable_iova_as_pa') and require_iova_in_mbuf and not always_enable.contains(drv_path)
                 build = false
-                reason = 'driver does not support disabling IOVA as PA mode'
+                reason = 'requires IOVA in mbuf'
             endif
 
             # get dependency objs from strings
diff --git a/drivers/net/af_packet/meson.build b/drivers/net/af_packet/meson.build
index bab008d083..f45e4491d4 100644
--- a/drivers/net/af_packet/meson.build
+++ b/drivers/net/af_packet/meson.build
@@ -6,4 +6,4 @@ if not is_linux
     reason = 'only supported on Linux'
 endif
 sources = files('rte_eth_af_packet.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
index 979b914bb6..9a8dbb4d49 100644
--- a/drivers/net/af_xdp/meson.build
+++ b/drivers/net/af_xdp/meson.build
@@ -71,4 +71,4 @@ if build
   endif
 endif
 
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/bonding/meson.build b/drivers/net/bonding/meson.build
index 29022712cb..83326c0d63 100644
--- a/drivers/net/bonding/meson.build
+++ b/drivers/net/bonding/meson.build
@@ -22,4 +22,4 @@ deps += 'sched' # needed for rte_bitmap.h
 deps += ['ip_frag']
 
 headers = files('rte_eth_bond.h', 'rte_eth_bond_8023ad.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index c7ca24d437..c1da121a15 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -195,4 +195,4 @@ foreach flag: extra_flags
 endforeach
 
 headers = files('rte_pmd_cnxk.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/failsafe/meson.build b/drivers/net/failsafe/meson.build
index bf8f791984..513de17535 100644
--- a/drivers/net/failsafe/meson.build
+++ b/drivers/net/failsafe/meson.build
@@ -27,4 +27,4 @@ sources = files(
         'failsafe_ops.c',
         'failsafe_rxtx.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
index e1a5afa2ec..743fae9db7 100644
--- a/drivers/net/hns3/meson.build
+++ b/drivers/net/hns3/meson.build
@@ -13,9 +13,7 @@ if arch_subdir != 'x86' and arch_subdir != 'arm' or not dpdk_conf.get('RTE_ARCH_
     subdir_done()
 endif
 
-if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
-    build = false
-    reason = 'driver does not support disabling IOVA as PA mode'
+if not get_option('enable_iova_as_pa')
     subdir_done()
 endif
 
diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
index e69e23997f..dacb87dcb0 100644
--- a/drivers/net/ice/ice_rxtx_common_avx.h
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -54,7 +54,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		mb0 = rxep[0].mbuf;
 		mb1 = rxep[1].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 				offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -62,7 +62,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
 		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* convert pa to dma_addr hdr/data */
 		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
 		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
@@ -105,7 +105,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			mb6 = rxep[6].mbuf;
 			mb7 = rxep[7].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 					offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -142,7 +142,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 				_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
 						   vaddr6_7, 1);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* convert pa to dma_addr hdr/data */
 			dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3, vaddr0_3);
 			dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7, vaddr4_7);
@@ -177,7 +177,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			mb2 = rxep[2].mbuf;
 			mb3 = rxep[3].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 					offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -198,7 +198,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2),
 							vaddr3, 1);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* convert pa to dma_addr hdr/data */
 			dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1);
 			dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3);
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 72dfd58308..71fdd6ffb5 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -68,7 +68,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 		mb0 = rxep[0].mbuf;
 		mb1 = rxep[1].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 				 offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -76,7 +76,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
 		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* convert pa to dma_addr hdr/data */
 		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
 		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
diff --git a/drivers/net/ice/meson.build b/drivers/net/ice/meson.build
index 123b190f72..5e90afcb9b 100644
--- a/drivers/net/ice/meson.build
+++ b/drivers/net/ice/meson.build
@@ -78,4 +78,4 @@ sources += files(
         'ice_dcf_parent.c',
         'ice_dcf_sched.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
index 28416a982f..b890984b46 100644
--- a/drivers/net/memif/meson.build
+++ b/drivers/net/memif/meson.build
@@ -12,4 +12,4 @@ sources = files(
 )
 
 deps += ['hash']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/null/meson.build b/drivers/net/null/meson.build
index 4a483955a7..076b9937c1 100644
--- a/drivers/net/null/meson.build
+++ b/drivers/net/null/meson.build
@@ -8,4 +8,4 @@ if is_windows
 endif
 
 sources = files('rte_eth_null.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/pcap/meson.build b/drivers/net/pcap/meson.build
index a5a2971f0e..de2a70ef0b 100644
--- a/drivers/net/pcap/meson.build
+++ b/drivers/net/pcap/meson.build
@@ -15,4 +15,4 @@ ext_deps += pcap_dep
 if is_windows
     ext_deps += cc.find_library('iphlpapi', required: true)
 endif
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/ring/meson.build b/drivers/net/ring/meson.build
index 72792e26b0..2cd0e97e56 100644
--- a/drivers/net/ring/meson.build
+++ b/drivers/net/ring/meson.build
@@ -9,4 +9,4 @@ endif
 
 sources = files('rte_eth_ring.c')
 headers = files('rte_eth_ring.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 4c9a9eac2b..b07ce68e48 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -35,4 +35,4 @@ foreach arg:args
     config.set(arg[0], cc.has_header_symbol(arg[1], arg[2]))
 endforeach
 configure_file(output : 'tap_autoconf.h', configuration : config)
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/cnxk_bphy/meson.build b/drivers/raw/cnxk_bphy/meson.build
index ffb0ee6b7e..bb5d2ffb80 100644
--- a/drivers/raw/cnxk_bphy/meson.build
+++ b/drivers/raw/cnxk_bphy/meson.build
@@ -10,4 +10,4 @@ sources = files(
         'cnxk_bphy_irq.c',
 )
 headers = files('rte_pmd_bphy.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/cnxk_gpio/meson.build b/drivers/raw/cnxk_gpio/meson.build
index f52a7be9eb..9d9a527392 100644
--- a/drivers/raw/cnxk_gpio/meson.build
+++ b/drivers/raw/cnxk_gpio/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'cnxk_gpio_selftest.c',
 )
 headers = files('rte_pmd_cnxk_gpio.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/skeleton/meson.build b/drivers/raw/skeleton/meson.build
index bfb8fd8bcc..9d5fcf6514 100644
--- a/drivers/raw/skeleton/meson.build
+++ b/drivers/raw/skeleton/meson.build
@@ -6,4 +6,4 @@ sources = files(
         'skeleton_rawdev.c',
         'skeleton_rawdev_test.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index fabafbc39b..e39b6643ee 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1134,7 +1134,7 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_iova_mode() == RTE_IOVA_PA && !RTE_IOVA_AS_PA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_PA && !RTE_IOVA_IN_MBUF) {
 		rte_eal_init_alert("Cannot use IOVA as 'PA' as it is disabled during build");
 		rte_errno = EINVAL;
 		return -1;
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index cfd8062f1e..686e797c80 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -388,7 +388,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		*reason = "bad mbuf pool";
 		return -1;
 	}
-	if (RTE_IOVA_AS_PA && rte_mbuf_iova_get(m) == 0) {
+	if (RTE_IOVA_IN_MBUF && rte_mbuf_iova_get(m) == 0) {
 		*reason = "bad IO addr";
 		return -1;
 	}
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 3a82eb136d..bc41eac10d 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -146,7 +146,7 @@ static inline uint16_t rte_pktmbuf_priv_size(struct rte_mempool *mp);
 static inline rte_iova_t
 rte_mbuf_iova_get(const struct rte_mbuf *m)
 {
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	return m->buf_iova;
 #else
 	return (rte_iova_t)m->buf_addr;
@@ -164,7 +164,7 @@ rte_mbuf_iova_get(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_iova_set(struct rte_mbuf *m, rte_iova_t iova)
 {
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	m->buf_iova = iova;
 #else
 	RTE_SET_USED(m);
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index a30e1e0eaf..dfffb6e5e6 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -466,11 +466,11 @@ struct rte_mbuf {
 	RTE_MARKER cacheline0;
 
 	void *buf_addr;           /**< Virtual address of segment buffer. */
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	/**
 	 * Physical address of segment buffer.
 	 * This field is undefined if the build is configured to use only
-	 * virtual address as IOVA (i.e. RTE_IOVA_AS_PA is 0).
+	 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
 	 * Force alignment to 8-bytes, so as to ensure we have the exact
 	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
 	 * working on vector drivers easier.
@@ -599,7 +599,7 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
@@ -608,7 +608,7 @@ struct rte_mbuf {
 #else
 	/**
 	 * Reserved for dynamic fields
-	 * when the next pointer is in first cache line (i.e. RTE_IOVA_AS_PA is 0).
+	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
 	uint64_t dynfield2;
 #endif
diff --git a/lib/mbuf/rte_mbuf_dyn.c b/lib/mbuf/rte_mbuf_dyn.c
index 35839e938c..5049508bea 100644
--- a/lib/mbuf/rte_mbuf_dyn.c
+++ b/lib/mbuf/rte_mbuf_dyn.c
@@ -128,7 +128,7 @@ init_shared_mem(void)
 		 */
 		memset(shm, 0, sizeof(*shm));
 		mark_free(dynfield1);
-#if !RTE_IOVA_AS_PA
+#if !RTE_IOVA_IN_MBUF
 		mark_free(dynfield2);
 #endif
 
diff --git a/lib/meson.build b/lib/meson.build
index 2bc0932ad5..fc7abd4aa3 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -93,7 +93,7 @@ dpdk_libs_deprecated += [
 disabled_libs = []
 opt_disabled_libs = run_command(list_dir_globs, get_option('disable_libs'),
         check: true).stdout().split()
-if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
+if not get_option('enable_iova_as_pa')
     opt_disabled_libs += ['kni']
 endif
 foreach l:opt_disabled_libs
diff --git a/meson_options.txt b/meson_options.txt
index 08528492f7..82c8297065 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -41,7 +41,7 @@ option('max_lcores', type: 'string', value: 'default', description:
 option('max_numa_nodes', type: 'string', value: 'default', description:
        'Set the highest NUMA node supported by EAL; "default" is different per-arch, "detect" detects the highest NUMA node on the build machine.')
 option('enable_iova_as_pa', type: 'boolean', value: true, description:
-       'Support for IOVA as physical address. Disabling removes the buf_iova field of mbuf.')
+       'Support the use of physical addresses for IO addresses, such as used by UIO or VFIO in no-IOMMU mode. When disabled, DPDK can only run with IOMMU support for address mappings, but will have more space available in the mbuf structure.')
 option('mbuf_refcnt_atomic', type: 'boolean', value: true, description:
        'Atomically access the mbuf refcnt.')
 option('platform', type: 'string', value: 'native', description:
-- 
2.39.1
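
For readers skimming the diff, the compile-time switch renamed above can be
sketched in isolation. This is only an illustration, not DPDK code:
TOY_IOVA_IN_MBUF, struct toy_mbuf and the toy_* accessors are hypothetical
stand-ins for RTE_IOVA_IN_MBUF, struct rte_mbuf and rte_mbuf_iova_get()/
rte_mbuf_iova_set(), and the "spare" field merely represents the space that
becomes available when the IOVA field is compiled out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the build-time setting (assumption: 1 by default). */
#ifndef TOY_IOVA_IN_MBUF
#define TOY_IOVA_IN_MBUF 1
#endif

typedef uint64_t toy_iova_t;

/* Toy buffer descriptor; only the fields needed for the illustration. */
struct toy_mbuf {
	void *buf_addr;          /* virtual address of the buffer */
#if TOY_IOVA_IN_MBUF
	toy_iova_t buf_iova;     /* IO address stored in the descriptor */
#else
	uint64_t spare;          /* space freed when the IOVA field is absent */
#endif
};

/* Return the IO address: the stored field, or the virtual address if absent. */
static inline toy_iova_t
toy_mbuf_iova_get(const struct toy_mbuf *m)
{
	assert(m != NULL);
#if TOY_IOVA_IN_MBUF
	return m->buf_iova;
#else
	return (toy_iova_t)(uintptr_t)m->buf_addr;
#endif
}

/* Store the IO address; a no-op when the field is compiled out. */
static inline void
toy_mbuf_iova_set(struct toy_mbuf *m, toy_iova_t iova)
{
	assert(m != NULL);
#if TOY_IOVA_IN_MBUF
	m->buf_iova = iova;
#else
	(void)m;
	(void)iova;
#endif
}
```

With the field compiled out, callers that always go through the accessors keep
building unchanged, which is presumably why the patch only has to touch the
preprocessor conditions.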


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02  8:38  0%         ` Yan, Zhirun
@ 2023-03-02 13:58  0%           ` Jerin Jacob
  2023-03-07  8:26  0%             ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-03-02 13:58 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 27, 2023 6:23 AM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > > Wang, Haiyue <haiyue.wang@intel.com>
> > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > > APIs
> > > >
> > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> > wrote:
> > > > >
> > > > > Add new get/set APIs to configure graph worker model which is used
> > > > > to determine which model will be chosen.
> > > > >
> > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > ---
> > > > >  lib/graph/rte_graph_worker.h        | 51
> > +++++++++++++++++++++++++++++
> > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > >  lib/graph/version.map               |  3 ++
> > > > >  3 files changed, 67 insertions(+)
> > > > >
> > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > 100644
> > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > @@ -1,5 +1,56 @@
> > > > >  #include "rte_graph_model_rtc.h"
> > > > >
> > > > > +static enum rte_graph_worker_model worker_model =
> > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > >
> > > > This will break the multiprocess.
> > >
> > > Thanks. I will use TLS for per-thread local storage.
> >
> > If it needs to be used from secondary process, then it needs to be from
> > memzone.
> >
>
>
> This field will be set by primary process in initial stage, and then lcore will only read it.
> I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
> not necessary to allocate from memzone.
>
> >
> >
> > >
> > > >
> > > > > +
> > > > > +/** Graph worker models */
> > > > > +enum rte_graph_worker_model {
> > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > >
> > > > Why need strings?
> > > > Also, every symbol in a public header file should start with RTE_ to
> > > > avoid namespace conflict.
> > >
> > > It was used to config the model in app. I can put the string into example.
> >
> > OK
> >
> > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > > +"rtc"
> > > > > +       RTE_GRAPH_MODEL_RTC,
> > > >
> > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > enum
> > > > itself.
> > > Yes, will do in next version.
> > >
> > > >
> > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > >
> > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > RTE_GRAPH_MODEL_PIPELINE
> > >
> > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> >
> > Hybrid is very overloaded term, and it will be confusing (considering there
> > will be new models in future).
> > Please pick a word that really express the model working.
> >
>
> In this case, the path is Node0 -> Node1 -> Node2 -> Node3
> And Node1 and Node3 are binding with one core.
>
> Our model offers the ability to dispatch between cores.
>
> Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

Some names I can think of:

// MCORE -> MULTI CORE

RTE_GRAPH_MODEL_MCORE_PIPELINE
or
RTE_GRAPH_MODEL_MCORE_DISPATCH
or
RTE_GRAPH_MODEL_MCORE_RING
or
RTE_GRAPH_MODEL_MULTI_CORE
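
A rough sketch of how the enum might look with the review comments applied
(RTC as the default value, no MAX sentinel, name strings kept out of the
public header). All toy_* / TOY_* names are placeholders for illustration,
not proposed symbols, and the parser is an assumption about what the example
application could do with the strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder enum: default aliases RTC, and there is no _MAX entry. */
enum toy_graph_worker_model {
	TOY_GRAPH_MODEL_RTC = 0,
	TOY_GRAPH_MODEL_DEFAULT = TOY_GRAPH_MODEL_RTC,
	TOY_GRAPH_MODEL_MCORE_DISPATCH,
	/* new models are appended here; no sentinel that would change value */
};

/*
 * Application-side helper (hypothetical): map a command-line string to a
 * model. Returns 0 on success, -1 if the name is not recognized.
 */
static int
toy_graph_model_parse(const char *name, enum toy_graph_worker_model *model)
{
	assert(name != NULL && model != NULL);

	if (strcmp(name, "default") == 0 || strcmp(name, "rtc") == 0) {
		*model = TOY_GRAPH_MODEL_RTC;
		return 0;
	}
	if (strcmp(name, "dispatch") == 0) {
		*model = TOY_GRAPH_MODEL_MCORE_DISPATCH;
		return 0;
	}
	return -1;
}
```

Keeping the strings in the application means the library header exports only
the enum values.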

>
> + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> '            '     '                          '     '            '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> '            '     '     |                    '     '      ^     '
> + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
>                          |                                 |
>                          + - - - - - - - - - - - - - - - - +
>
>
> > > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > +       RTE_GRAPH_MODEL_MAX,
> > > >
> > > > No need for MAX, it will break the ABI for future. See other
> > > > subsystem such as cryptodev.
> > >
> > > Thanks, I will change it.
> > > >
> > > > > +};
> > > >
> > > > >

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-26 22:23  0%       ` Jerin Jacob
@ 2023-03-02  8:38  0%         ` Yan, Zhirun
  2023-03-02 13:58  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Yan, Zhirun @ 2023-03-02  8:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 27, 2023 6:23 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 20, 2023 9:51 PM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> wrote:
> > > >
> > > > Add new get/set APIs to configure graph worker model which is used
> > > > to determine which model will be chosen.
> > > >
> > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > ---
> > > >  lib/graph/rte_graph_worker.h        | 51
> +++++++++++++++++++++++++++++
> > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > >  lib/graph/version.map               |  3 ++
> > > >  3 files changed, 67 insertions(+)
> > > >
> > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> 100644
> > > > --- a/lib/graph/rte_graph_worker.h
> > > > +++ b/lib/graph/rte_graph_worker.h
> > > > @@ -1,5 +1,56 @@
> > > >  #include "rte_graph_model_rtc.h"
> > > >
> > > > +static enum rte_graph_worker_model worker_model =
> > > > +RTE_GRAPH_MODEL_DEFAULT;
> > >
> > > This will break the multiprocess.
> >
> > Thanks. I will use TLS for per-thread local storage.
> 
> If it needs to be used from secondary process, then it needs to be from
> memzone.
> 


This field will be set by the primary process in the initial stage, and then the lcores will only read it.
I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
not necessary to allocate from memzone.
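
One point worth checking before going that way: a per-lcore variable is
thread-local storage, so each thread gets its own copy. The sketch below uses
plain C11 _Thread_local and pthreads rather than the DPDK macros, and all
names in it are made up for illustration. It writes a value on the calling
thread and then reads both a thread-local and an ordinary static variable
from a second thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static _Thread_local int tls_model;  /* one copy per thread */
static int shared_model;             /* one copy per process */

/* Thread body: report what this thread sees in both variables. */
static void *
read_models(void *arg)
{
	int *seen = arg;

	seen[0] = tls_model;     /* this thread's own, still zero-initialized */
	seen[1] = shared_model;  /* value written by the creating thread */
	return NULL;
}

/*
 * Set both variables on the calling thread, then read them from a new
 * thread. seen[0] receives the thread-local value, seen[1] the shared one.
 * Returns 0 on success, -1 if the thread could not be created or joined.
 */
static int
toy_model_visibility(int value, int seen[2])
{
	pthread_t tid;

	assert(seen != NULL);
	tls_model = value;
	shared_model = value;

	if (pthread_create(&tid, NULL, read_models, seen) != 0)
		return -1;
	return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

The second thread sees the shared value but not the thread-local one, so a
model set once during initialization would have to be written on every lcore
if it is kept in per-lcore storage.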

> 
> 
> >
> > >
> > > > +
> > > > +/** Graph worker models */
> > > > +enum rte_graph_worker_model {
> > > > +#define WORKER_MODEL_DEFAULT "default"
> > >
> > > Why need strings?
> > > Also, every symbol in a public header file should start with RTE_ to
> > > avoid namespace conflict.
> >
> > It was used to config the model in app. I can put the string into example.
> 
> OK
> 
> >
> > >
> > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > +"rtc"
> > > > +       RTE_GRAPH_MODEL_RTC,
> > >
> > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> enum
> > > itself.
> > Yes, will do in next version.
> >
> > >
> > > > +#define WORKER_MODEL_GENERIC "generic"
> > >
> > > Generic is a very overloaded term. Use pipeline here i.e
> > > RTE_GRAPH_MODEL_PIPELINE
> >
> > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> 
> Hybrid is very overloaded term, and it will be confusing (considering there
> will be new models in future).
> Please pick a word that really express the model working.
> 

In this case, the path is Node0 -> Node1 -> Node2 -> Node3,
and Node1 and Node3 are bound to the same core.

Our model offers the ability to dispatch between cores.

Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

+ - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
'            '     '                          '     '            '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
'            '     '     |                    '     '      ^     '
+ - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                         |                                 |
                         + - - - - - - - - - - - - - - - - +


> > >
> > >
> > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > +       RTE_GRAPH_MODEL_MAX,
> > >
> > > No need for MAX, it will break the ABI for future. See other
> > > subsystem such as cryptodev.
> >
> > Thanks, I will change it.
> > >
> > > > +};
> > >
> > > >

^ permalink raw reply	[relevance 0%]

* RE: [RFC 0/2] Add high-performance timer facility
  2023-03-01 15:50  3%       ` Mattias Rönnblom
@ 2023-03-01 17:06  0%         ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2023-03-01 17:06 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 1 March 2023 16.50
> 
> On 2023-03-01 14:31, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Wednesday, 1 March 2023 12.18
> >>
> >> On 2023-02-28 17:01, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>> Sent: Tuesday, 28 February 2023 10.39
> >>>
> >>> I have been looking for a high performance timer library (for use in
> a fast
> >> path TCP stack), and this looks very useful, Mattias.
> >>>
> >>> My initial feedback is based on quickly skimming the patch source
> code, and
> >> reading this cover letter.
> >>>
> >>>>
> >>>> This patchset is an attempt to introduce a high-performance, highly
> >>>> scalable timer facility into DPDK.
> >>>>
> >>>> More specifically, the goals for the htimer library are:
> >>>>
> >>>> * Efficient handling of a handful up to hundreds of thousands of
> >>>>     concurrent timers.
> >>>> * Reduced overhead of adding and canceling timers.
> >>>> * Provide a service functionally equivalent to that of
> >>>>     <rte_timer.h>. API/ABI backward compatibility is secondary.
> >>>>
> >>>> In the author's opinion, there are two main shortcomings with the
> >>>> current DPDK timer library (i.e., rte_timer.[ch]).
> >>>>
> >>>> One is the synchronization overhead, where heavy-weight full-
> barrier
> >>>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> >>>> lists, but any thread may add or cancel (or otherwise access)
> timers
> >>>> managed by another lcore (and thus resides in its timer skip list).
> >>>>
> >>>> The other is an algorithmic shortcoming, with rte_timer.c's
> reliance
> >>>> on a skip list, which, seemingly, is less efficient than certain
> >>>> alternatives.
> >>>>
> >>>> This patchset implements a hierarchical timer wheel (HWT, in
> >>>
> >>> Typo: HWT or HTW?
> >>
> >> Yes. I don't understand how I managed to make so many such HTW
> ->
> >> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
> >>
> >>>
> >>>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> >>>> Hierarchical Timing Wheels: Data Structures for the Efficient
> >>>> Implementation of a Timer Facility". A HWT is a data structure
> >>>> purposely designed for this task, and used by many operating system
> >>>> kernel timer facilities.
> >>>>
> >>>> To further improve the solution described by Varghese and Lauck, a
> >>>> bitset is placed in front of each of the timer wheel in the HWT,
> >>>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing
> time
> >>>> and expiry processing).
> >>>>
> >>>> Cycle-efficient scanning and manipulation of these bitsets are
> crucial
> >>>> for the HWT's performance.
> >>>>
> >>>> The htimer module keeps a per-lcore (or per-registered EAL thread)
> HWT
> >>>> instance, much like rte_timer.c keeps a per-lcore skip list.
> >>>>
> >>>> To avoid expensive synchronization overhead for thread-local timer
> >>>> management, the HWTs are accessed only from the "owning" thread.
> Any
> >>>> interaction any other thread has with a particular lcore's timer
> >>>> wheel goes over a set of DPDK rings. A side-effect of this design
> is
> >>>> that all operations working toward a "remote" HWT must be
> >>>> asynchronous.
> >>>>
> >>>> The <rte_htimer.h> API is available only to EAL threads and
> registered
> >>>> non-EAL threads.
> >>>>
> >>>> The htimer API allows the application to supply the current time,
> >>>> useful in case it already has retrieved this for other purposes,
> >>>> saving the cost of a rdtsc instruction (or its equivalent).
> >>>>
> >>>> Relative htimer does not retrieve a new time, but reuse the current
> >>>> time (as known via/at-the-time of the manage-call), again to shave
> off
> >>>> some cycles of overhead.
> >>>
> >>> I have a comment to the two points above.
> >>>
> >>> I agree that the application should supply the current time.
> >>>
> >>> This should be the concept throughout the library. I don't
> understand why
> >> TSC is used in the library at all?
> >>>
> >>> Please use a unit-less tick, and let the application decide what one
> tick
> >> means.
> >>>
> >>
> >> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
> >> sense if you think of the user of the API as not just a "monolithic"
> >> application, but rather a set of different modules, developed by
> >> different organizations, and reused across a set of applications. The
> >> idea behind the API design is they should all be able to share one
> timer
> >> service instance.
> >>
> >> The different parts of the application and any future DPDK platform
> >> modules that use the htimer service needs to agree what a tick means
> in
> >> terms of actual wall-time, if it's not mandated by the API.
> >
> > I see. Then those non-monolithic applications can agree that the unit
> of time is nanoseconds, or whatever makes sense for those applications.
> And then they can instantiate one shared HTW for that purpose.
> >
> 
> <rte_htimer_mgr.h> contains nothing but shared HTWs.
> 
> > There is no need to impose such an API limit on other users of the
> library.
> >
> >>
> >> There might be room for module-specific timer wheels as well, with
> >> different resolution or other characteristics. The event timer
> adapter's
> >> use of a timer wheel could be one example (although I'm not sure it
> is).
> >
> > We are not using the event device, and I have not looked into it, so I
> have no qualified comments to this.
> >
> >>
> >> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
> >> one could consider make the <rte_htw.h> API public as well. That is
> what
> >> I think you are asking for here: a generic timer wheel that doesn't
> know
> >> anything about time sources, time source time -> tick conversion, or
> >> timer source time -> monotonic wall time conversion, and maybe is
> also
> >> not bound to a particular thread.
> >
> > Yes, that is what I had been searching the Internet for.
> >
> > (I'm not sure what you mean by "not bound to a particular thread".
> Your per-thread design seems good to me.)
> >
> > I don't want more stuff in the EAL. What I want is high-performance
> DPDK libraries we can use in our applications.
> >
> >>
> >> I picked TSC because it seemed like a good "universal time unit" for
> >> DPDK. rdtsc (and its equivalent) is also a very precise (especially
> on
> >> x86) and cheap-to-retrieve (especially on ARM, from what I
> understand).
> >
> > The TSC does have excellent performance, but on all other parameters
> it is a horrible time keeper: The measurement unit depends on the
> underlying hardware, the TSC drifts depending on temperature, it cannot
> be PTP synchronized, the list is endless!
> >
> >>
> >> That said, at the moment, I'm leaning toward nanoseconds (uint64_t
> >> format) should be the default for timer expiration time instead of
> TSC.
> >> TSC could still be an option for passing the current time, since TSC
> >> will be a common time source, and it shaves off one conversion.
> >
> > There are many reasons why nanoseconds is a much better choice than
> TSC.
> >
> >>
> >>> A unit-less tick will also let the application instantiate a HTW
> with higher
> >> resolution than the TSC. (E.g. think about oversampling in audio
> processing,
> >> or Bresenham's line drawing algorithm for 2D visuals - oversampling
> can sound
> >> and look better.)
> >
> > Some of the timing data in our application have a resolution orders of
> magnitude higher than one nanosecond. If we combined that with a HTW
> library with nanosecond resolution, we would need to keep these timer
> values in two locations: The original high-res timer in our data
> structure, and the shadow low-res (nanosecond) timer in the HTW.
> >
> 
> There is no way you will meet timers with anything approaching
> pico-second-level precision.

Correct. Our sub-nanosecond timers don't need to meet the exact time, but the higher resolution prevents loss of accuracy when a number has been added to it many times. Think of it like a special fixed-point number, where the least significant part is included to ensure accuracy in calculations, while the actual timer only considers the most significant part of the number.
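
To illustrate what I mean, here is a small sketch (not our actual code; the
Q32.32 format and all toy_* names are chosen just for this example). A period
of 1.2 ticks is accumulated either in fixed point, rounding only when the
whole-tick value is needed, or after rounding the period to whole ticks first.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FRAC_BITS 32  /* Q32.32: upper 32 bits whole ticks, lower 32 fraction */

/* Convert a rational period of num/den ticks to Q32.32 (truncating). */
static uint64_t
toy_period_to_fixed(uint32_t num, uint32_t den)
{
	assert(den != 0);
	return ((uint64_t)num << TOY_FRAC_BITS) / den;
}

/* Round a Q32.32 value to the nearest whole tick. */
static uint64_t
toy_fixed_round(uint64_t fixed)
{
	return (fixed + (UINT64_C(1) << (TOY_FRAC_BITS - 1))) >> TOY_FRAC_BITS;
}

/* Expiry after n periods: accumulate in fixed point, round once at the end. */
static uint64_t
toy_expiry_fixed(uint64_t period_fixed, unsigned int n)
{
	uint64_t acc = 0;

	for (unsigned int i = 0; i < n; i++)
		acc += period_fixed;
	return toy_fixed_round(acc);
}

/* Expiry after n periods: round the period first, then accumulate. */
static uint64_t
toy_expiry_rounded(uint64_t period_fixed, unsigned int n)
{
	uint64_t period_ticks = toy_fixed_round(period_fixed);
	uint64_t acc = 0;

	for (unsigned int i = 0; i < n; i++)
		acc += period_ticks;
	return acc;
}
```

With a period of 12/10 ticks and n = 3, the fixed-point accumulator rounds to
4 ticks, while the version that rounds the period first ends up at 3 ticks.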

> You will also get into a value range issue,
> since you will wrap around a 64-bit integer in a matter of days.

Yes. We use timers with different scales for individual purposes. Our highest-resolution timers are sub-nanosecond.

> 
> The HTW only stores the timeout in ticks, not TSC, nanoseconds or
> picoseconds.

Excellent. Then I'm happy.

> Generally, you don't want pico-second-level tick
> granularity, since it increases the overhead of advancing the wheel(s).

We currently use proprietary algorithms for our bandwidth scheduling. It seems that a HTW is not a good fit for this purpose. Perhaps you are offering a hammer, and it's not a good replacement for my screwdriver.

I suppose that nanosecond resolution suffices for a TCP stack, which is the use case I have been on the lookout for a timer library for. :-)

> The first (lowest-significance) few wheels will pretty much always be
> empty.
> 
> > We might also need to frequently update the HTW timers to prevent
> drifting away from the high-res timers. E.g. 1.2 + 1.2 is still 2 when
> rounded, but + 1.2 becomes 3 when it should have been 4 (3 * 1.2 = 3.6)
> rounded. This level of drifting would also make periodic timers in the
> HTW useless.
> >
> 
> Useless, for a certain class of applications. What application would
> that be?

Sorry about being unclear there. Yes, I only meant the specific application I was talking about, i.e. our application for high precision bandwidth management. For reference, 1 bit at 100 Gbit/s is 10 picoseconds.

> 
> > Please note: I haven't really considered merging the high-res timing
> in our application with this HTW, and I'm also not saying that PERIODIC
> timers in the HTW are required or even useful for our application. I'm
> only providing arguments for a unit-less time!
> >
> >>>
> >>> For reference (supporting my suggestion), the dynamic timestamp
> field in the
> >> rte_mbuf structure is also defined as being unit-less. (I think
> NVIDIA
> >> implements it as nanoseconds, but that's an implementation specific
> choice.)
> >>>
> >>>>
> >>>> A semantic improvement compared to the <rte_timer.h> API is that
> the
> >>>> htimer library can give a definite answer on the question if the
> timer
> >>>> expiry callback was called, after a timer has been canceled.
> >>>>
> >>>> Below is a performance data from DPDK's 'app/test' micro
> benchmarks,
> >>>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> >>>> test_htimer_mgr_perf.c) aren't identical in their structure, but
> the
> >>>> numbers give some indication of the difference.
> >>>>
> >>>> Use case               htimer  timer
> >>>> ------------------------------------
> >>>> Add timer                 28    253
> >>>> Cancel timer              10    412
> >>>> Async add (source lcore)  64
> >>>> Async add (target lcore)  13
> >>>>
> >>>> (AMD 5900X CPU. Time in TSC.)
> >>>>
> >>>> Prototype integration of the htimer library into real, timer-heavy,
> >>>> applications indicates that htimer may result in significant
> >>>> application-level performance gains.
> >>>>
> >>>> The bitset implementation which the HWT implementation depends upon
> >>>> seemed generic-enough and potentially useful outside the world of
> >>>> HWTs, to justify being located in the EAL.
> >>>>
> >>>> This patchset is very much an RFC, and the author is yet to form an
> >>>> opinion on many important issues.
> >>>>
> >>>> * If deemed a suitable replacement, should the htimer replace the
> >>>>     current DPDK timer library in some particular (ABI-breaking)
> >>>>     release, or should it live side-by-side with the then-legacy
> >>>>     <rte_timer.h> API? A lot of things in and outside DPDK depend
> on
> >>>>     <rte_timer.h>, so coexistence may be required to facilitate a
> smooth
> >>>>     transition.
> >>>
> >>> It's my immediate impression that they are totally different in both
> design
> >> philosophy and API.
> >>>
> >>> Personal opinion: I would call it an entirely different library.
> >>>
> >>>>
> >>>> * Should the htimer and htw-related files be colocated with
> rte_timer.c
> >>>>     in the timer library?
> >>>
> >>> Personal opinion: No. This is an entirely different library, and
> should live
> >> for itself in a directory of its own.
> >>>
> >>>>
> >>>> * Would it be useful for applications using asynchronous cancel to
> >>>>     have the option of having the timer callback run not only in
> case of
> >>>>     timer expiration, but also cancellation (on the target lcore)?
> The
> >>>>     timer cb signature would need to include an additional
> parameter in
> >>>>     that case.
> >>>
> >>> If one thread cancels something in another thread, some
> synchronization
> >> between the threads is going to be required anyway. So we could
rephrase your
> >> question: Will the burden of the otherwise required synchronization
> between
> >> the two threads be significantly reduced if the library has the
> ability to run
> >> the callback on asynchronous cancel?
> >>>
> >>
> >> Yes.
> >>
> >> Intuitively, it seems convenient that if you hand off a timer to a
> >> different lcore, the timer callback will be called exactly once,
> >> regardless if the timer was canceled or expired.
> >>
> >> But, as you indicate, you may still need synchronization to solve the
> >> resource reclamation issue.
> >>
> >>> Is such a feature mostly "Must have" or "Nice to have"?
> >>>
> >>> More thoughts in this area...
> >>>
> >>> If adding an additional callback parameter, it could be an enum, so
> the
> >> callback could be expanded to support "timeout (a.k.a. timer fired)",
> "cancel"
> >> and more events we have not yet come up with, e.g. "early kick".
> >>>
> >>
> >> Yes, or an int.
> >>
> >>> Here's an idea off the top of my head: An additional callback
> parameter has
> >> a (small) performance cost incurred with every timer fired (which is
> a very
> >> large multiplier). It might not be required. As an alternative to an
> "what
> >> happened" parameter to the callback, the callback could investigate
> the state
> >> of the object for which the timer fired, and draw its own conclusion
> on how to
> >> proceed. Obviously, this also has a performance cost, but perhaps the
> callback
> >> works on the object's state anyway, making this cost insignificant.
> >>>
> >>
> >> It's not obvious to me that you, in the timer callback, can determine
> >> what happened, if the same callback is called both in the cancel and
> the
> >> expired case.
> >>
> >> The cost of an extra integer passed in a register (or checking a
> flag,
> >> if the timer callback should be called at all at cancellation) that
> is
> >> the concern for me; it's extra bit of API complexity.
> >
> > Then introduce the library without this feature. More features can be
> added later.
> >
> > The library will be introduced as "experimental", so we are free to
> improve it and modify the ABI along the way.
> >
> >>
> >>> Here's another alternative to adding a "what happened" parameter to
> the
> >> callback:
> >>>
> >>> The rte_htimer could have one more callback pointer, which (if set)
> will be
> >> called on cancellation of the timer.
> >>>
> >>
> >> This will grow the timer struct with 16 bytes.
> >
> > If the rte_htimer struct stays within one cache line, it should be
> acceptable.
> >
> 
> Timer structs are often embedded in other structures, and need not
> themselves be cache line aligned (although the "parent" struct may need
> to be, e.g. if it's dynamically allocated).
> 
> So smaller is better. Just consider if you want your attosecond-level
> time stamp in a struct:
> 
> struct my_timer {
>      uint64_t high_precision_time_high_bits;
>      uint64_t high_precision_time_low_bits;
>      struct rte_htimer timer;
> };
> 
> ...and you allocate those structs from a mempool. If rte_htimer is small
> enough, you will fit on one cache line.

Ahh... I somehow assumed they only existed as stand-alone elements inside the HTW.

Then I obviously agree that shorter is better.
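
As a back-of-the-envelope check, with a made-up timer layout (struct
toy_htimer below is NOT the real rte_htimer; its fields are guesses for the
sake of the example, and a 64-byte cache line is assumed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE_SIZE 64  /* assumption: 64-byte cache lines */

/* Hypothetical timer layout; 48 bytes with 8-byte pointers. */
struct toy_htimer {
	struct toy_htimer *next;
	struct toy_htimer *prev;
	uint64_t expiration_time;
	void (*cb)(struct toy_htimer *timer, void *cb_arg);
	void *cb_arg;
	uint32_t owner_lcore_id;
	uint32_t flags;
};

/* Application struct embedding the timer, as in the quoted example. */
struct toy_my_timer {
	uint64_t high_precision_time_high_bits;
	uint64_t high_precision_time_low_bits;
	struct toy_htimer timer;
};

/* Fails the build if the embedding struct outgrows one cache line. */
static_assert(sizeof(struct toy_my_timer) <= TOY_CACHE_LINE_SIZE,
	      "toy_my_timer no longer fits in one cache line");

/* Run-time variant of the same check, for sizes computed elsewhere. */
static int
toy_fits_in_cache_line(size_t size)
{
	return size <= TOY_CACHE_LINE_SIZE;
}
```

With this layout and 8-byte pointers the embedding struct fills the line
exactly, so any extra pointer in the timer would spill it over.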

> 
> > On the other hand, this approach is less generic than passing an
> additional parameter. (E.g. add yet another callback pointer for "early
> kick"?)
> >
> > BTW, async cancel is a form of inter-thread communication. Does this
> library really need to provide any inter-thread communication
> mechanisms? Doesn't an inter-thread communication mechanism belong in a
> separate library?
> >
> 
> Yes, <rte_htimer_mgr.h> needs this because:
> 1) Being able to schedule timers on a remote lcore is a useful feature
> (especially since we don't have much else in terms of deferred work
> mechanisms in DPDK).

Although remote procedures are a useful feature, providing such a feature doesn't necessarily belong in a library that uses remote procedures.

> 2) htimer aspires to be a plug-in replacement for <rte_timer.h> (albeit
> an ABI-breaking one).

This is a good argument.

But I would much rather have a highly tuned stand-alone HTW library than a plug-in replacement of the old <rte_timer.h>.

> 
> The pure HTW is in rte_htw.[ch].
> 
> Plus, with the current design, async operations basically come for free
> (if you don't use them), from a performance perspective. The extra
> overhead boils down to occasionally polling an empty ring, which is an
> inexpensive operation.

OK. Then no worries.

> 
> >>
> >>>>
> >>>> * Should the rte_htimer be a nested struct, so the htw parts be
> separated
> >>>>     from the htimer parts?
> >>>>
> >>>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
> >>>>     <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should
> it
> >>>>     be so?
> >>>>
> >>>> * rte_htimer struct is only supposed to be used by the application
> to
> >>>>     give an indication of how much memory it needs to allocate, and
> is
> >>>>     its member are not supposed to be directly accessed (w/ the
> possible
> >>>>     exception of the owner_lcore_id field). Should there be a dummy
> >>>>     struct, or a #define RTE_HTIMER_MEMSIZE or a
> rte_htimer_get_memsize()
> >>>>     function instead, serving the same purpose? Better
> encapsulation,
> >>>>     but more inconvenient for applications. Run-time dynamic sizing
> >>>>     would force application-level dynamic allocations.
> >>>>
> >>>> * Asynchronous cancellation is a little tricky to use for the
> >>>>     application (primarily due to timer memory reclamation/race
> >>>>     issues). Should this functionality be removed?
> >>>>
> >>>> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
> >>>>     there should be a variant which allows the user to specify the
> >>>>     time (to match rte_htimer_mgr_manage_time()). One pitfall with the
> >>>>     current proposed API is an application calling rte_htimer_mgr_init()
> >>>>     and then immediately adding a timer with a relative timeout, in
> >>>>     which case the current absolute time used is 0, which might be a
> >>>>     surprise.
> >>>>
> >>>> * Should libdivide (optionally) be used to avoid the div in the TSC ->
> >>>>     tick conversion? (Doesn't improve performance on Zen 3, but may
> >>>>     do on other CPUs.) Consider <rte_reciprocal.h> as well.
> >>>>
> >>>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
> >>>>     used for conversion? Very minor performance gains to be found there,
> >>>>     at least on Zen 3 cores.
> >>>>
> >>>> * Should it be possible to supply the time in rte_htimer_mgr_add()
> >>>>     and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
> >>>>     as TSC? Should it be possible to also use nanoseconds?
> >>>>     rte_htimer_mgr_manage_time() would need a flags parameter in that
> >>>>     case.
> >>>
> >>> Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.
> >>>
> >>>>
> >>>> * Would the event timer adapter be best off using <rte_htw.h>
> >>>>     directly, or <rte_htimer.h>? In the latter case, there needs to be a
> >>>>     way to instantiate more HWTs (similar to the "alt" functions of
> >>>>     <rte_timer.h>)?
> >>>>
> >>>> * Should the PERIODICAL flag (and the complexity it brings) be
> >>>>     removed? And leave the application with only single-shot timers, and
> >>>>     the option to re-add them in the timer callback.
> >>>
> >>> First thought: Yes, keep it lean and remove the periodical stuff.
> >>>
> >>> Second thought: This needs a more detailed analysis.
> >>>
> >>>   From one angle:
> >>>
> >>> How many PERIODICAL versus ONESHOT timers do we expect?
> >>>
> >>
> >> I suspect you should be prepared for the ratio being anything.
> >
> > In theory, anything is possible. But I'm asking that we consider realistic use cases.
> >
> >>
> >>> Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.
> >>>
> >>> If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.
> >>>
> >>>   From another angle:
> >>>
> >>> What is the performance gain with the PERIODICAL flag?
> >>>
> >>
> >> None, pretty much. It's just there for convenience.
> >
> > OK, then I suggest that you remove it, unless you get objections.
> >
> > The library can be expanded with useful features at any time later. Useless features are (nearly) impossible to remove, once they are in there - they are just "technical debt" with associated maintenance costs, added complexity weaving into other features, etc.
> >
> >>
> >>> Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?
> >>>
> >>> And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?
> >>>
> >>> Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?
> >>>
> >>>
> >>> Another, vaguely related, idea:
> >>>
> >>> The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.
> >>>
> >>
> >> Do you mean rte_htw, or rte_htimer_mgr?
> >>
> >> If you make one common callback, all the different parts of the
> >> application need to be coordinated (in a big switch-statement, or
> >> something of that sort), or have some convention for using an
> >> application-specific wrapper structure (accessed via container_of()).
> >>
> >> This is a problem if the timer service API consumer is a set of largely
> >> uncoordinated software modules.
> >>
> >> Btw, the eventdev API has the same issue, and the proposed event
> >> dispatcher is one way to help facilitate application-internal decoupling.
> >>
> >> For a module-private rte_htw instance your suggestion may work, but not
> >> for <rte_htimer_mgr.h>.
> >
> > I was speculating that a common callback pointer might provide a performance benefit for single-purpose HTW instances. (The same concept applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer Cancelled", and an "Early Kick" callback pointer - i.e. having the callback pointers per HTW instance, instead of per timer.)
> >
> >>
> >>> When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.
> >>>
> >>> It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.
> >>>
> >>>>
> >>>> * Should the async result codes and the sync cancel error codes be merged
> >>>>     into one set of result codes?
> >>>>
> >>>> * Should the rte_htimer_mgr_async_add() have a flag which allows
> >>>>     buffering add request messages until rte_htimer_mgr_process() is
> >>>>     called? Or any manage function. Would reduce ring signaling overhead
> >>>>     (i.e., burst enqueue operations instead of single-element
> >>>>     enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
> >>>>     solving the same "problem" a different way. (The signature of such
> >>>>     a function would not be pretty.)
> >>>>
> >>>> * Does the functionality provided by the rte_htimer_mgr_process()
> >>>>     function match its use cases? Should there be a clearer
> >>>>     separation between expiry processing and asynchronous operation
> >>>>     processing?
> >>>>
> >>>> * Should the patchset be split into more commits? If so, how?
> >>>>
> >>>> Thanks to Erik Carrillo for his assistance.
> >>>>
> >>>> Mattias Rönnblom (2):
> >>>>     eal: add bitset type
> >>>>     eal: add high-performance timer facility
> >


^ permalink raw reply	[relevance 0%]

* Re: [RFC 0/2] Add high-performance timer facility
  2023-03-01 13:31  3%     ` Morten Brørup
@ 2023-03-01 15:50  3%       ` Mattias Rönnblom
  2023-03-01 17:06  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Mattias Rönnblom @ 2023-03-01 15:50 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

On 2023-03-01 14:31, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Wednesday, 1 March 2023 12.18
>>
>> On 2023-02-28 17:01, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>> Sent: Tuesday, 28 February 2023 10.39
>>>
>>> I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.
>>>
>>> My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.
>>>
>>>>
>>>> This patchset is an attempt to introduce a high-performance, highly
>>>> scalable timer facility into DPDK.
>>>>
>>>> More specifically, the goals for the htimer library are:
>>>>
>>>> * Efficient handling of a handful up to hundreds of thousands of
>>>>     concurrent timers.
>>>> * Reduced overhead of adding and canceling timers.
>>>> * Provide a service functionally equivalent to that of
>>>>     <rte_timer.h>. API/ABI backward compatibility is secondary.
>>>>
>>>> In the author's opinion, there are two main shortcomings with the
>>>> current DPDK timer library (i.e., rte_timer.[ch]).
>>>>
>>>> One is the synchronization overhead, where heavy-weight full-barrier
>>>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
>>>> lists, but any thread may add or cancel (or otherwise access) timers
>>>> managed by another lcore (and thus reside in its timer skip list).
>>>>
>>>> The other is an algorithmic shortcoming, with rte_timer.c's reliance
>>>> on a skip list, which, seemingly, is less efficient than certain
>>>> alternatives.
>>>>
>>>> This patchset implements a hierarchical timer wheel (HWT, in
>>>
>>> Typo: HWT or HTW?
>>
>> Yes. I don't understand how I managed to make so many such HTW ->
>> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
>>
>>>
>>>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
>>>> Hierarchical Timing Wheels: Data Structures for the Efficient
>>>> Implementation of a Timer Facility". A HWT is a data structure
>>>> purposely designed for this task, and used by many operating system
>>>> kernel timer facilities.
>>>>
>>>> To further improve the solution described by Varghese and Lauck, a
>>>> bitset is placed in front of each of the timer wheels in the HWT,
>>>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
>>>> and expiry processing).
>>>>
>>>> Cycle-efficient scanning and manipulation of these bitsets are crucial
>>>> for the HWT's performance.
>>>>
>>>> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
>>>> instance, much like rte_timer.c keeps a per-lcore skip list.
>>>>
>>>> To avoid expensive synchronization overhead for thread-local timer
>>>> management, the HWTs are accessed only from the "owning" thread.  Any
>>>> interaction any other thread has with a particular lcore's timer
>>>> wheel goes over a set of DPDK rings. A side-effect of this design is
>>>> that all operations working toward a "remote" HWT must be
>>>> asynchronous.
>>>>
>>>> The <rte_htimer.h> API is available only to EAL threads and registered
>>>> non-EAL threads.
>>>>
>>>> The htimer API allows the application to supply the current time,
>>>> useful in case it already has retrieved this for other purposes,
>>>> saving the cost of a rdtsc instruction (or its equivalent).
>>>>
>>>> Relative htimers do not retrieve a new time, but reuse the current
>>>> time (as known via/at-the-time of the manage-call), again to shave off
>>>> some cycles of overhead.
>>>
>>> I have a comment to the two points above.
>>>
>>> I agree that the application should supply the current time.
>>>
>>> This should be the concept throughout the library. I don't understand why TSC is used in the library at all?
>>>
>>> Please use a unit-less tick, and let the application decide what one tick means.
>>>
>>
>> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
>> sense if you think of the user of the API as not just a "monolithic"
>> application, but rather a set of different modules, developed by
>> different organizations, and reused across a set of applications. The
>> idea behind the API design is they should all be able to share one timer
>> service instance.
>>
>> The different parts of the application and any future DPDK platform
>> modules that use the htimer service need to agree on what a tick means in
>> terms of actual wall-time, if it's not mandated by the API.
> 
> I see. Then those non-monolithic applications can agree that the unit of time is nanoseconds, or whatever makes sense for those applications. And then they can instantiate one shared HTW for that purpose.
> 

<rte_htimer_mgr.h> contains nothing but shared HTWs.

> There is no need to impose such an API limit on other users of the library.
> 
>>
>> There might be room for module-specific timer wheels as well, with
>> different resolution or other characteristics. The event timer adapter's
>> use of a timer wheel could be one example (although I'm not sure it is).
> 
> We are not using the event device, and I have not looked into it, so I have no qualified comments to this.
> 
>>
>> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
>> one could consider making the <rte_htw.h> API public as well. That is what
>> I think you are asking for here: a generic timer wheel that doesn't know
>> anything about time sources, time source time -> tick conversion, or
>> timer source time -> monotonic wall time conversion, and maybe is also
>> not bound to a particular thread.
> 
> Yes, that is what I had been searching the Internet for.
> 
> (I'm not sure what you mean by "not bound to a particular thread". Your per-thread design seems good to me.)
> 
> I don't want more stuff in the EAL. What I want is high-performance DPDK libraries we can use in our applications.
> 
>>
>> I picked TSC because it seemed like a good "universal time unit" for
>> DPDK. rdtsc (and its equivalent) is also very precise (especially on
>> x86) and cheap to retrieve (especially on ARM, from what I understand).
> 
> The TSC does have excellent performance, but on all other parameters it is a horrible time keeper: The measurement unit depends on the underlying hardware, the TSC drifts depending on temperature, it cannot be PTP synchronized, the list is endless!
> 
>>
>> That said, at the moment, I'm leaning toward making nanoseconds (uint64_t
>> format) the default for timer expiration time instead of TSC.
>> TSC could still be an option for passing the current time, since TSC
>> will be a common time source, and it shaves off one conversion.
> 
> There are many reasons why nanoseconds is a much better choice than TSC.
> 
>>
>>> A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)
> 
> Some of the timing data in our application have a resolution orders of magnitude higher than one nanosecond. If we combined that with a HTW library with nanosecond resolution, we would need to keep these timer values in two locations: The original high-res timer in our data structure, and the shadow low-res (nanosecond) timer in the HTW.
> 

There is no way you will be able to meet timeouts with anything approaching
picosecond-level precision. You will also get into a value range issue,
since a 64-bit picosecond counter wraps around in about 213 days.

The HTW only stores the timeout in ticks, not TSC, nanoseconds or 
picoseconds. Generally, you don't want pico-second-level tick 
granularity, since it increases the overhead of advancing the wheel(s). 
The first (lowest-significance) few wheels will pretty much always be empty.
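
The value-range point is easy to check with a few lines of arithmetic. The sketch below is only an illustration and is not part of the patchset; wrap_seconds() is a made-up helper name. It computes how long a 64-bit counter lasts for a given number of counter units per second.

```c
#include <assert.h>

/* Hypothetical helper (not from the patchset): seconds until a uint64_t
 * counter wraps around, given how many counter units make up one second. */
static double
wrap_seconds(double units_per_second)
{
	assert(units_per_second > 0.0);
	return 18446744073709551616.0 /* 2^64 */ / units_per_second;
}
```

With 1e9 units per second (nanoseconds) the counter lasts roughly 584 years, with 1e12 (picoseconds) roughly 213 days, and with 1e18 (attoseconds) roughly 18 seconds.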

> We might also need to frequently update the HTW timers to prevent drifting away from the high-res timers. E.g. 1.2 + 1.2 is still 2 when rounded, but adding another 1.2 gives 3 when it should have been 4 (3 * 1.2 = 3.6, rounded). This level of drifting would also make periodic timers in the HTW useless.
> 

Useless, for a certain class of applications. What application would 
that be?
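
For reference, the rounding drift described in the quoted paragraph can be reproduced with plain integer arithmetic. This is a sketch under stated assumptions: time is kept in tenths of a tick so that a period of 1.2 ticks becomes 12, and all function names are invented for the example. None of it is htimer API.

```c
#include <assert.h>
#include <stdint.h>

/* Round a time given in tenths of a tick to whole ticks (half rounds up). */
static uint64_t
round_to_ticks(uint64_t tenths)
{
	assert(tenths <= UINT64_MAX - 5); /* guard the +5 against overflow */
	return (tenths + 5) / 10;
}

/* Re-arm by adding the already-rounded period each time: the rounding
 * error of every period accumulates. */
static uint64_t
expiry_rounded_per_period(uint64_t period_tenths, unsigned int n)
{
	uint64_t ticks = 0;

	for (unsigned int i = 0; i < n; i++)
		ticks += round_to_ticks(period_tenths);
	return ticks;
}

/* Keep the exact high-resolution time and round only on conversion:
 * the error stays below one tick regardless of n. */
static uint64_t
expiry_rounded_once(uint64_t period_tenths, unsigned int n)
{
	return round_to_ticks(period_tenths * n);
}
```

For a period of 12 tenths, both functions give 2 after two periods, but after three periods the first gives 3 and the second gives 4, which matches the numbers in the quoted example.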

> Please note: I haven't really considered merging the high-res timing in our application with this HTW, and I'm also not saying that PERIODIC timers in the HTW are required or even useful for our application. I'm only providing arguments for a unit-less time!
> 
>>>
>>> For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
>>>
>>>>
>>>> A semantic improvement compared to the <rte_timer.h> API is that the
>>>> htimer library can give a definite answer on the question if the timer
>>>> expiry callback was called, after a timer has been canceled.
>>>>
>>>> Below is performance data from DPDK's 'app/test' micro benchmarks,
>>>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
>>>> test_htimer_mgr_perf.c) aren't identical in their structure, but the
>>>> numbers give some indication of the difference.
>>>>
>>>> Use case               htimer  timer
>>>> ------------------------------------
>>>> Add timer                 28    253
>>>> Cancel timer              10    412
>>>> Async add (source lcore)  64
>>>> Async add (target lcore)  13
>>>>
>>>> (AMD 5900X CPU. Time in TSC.)
>>>>
>>>> Prototype integration of the htimer library into real, timer-heavy,
>>>> applications indicates that htimer may result in significant
>>>> application-level performance gains.
>>>>
>>>> The bitset implementation which the HWT implementation depends upon
>>>> seemed generic-enough and potentially useful outside the world of
>>>> HWTs, to justify being located in the EAL.
>>>>
>>>> This patchset is very much an RFC, and the author is yet to form an
>>>> opinion on many important issues.
>>>>
>>>> * If deemed a suitable replacement, should the htimer replace the
>>>>     current DPDK timer library in some particular (ABI-breaking)
>>>>     release, or should it live side-by-side with the then-legacy
>>>>     <rte_timer.h> API? A lot of things in and outside DPDK depend on
>>>>     <rte_timer.h>, so coexistence may be required to facilitate a smooth
>>>>     transition.
>>>
>>> It's my immediate impression that they are totally different in both design philosophy and API.
>>>
>>> Personal opinion: I would call it an entirely different library.
>>>
>>>>
>>>> * Should the htimer and htw-related files be colocated with rte_timer.c
>>>>     in the timer library?
>>>
>>> Personal opinion: No. This is an entirely different library, and should live for itself in a directory of its own.
>>>
>>>>
>>>> * Would it be useful for applications using asynchronous cancel to
>>>>     have the option of having the timer callback run not only in case of
>>>>     timer expiration, but also cancellation (on the target lcore)? The
>>>>     timer cb signature would need to include an additional parameter in
>>>>     that case.
>>>
>>> If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?
>>>
>>
>> Yes.
>>
>> Intuitively, it seems convenient that if you hand off a timer to a
>> different lcore, the timer callback will be called exactly once,
>> regardless if the timer was canceled or expired.
>>
>> But, as you indicate, you may still need synchronization to solve the
>> resource reclamation issue.
>>
>>> Is such a feature mostly "Must have" or "Nice to have"?
>>>
>>> More thoughts in this area...
>>>
>>> If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".
>>>
>>
>> Yes, or an int.
>>
>>> Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.
>>>
>>
>> It's not obvious to me that you, in the timer callback, can determine
>> what happened, if the same callback is called both in the cancel and the
>> expired case.
>>
>> It's not the cost of an extra integer passed in a register (or checking a
>> flag, if the timer callback should be called at all at cancellation) that
>> is the concern for me; it's the extra bit of API complexity.
> 
> Then introduce the library without this feature. More features can be added later.
> 
> The library will be introduced as "experimental", so we are free to improve it and modify the ABI along the way.
> 
>>
>>> Here's another alternative to adding a "what happened" parameter to the callback:
>>>
>>> The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
>>>
>>
>> This will grow the timer struct by 16 bytes.
> 
> If the rte_htimer struct stays within one cache line, it should be acceptable.
> 

Timer structs are often embedded in other structures, and need not 
themselves be cache line aligned (although the "parent" struct may need 
to be, e.g. if it's dynamically allocated).

So smaller is better. Just consider if you want your attosecond-level 
time stamp in a struct:

struct my_timer {
     uint64_t high_precision_time_high_bits;
     uint64_t high_precision_time_low_bits;
     struct rte_htimer timer;
};

...and you allocate those structs from a mempool. If rte_htimer is small 
enough, you will fit on one cache line.
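
To make the size argument concrete, here is a compile-time check. It is a sketch only: struct htimer_stub is a stand-in with hypothetical fields (only owner_lcore_id is a name taken from the cover letter), the real struct rte_htimer layout is whatever the patchset defines, and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Stand-in for struct rte_htimer. The fields are hypothetical and only
 * serve to give the struct a plausible size (48 bytes on LP64). */
struct htimer_stub {
	void *next;
	void *prev;
	uint64_t expiration_time;
	void (*cb)(void *cb_arg);
	void *cb_arg;
	uint32_t owner_lcore_id;
	uint32_t flags;
};

struct my_timer {
	uint64_t high_precision_time_high_bits;
	uint64_t high_precision_time_low_bits;
	struct htimer_stub timer;
};

/* Fails to compile if the embedding struct spills onto a second line. */
static_assert(sizeof(struct my_timer) <= CACHE_LINE_SIZE,
	      "struct my_timer no longer fits in one cache line");
```

With this stand-in, struct my_timer is exactly 64 bytes on a 64-bit target. Adding a second callback pointer plus its argument (16 more bytes) would push it to 80 bytes and trip the assertion.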

> On the other hand, this approach is less generic than passing an additional parameter. (E.g. add yet another callback pointer for "early kick"?)
> 
> BTW, async cancel is a form of inter-thread communication. Does this library really need to provide any inter-thread communication mechanisms? Doesn't an inter-thread communication mechanism belong in a separate library?
> 

Yes, <rte_htimer_mgr.h> needs this because:
1) Being able to schedule timers on a remote lcore is a useful feature 
(especially since we don't have much else in terms of deferred work 
mechanisms in DPDK).
2) htimer aspires to be a plug-in replacement for <rte_timer.h> (albeit 
an ABI-breaking one).

The pure HTW is in rte_htw.[ch].

Plus, with the current design, async operations basically come for free 
(if you don't use them), from a performance perspective. The extra 
overhead boils down to occasionally polling an empty ring, which is an 
inexpensive operation.
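
To illustrate why polling an empty ring is cheap, here is a minimal single-producer/single-consumer ring using C11 atomics. It is a stand-in written for this example, not rte_ring and not the code in the patchset; all names are invented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

struct msg_ring {
	_Atomic uint32_t head; /* only written by the consumer */
	_Atomic uint32_t tail; /* only written by the producer */
	void *slots[RING_SIZE];
};

/* Producer side: returns false if the ring is full. */
static bool
msg_ring_enqueue(struct msg_ring *r, void *msg)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_SIZE)
		return false;
	r->slots[tail & (RING_SIZE - 1)] = msg;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer side: on an empty ring this is two loads and one compare. */
static bool
msg_ring_dequeue(struct msg_ring *r, void **msg)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false; /* nothing pending */
	*msg = r->slots[head & (RING_SIZE - 1)];
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

When no remote thread has posted anything, the owning thread's poll returns after comparing two indices that are typically already in its cache, which is the "occasionally polling an empty ring" cost referred to above.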

>>
>>>>
>>>> * Should the rte_htimer be a nested struct, so the htw parts be separated
>>>>     from the htimer parts?
>>>>
>>>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>>>>     <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>>>>     be so?
>>>>
>>>> * rte_htimer struct is only supposed to be used by the application to
>>>>     give an indication of how much memory it needs to allocate, and
>>>>     its members are not supposed to be directly accessed (w/ the possible
>>>>     exception of the owner_lcore_id field). Should there be a dummy
>>>>     struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>>>>     function instead, serving the same purpose? Better encapsulation,
>>>>     but more inconvenient for applications. Run-time dynamic sizing
>>>>     would force application-level dynamic allocations.
>>>>
>>>> * Asynchronous cancellation is a little tricky to use for the
>>>>     application (primarily due to timer memory reclamation/race
>>>>     issues). Should this functionality be removed?
>>>>
>>>> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>>>>     there should be a variant which allows the user to specify the
>>>>     time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>>>>     current proposed API is an application calling rte_htimer_mgr_init()
>>>>     and then immediately adding a timer with a relative timeout, in
>>>>     which case the current absolute time used is 0, which might be a
>>>>     surprise.
>>>>
>>>> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>>>>     tick conversion? (Doesn't improve performance on Zen 3, but may
>>>>     do on other CPUs.) Consider <rte_reciprocal.h> as well.
>>>>
>>>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>>>>     used for conversion? Very minor performance gains to be found there,
>>>>     at least on Zen 3 cores.
>>>>
>>>> * Should it be possible to supply the time in rte_htimer_mgr_add()
>>>>     and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>>>>     as TSC? Should it be possible to also use nanoseconds?
>>>>     rte_htimer_mgr_manage_time() would need a flags parameter in that
>>>>     case.
>>>
>>> Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.
>>>
>>>>
>>>> * Would the event timer adapter be best off using <rte_htw.h>
>>>>     directly, or <rte_htimer.h>? In the latter case, there needs to be a
>>>>     way to instantiate more HWTs (similar to the "alt" functions of
>>>>     <rte_timer.h>)?
>>>>
>>>> * Should the PERIODICAL flag (and the complexity it brings) be
>>>>     removed? And leave the application with only single-shot timers, and
>>>>     the option to re-add them in the timer callback.
>>>
>>> First thought: Yes, keep it lean and remove the periodical stuff.
>>>
>>> Second thought: This needs a more detailed analysis.
>>>
>>>   From one angle:
>>>
>>> How many PERIODICAL versus ONESHOT timers do we expect?
>>>
>>
>> I suspect you should be prepared for the ratio being anything.
> 
> In theory, anything is possible. But I'm asking that we consider realistic use cases.
> 
>>
>>> Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.
>>>
>>> If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.
>>>
>>>   From another angle:
>>>
>>> What is the performance gain with the PERIODICAL flag?
>>>
>>
>> None, pretty much. It's just there for convenience.
> 
> OK, then I suggest that you remove it, unless you get objections.
> 
> The library can be expanded with useful features at any time later. Useless features are (nearly) impossible to remove, once they are in there - they are just "technical debt" with associated maintenance costs, added complexity weaving into other features, etc.
> 
>>
>>> Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?
>>>
>>> And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?
>>>
>>> Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?
>>>
>>>
>>> Another, vaguely related, idea:
>>>
>>> The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.
>>>
>>
>> Do you mean rte_htw, or rte_htimer_mgr?
>>
>> If you make one common callback, all the different parts of the
>> application need to be coordinated (in a big switch-statement, or
>> something of that sort), or have some convention for using an
>> application-specific wrapper structure (accessed via container_of()).
>>
>> This is a problem if the timer service API consumer is a set of largely
>> uncoordinated software modules.
>>
>> Btw, the eventdev API has the same issue, and the proposed event
>> dispatcher is one way to help facilitate application-internal decoupling.
>>
>> For a module-private rte_htw instance your suggestion may work, but not
>> for <rte_htimer_mgr.h>.
> 
> I was speculating that a common callback pointer might provide a performance benefit for single-purpose HTW instances. (The same concept applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer Cancelled", and an "Early Kick" callback pointer - i.e. having the callback pointers per HTW instance, instead of per timer.)
> 
>>
>>> When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.
>>>
>>> It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.
>>>
>>>>
>>>> * Should the async result codes and the sync cancel error codes be merged
>>>>     into one set of result codes?
>>>>
>>>> * Should the rte_htimer_mgr_async_add() have a flag which allows
>>>>     buffering add request messages until rte_htimer_mgr_process() is
>>>>     called? Or any manage function. Would reduce ring signaling overhead
>>>>     (i.e., burst enqueue operations instead of single-element
>>>>     enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>>>>     solving the same "problem" a different way. (The signature of such
>>>>     a function would not be pretty.)
>>>>
>>>> * Does the functionality provided by the rte_htimer_mgr_process()
>>>>     function match its use cases? Should there be a clearer
>>>>     separation between expiry processing and asynchronous operation
>>>>     processing?
>>>>
>>>> * Should the patchset be split into more commits? If so, how?
>>>>
>>>> Thanks to Erik Carrillo for his assistance.
>>>>
>>>> Mattias Rönnblom (2):
>>>>     eal: add bitset type
>>>>     eal: add high-performance timer facility
> 


^ permalink raw reply	[relevance 3%]

* RE: [RFC 0/2] Add high-performance timer facility
  2023-03-01 11:18  0%   ` Mattias Rönnblom
@ 2023-03-01 13:31  3%     ` Morten Brørup
  2023-03-01 15:50  3%       ` Mattias Rönnblom
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-03-01 13:31 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 1 March 2023 12.18
> 
> On 2023-02-28 17:01, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Tuesday, 28 February 2023 10.39
> >
> > I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.
> >
> > My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.
> >
> >>
> >> This patchset is an attempt to introduce a high-performance, highly
> >> scalable timer facility into DPDK.
> >>
> >> More specifically, the goals for the htimer library are:
> >>
> >> * Efficient handling of a handful up to hundreds of thousands of
> >>    concurrent timers.
> >> * Reduced overhead of adding and canceling timers.
> >> * Provide a service functionally equivalent to that of
> >>    <rte_timer.h>. API/ABI backward compatibility is secondary.
> >>
> >> In the author's opinion, there are two main shortcomings with the
> >> current DPDK timer library (i.e., rte_timer.[ch]).
> >>
> >> One is the synchronization overhead, where heavy-weight full-barrier
> >> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> >> lists, but any thread may add or cancel (or otherwise access) timers
> >> managed by another lcore (and thus reside in its timer skip list).
> >>
> >> The other is an algorithmic shortcoming, with rte_timer.c's reliance
> >> on a skip list, which, seemingly, is less efficient than certain
> >> alternatives.
> >>
> >> This patchset implements a hierarchical timer wheel (HWT, in
> >
> > Typo: HWT or HTW?
> 
> Yes. I don't understand how I managed to make so many such HTW ->
> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
> 
> >
> >> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> >> Hierarchical Timing Wheels: Data Structures for the Efficient
> >> Implementation of a Timer Facility". A HWT is a data structure
> >> purposely designed for this task, and used by many operating system
> >> kernel timer facilities.
> >>
> >> To further improve the solution described by Varghese and Lauck, a
> >> bitset is placed in front of each of the timer wheels in the HWT,
> >> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
> >> and expiry processing).
> >>
> >> Cycle-efficient scanning and manipulation of these bitsets are crucial
> >> for the HWT's performance.
> >>
> >> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
> >> instance, much like rte_timer.c keeps a per-lcore skip list.
> >>
> >> To avoid expensive synchronization overhead for thread-local timer
> >> management, the HWTs are accessed only from the "owning" thread.  Any
> >> interaction any other thread has with a particular lcore's timer
> >> wheel goes over a set of DPDK rings. A side-effect of this design is
> >> that all operations working toward a "remote" HWT must be
> >> asynchronous.
> >>
> >> The <rte_htimer.h> API is available only to EAL threads and registered
> >> non-EAL threads.
> >>
> >> The htimer API allows the application to supply the current time,
> >> useful in case it already has retrieved this for other purposes,
> >> saving the cost of a rdtsc instruction (or its equivalent).
> >>
> >> Relative htimer does not retrieve a new time, but reuses the current
> >> time (as known via/at-the-time of the manage-call), again to shave off
> >> some cycles of overhead.
> >
> > I have a comment to the two points above.
> >
> > I agree that the application should supply the current time.
> >
> > This should be the concept throughout the library. I don't understand why
> TSC is used in the library at all?
> >
> > Please use a unit-less tick, and let the application decide what one tick
> means.
> >
> 
> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
> sense if you think of the user of the API as not just a "monolithic"
> application, but rather a set of different modules, developed by
> different organizations, and reused across a set of applications. The
> idea behind the API design is they should all be able to share one timer
> service instance.
> 
> The different parts of the application and any future DPDK platform
> modules that use the htimer service need to agree on what a tick means in
> terms of actual wall-time, if it's not mandated by the API.

I see. Then those non-monolithic applications can agree that the unit of time is nanoseconds, or whatever makes sense for those applications. And then they can instantiate one shared HTW for that purpose.

There is no need to impose such an API limit on other users of the library.

> 
> There might be room for module-specific timer wheels as well, with
> different resolution or other characteristics. The event timer adapter's
> use of a timer wheel could be one example (although I'm not sure it is).

We are not using the event device, and I have not looked into it, so I have no qualified comments on this.

> 
> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
> one could consider making the <rte_htw.h> API public as well. That is what
> I think you are asking for here: a generic timer wheel that doesn't know
> anything about time sources, time source time -> tick conversion, or
> timer source time -> monotonic wall time conversion, and maybe is also
> not bound to a particular thread.

Yes, that is what I had been searching the Internet for.

(I'm not sure what you mean by "not bound to a particular thread". Your per-thread design seems good to me.)

I don't want more stuff in the EAL. What I want is high-performance DPDK libraries we can use in our applications.

> 
> I picked TSC because it seemed like a good "universal time unit" for
> DPDK. rdtsc (and its equivalent) is also very precise (especially on
> x86) and cheap-to-retrieve (especially on ARM, from what I understand).

The TSC does have excellent performance, but on all other parameters it is a horrible time keeper: The measurement unit depends on the underlying hardware, the TSC drifts depending on temperature, it cannot be PTP synchronized, the list is endless!

> 
> That said, at the moment, I'm leaning toward making nanoseconds (uint64_t
> format) the default for timer expiration time instead of TSC.
> TSC could still be an option for passing the current time, since TSC
> will be a common time source, and it shaves off one conversion.

There are many reasons why nanoseconds is a much better choice than TSC.

> 
> > A unit-less tick will also let the application instantiate a HTW with higher
> resolution than the TSC. (E.g. think about oversampling in audio processing,
> or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound
> and look better.)

Some of the timing data in our application have a resolution orders of magnitude higher than one nanosecond. If we combined that with a HTW library with nanosecond resolution, we would need to keep these timer values in two locations: The original high-res timer in our data structure, and the shadow low-res (nanosecond) timer in the HTW.

We might also need to frequently update the HTW timers to prevent drifting away from the high-res timers. E.g. with a period of 1.2 ticks, 1.2 + 1.2 is still 2 when rounded (2.4), but adding another 1.2 to the rounded value gives 3 in the HTW when it should have been 4 (3 * 1.2 = 3.6, rounded). This level of drifting would also make periodic timers in the HTW useless.
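
The drift can be sketched in a few lines of C. This is only an illustration under assumed numbers: time is kept in hypothetical "sub-ticks" (10 per tick, so 12 sub-ticks is a period of 1.2 ticks), and none of the function names below exist in the proposed htimer API.

```c
#include <assert.h>
#include <stdint.h>

/* Round a high-res time (in sub-ticks) to the nearest whole tick. */
static uint64_t
round_to_ticks(uint64_t subticks, uint64_t subticks_per_tick)
{
	assert(subticks_per_tick > 0);
	return (subticks + subticks_per_tick / 2) / subticks_per_tick;
}

/*
 * Expiry of the n:th period when the period itself is rounded to whole
 * ticks first, as a low-res periodic timer would do. The rounding
 * error is repeated in every period.
 */
static uint64_t
expiry_rounded_per_period(uint64_t period_subticks,
			  uint64_t subticks_per_tick, unsigned int n)
{
	return round_to_ticks(period_subticks, subticks_per_tick) * n;
}

/*
 * Expiry of the n:th period when time is accumulated in high
 * resolution and rounded only once, at the end.
 */
static uint64_t
expiry_rounded_once(uint64_t period_subticks,
		    uint64_t subticks_per_tick, unsigned int n)
{
	return round_to_ticks(period_subticks * n, subticks_per_tick);
}
```

With a period of 12 sub-ticks at 10 sub-ticks per tick, the two functions agree after two periods (2 ticks) and differ after three (3 versus 4 ticks).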

Please note: I haven't really considered merging the high-res timing in our application with this HTW, and I'm also not saying that PERIODIC timers in the HTW are required or even useful for our application. I'm only providing arguments for a unit-less time!

> >
> > For reference (supporting my suggestion), the dynamic timestamp field in the
> rte_mbuf structure is also defined as being unit-less. (I think NVIDIA
> implements it as nanoseconds, but that's an implementation specific choice.)
> >
> >>
> >> A semantic improvement compared to the <rte_timer.h> API is that the
> >> htimer library can give a definite answer on the question if the timer
> >> expiry callback was called, after a timer has been canceled.
> >>
> >> Below is performance data from DPDK's 'app/test' micro benchmarks,
> >> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> >> test_htimer_mgr_perf.c) aren't identical in their structure, but the
> >> numbers give some indication of the difference.
> >>
> >> Use case               htimer  timer
> >> ------------------------------------
> >> Add timer                 28    253
> >> Cancel timer              10    412
> >> Async add (source lcore)  64
> >> Async add (target lcore)  13
> >>
> >> (AMD 5900X CPU. Time in TSC.)
> >>
> >> Prototype integration of the htimer library into real, timer-heavy,
> >> applications indicates that htimer may result in significant
> >> application-level performance gains.
> >>
> >> The bitset implementation which the HWT implementation depends upon
> >> seemed generic-enough and potentially useful outside the world of
> >> HWTs, to justify being located in the EAL.
> >>
> >> This patchset is very much an RFC, and the author is yet to form an
> >> opinion on many important issues.
> >>
> >> * If deemed a suitable replacement, should the htimer replace the
> >>    current DPDK timer library in some particular (ABI-breaking)
> >>    release, or should it live side-by-side with the then-legacy
> >>    <rte_timer.h> API? A lot of things in and outside DPDK depend on
> >>    <rte_timer.h>, so coexistence may be required to facilitate a smooth
> >>    transition.
> >
> > It's my immediate impression that they are totally different in both design
> philosophy and API.
> >
> > Personal opinion: I would call it an entirely different library.
> >
> >>
> >> * Should the htimer and htw-related files be colocated with rte_timer.c
> >>    in the timer library?
> >
> > Personal opinion: No. This is an entirely different library, and should live
> for itself in a directory of its own.
> >
> >>
> >> * Would it be useful for applications using asynchronous cancel to
> >>    have the option of having the timer callback run not only in case of
> >>    timer expiration, but also cancellation (on the target lcore)? The
> >>    timer cb signature would need to include an additional parameter in
> >>    that case.
> >
> > If one thread cancels something in another thread, some synchronization
> between the threads is going to be required anyway. So we could rephrase your
> question: Will the burden of the otherwise required synchronization between
> the two threads be significantly reduced if the library has the ability to run
> the callback on asynchronous cancel?
> >
> 
> Yes.
> 
> Intuitively, it seems convenient that if you hand off a timer to a
> different lcore, the timer callback will be called exactly once,
> regardless if the timer was canceled or expired.
> 
> But, as you indicate, you may still need synchronization to solve the
> resource reclamation issue.
> 
> > Is such a feature mostly "Must have" or "Nice to have"?
> >
> > More thoughts in this area...
> >
> > If adding an additional callback parameter, it could be an enum, so the
> callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel"
> and more events we have not yet come up with, e.g. "early kick".
> >
> 
> Yes, or an int.
> 
> > Here's an idea off the top of my head: An additional callback parameter has
> a (small) performance cost incurred with every timer fired (which is a very
> large multiplier). It might not be required. As an alternative to a "what
> happened" parameter to the callback, the callback could investigate the state
> of the object for which the timer fired, and draw its own conclusion on how to
> proceed. Obviously, this also has a performance cost, but perhaps the callback
> works on the object's state anyway, making this cost insignificant.
> >
> 
> It's not obvious to me that you, in the timer callback, can determine
> what happened, if the same callback is called both in the cancel and the
> expired case.
> 
> The cost of an extra integer passed in a register (or checking a flag,
> if the timer callback should be called at all at cancellation) is not
> the concern for me; it's the extra bit of API complexity.

Then introduce the library without this feature. More features can be added later.

The library will be introduced as "experimental", so we are free to improve it and modify the ABI along the way.

> 
> > Here's another alternative to adding a "what happened" parameter to the
> callback:
> >
> > The rte_htimer could have one more callback pointer, which (if set) will be
> called on cancellation of the timer.
> >
> 
> This will grow the timer struct by 16 bytes.

If the rte_htimer struct stays within one cache line, it should be acceptable.

On the other hand, this approach is less generic than passing an additional parameter. (E.g. add yet another callback pointer for "early kick"?)
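
To make the comparison concrete, here is a minimal sketch of the "additional parameter" variant. All names (toy_timer, toy_timer_event and so on) are hypothetical and chosen for illustration only; this is not the rte_htimer API, and the single-threaded "finish" function merely stands in for whatever the library would do on expiry or cancel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event type; more events (e.g. "early kick") fit here. */
enum toy_timer_event {
	TOY_TIMER_EXPIRED,
	TOY_TIMER_CANCELED
};

struct toy_timer;

typedef void (*toy_timer_cb_t)(struct toy_timer *timer,
			       enum toy_timer_event event, void *cb_arg);

struct toy_timer {
	toy_timer_cb_t cb;
	void *cb_arg;
	int pending; /* non-zero while the timer is armed */
};

static void
toy_timer_arm(struct toy_timer *timer, toy_timer_cb_t cb, void *cb_arg)
{
	assert(cb != NULL);
	timer->cb = cb;
	timer->cb_arg = cb_arg;
	timer->pending = 1;
}

/* Deliver exactly one event per armed timer, whatever the reason. */
static void
toy_timer_finish(struct toy_timer *timer, enum toy_timer_event event)
{
	if (!timer->pending)
		return;
	timer->pending = 0;
	timer->cb(timer, event, timer->cb_arg);
}

/* Example callback: one function pointer handles every event type. */
struct toy_stats {
	unsigned int expired;
	unsigned int canceled;
};

static void
toy_count_cb(struct toy_timer *timer, enum toy_timer_event event,
	     void *cb_arg)
{
	struct toy_stats *stats = cb_arg;

	(void)timer;
	switch (event) {
	case TOY_TIMER_EXPIRED:
		stats->expired++;
		break;
	case TOY_TIMER_CANCELED:
		stats->canceled++;
		break;
	}
}
```

In this sketch a new event type costs one enum value and no extra space in the timer struct, whereas the one-pointer-per-event variant would grow the struct for every event added.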

BTW, async cancel is a form of inter-thread communication. Does this library really need to provide any inter-thread communication mechanisms? Doesn't an inter-thread communication mechanism belong in a separate library?

> 
> >>
> >> * Should the rte_htimer be a nested struct, so the htw parts be separated
> >>    from the htimer parts?
> >>
> >> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
> >>    <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
> >>    be so?
> >>
> >> * rte_htimer struct is only supposed to be used by the application to
> >>    give an indication of how much memory it needs to allocate, and
> >>    its members are not supposed to be directly accessed (w/ the possible
> >>    exception of the owner_lcore_id field). Should there be a dummy
> >>    struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
> >>    function instead, serving the same purpose? Better encapsulation,
> >>    but more inconvenient for applications. Run-time dynamic sizing
> >>    would force application-level dynamic allocations.
> >>
> >> * Asynchronous cancellation is a little tricky to use for the
> >>    application (primarily due to timer memory reclamation/race
> >>    issues). Should this functionality be removed?
> >>
> >> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
> >>    there should be a variant which allows the user to specify the
> >>    time (to match rte_htimer_mgr_manage_time()). One pitfall with the
> >>    current proposed API is an application calling rte_htimer_mgr_init()
> >>    and then immediately adding a timer with a relative timeout, in
> >>    which case the current absolute time used is 0, which might be a
> >>    surprise.
> >>
> >> * Should libdivide (optionally) be used to avoid the div in the TSC ->
> >>    tick conversion? (Doesn't improve performance on Zen 3, but may
> >>    do on other CPUs.) Consider <rte_reciprocal.h> as well.
> >>
> >> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
> >>    used for conversion? Very minor performance gains to be found there,
> >>    at least on Zen 3 cores.
> >>
> >> * Should it be possible to supply the time in rte_htimer_mgr_add()
> >>    and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
> >>    as TSC? Should it be possible to also use nanoseconds?
> >>    rte_htimer_mgr_manage_time() would need a flags parameter in that
> >>    case.
> >
> > Do not use TSC anywhere in this library. Let the application decide the
> meaning of a tick.
> >
> >>
> >> * Would the event timer adapter be best off using <rte_htw.h>
> >>    directly, or <rte_htimer.h>? In the latter case, there needs to be a
> >>    way to instantiate more HWTs (similar to the "alt" functions of
> >>    <rte_timer.h>)?
> >>
> >> * Should the PERIODICAL flag (and the complexity it brings) be
> >>    removed? And leave the application with only single-shot timers, and
> >>    the option to re-add them in the timer callback.
> >
> > First thought: Yes, keep it lean and remove the periodical stuff.
> >
> > Second thought: This needs a more detailed analysis.
> >
> >  From one angle:
> >
> > How many PERIODICAL versus ONESHOT timers do we expect?
> >
> 
> I suspect you should be prepared for the ratio being anything.

In theory, anything is possible. But I'm asking that we consider realistic use cases.

> 
> > Intuitively, I would use this library for ONESHOT timers, and perhaps
> implement my periodical timers by other means.
> >
> > If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra
> cost of cancel+add for a few periodical timers.
> >
> >  From another angle:
> >
> > What is the performance gain with the PERIODICAL flag?
> >
> 
> None, pretty much. It's just there for convenience.

OK, then I suggest that you remove it, unless you get objections.

The library can be expanded with useful features at any time later. Useless features are (nearly) impossible to remove, once they are in there - they are just "technical debt" with associated maintenance costs, added complexity weaving into other features, etc.

> 
> > Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles
> would a "move" function, performing both cancel and add, use?
> >
> > And then compare that to the cost (in cycles) of repeating a timer with
> PERIODICAL?
> >
> > Furthermore, not having the PERIODICAL flag probably improves the
> performance for non-periodical timers. How many cycles could we gain here?
> >
> >
> > Another, vaguely related, idea:
> >
> > The callback pointer might not need to be stored per rte_htimer, but could
> instead be common for the rte_htw.
> >
> 
> Do you mean rte_htw, or rte_htimer_mgr?
> 
> If you make one common callback, all the different parts of the
> application need to be coordinated (in a big switch-statement, or
> something of that sort), or have some convention for using an
> application-specific wrapper structure (accessed via container_of()).
> 
> This is a problem if the timer service API consumer is a set of largely
> uncoordinated software modules.
> 
> Btw, the eventdev API has the same issue, and the proposed event
> dispatcher is one way to help facilitate application-internal decoupling.
> 
> For a module-private rte_htw instance your suggestion may work, but not
> for <rte_htimer_mgr.h>.

I was speculating that a common callback pointer might provide a performance benefit for single-purpose HTW instances. (The same concept applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer Cancelled", and an "Early Kick" callback pointer - i.e. having the callback pointers per HTW instance, instead of per timer.)

> 
> > When a timer fires, the callback probably needs to check/update the state of
> the object for which the timer fired anyway, so why not just let the
> application use that state to determine the appropriate action. This might
> provide some performance benefit.
> >
> > It might complicate using one HTW for multiple different purposes, though.
> Probably a useless idea, but I wanted to share the idea anyway. It might
> trigger other, better ideas in the community.
> >
> >>
> >> * Should the async result codes and the sync cancel error codes be merged
> >>    into one set of result codes?
> >>
> >> * Should the rte_htimer_mgr_async_add() have a flag which allow
> >>    buffering add request messages until rte_htimer_mgr_process() is
> >>    called? Or any manage function. Would reduce ring signaling overhead
> >>    (i.e., burst enqueue operations instead of single-element
> >>    enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
> >>    solving the same "problem" a different way. (The signature of such
> >>    a function would not be pretty.)
> >>
> >> * Does the functionality provided by the rte_htimer_mgr_process()
> >>    function match its use cases? Should there be a clearer
> >>    separation between expiry processing and asynchronous operation
> >>    processing?
> >>
> >> * Should the patchset be split into more commits? If so, how?
> >>
> >> Thanks to Erik Carrillo for his assistance.
> >>
> >> Mattias Rönnblom (2):
> >>    eal: add bitset type
> >>    eal: add high-performance timer facility


^ permalink raw reply	[relevance 3%]

* Re: [RFC 0/2] Add high-performance timer facility
  2023-02-28 16:01  0% ` Morten Brørup
@ 2023-03-01 11:18  0%   ` Mattias Rönnblom
  2023-03-01 13:31  3%     ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Mattias Rönnblom @ 2023-03-01 11:18 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

On 2023-02-28 17:01, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Tuesday, 28 February 2023 10.39
> 
> I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.
> 
> My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.
> 
>>
>> This patchset is an attempt to introduce a high-performance, highly
>> scalable timer facility into DPDK.
>>
>> More specifically, the goals for the htimer library are:
>>
>> * Efficient handling of a handful up to hundreds of thousands of
>>    concurrent timers.
>> * Reduced overhead of adding and canceling timers.
>> * Provide a service functionally equivalent to that of
>>    <rte_timer.h>. API/ABI backward compatibility is secondary.
>>
>> In the author's opinion, there are two main shortcomings with the
>> current DPDK timer library (i.e., rte_timer.[ch]).
>>
>> One is the synchronization overhead, where heavy-weight full-barrier
>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
>> lists, but any thread may add or cancel (or otherwise access) timers
>> managed by another lcore (and thus resides in its timer skip list).
>>
>> The other is an algorithmic shortcoming, with rte_timer.c's reliance
>> on a skip list, which, seemingly, is less efficient than certain
>> alternatives.
>>
>> This patchset implements a hierarchical timer wheel (HWT, in
> 
> Typo: HWT or HTW?

Yes. I don't understand how I managed to make so many such HTW ->
HWT typos. At least I got the filenames (rte_htw.[ch]) correct.

> 
>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
>> Hierarchical Timing Wheels: Data Structures for the Efficient
>> Implementation of a Timer Facility". A HWT is a data structure
>> purposely designed for this task, and used by many operating system
>> kernel timer facilities.
>>
>> To further improve the solution described by Varghese and Lauck, a
>> bitset is placed in front of each of the timer wheels in the HWT,
>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
>> and expiry processing).
>>
>> Cycle-efficient scanning and manipulation of these bitsets are crucial
>> for the HWT's performance.
>>
>> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
>> instance, much like rte_timer.c keeps a per-lcore skip list.
>>
>> To avoid expensive synchronization overhead for thread-local timer
>> management, the HWTs are accessed only from the "owning" thread.  Any
>> interaction any other thread has with a particular lcore's timer
>> wheel goes over a set of DPDK rings. A side-effect of this design is
>> that all operations working toward a "remote" HWT must be
>> asynchronous.
>>
>> The <rte_htimer.h> API is available only to EAL threads and registered
>> non-EAL threads.
>>
>> The htimer API allows the application to supply the current time,
>> useful in case it already has retrieved this for other purposes,
>> saving the cost of a rdtsc instruction (or its equivalent).
>>
>> Relative htimer does not retrieve a new time, but reuses the current
>> time (as known via/at-the-time of the manage-call), again to shave off
>> some cycles of overhead.
> 
> I have a comment to the two points above.
> 
> I agree that the application should supply the current time.
> 
> This should be the concept throughout the library. I don't understand why TSC is used in the library at all?
> 
> Please use a unit-less tick, and let the application decide what one tick means.
> 

I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more 
sense if you think of the user of the API as not just a "monolithic" 
application, but rather a set of different modules, developed by 
different organizations, and reused across a set of applications. The 
idea behind the API design is they should all be able to share one timer 
service instance.

The different parts of the application and any future DPDK platform 
modules that use the htimer service need to agree on what a tick means in
terms of actual wall-time, if it's not mandated by the API.

There might be room for module-specific timer wheels as well, with 
different resolution or other characteristics. The event timer adapter's 
use of a timer wheel could be one example (although I'm not sure it is).

If timer-wheel-as-a-private-lego-piece is also a valid use case, then 
one could consider making the <rte_htw.h> API public as well. That is what
I think you are asking for here: a generic timer wheel that doesn't know
anything about time sources, time source time -> tick conversion, or 
timer source time -> monotonic wall time conversion, and maybe is also 
not bound to a particular thread.

I picked TSC because it seemed like a good "universal time unit" for 
DPDK. rdtsc (and its equivalent) is also very precise (especially on
x86) and cheap-to-retrieve (especially on ARM, from what I understand).

That said, at the moment, I'm leaning toward making nanoseconds (uint64_t
format) the default for timer expiration time instead of TSC.
TSC could still be an option for passing the current time, since TSC 
will be a common time source, and it shaves off one conversion.

> A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)
> 
> For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
> 
>>
>> A semantic improvement compared to the <rte_timer.h> API is that the
>> htimer library can give a definite answer on the question if the timer
>> expiry callback was called, after a timer has been canceled.
>>
>> Below is performance data from DPDK's 'app/test' micro benchmarks,
>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
>> test_htimer_mgr_perf.c) aren't identical in their structure, but the
>> numbers give some indication of the difference.
>>
>> Use case               htimer  timer
>> ------------------------------------
>> Add timer                 28    253
>> Cancel timer              10    412
>> Async add (source lcore)  64
>> Async add (target lcore)  13
>>
>> (AMD 5900X CPU. Time in TSC.)
>>
>> Prototype integration of the htimer library into real, timer-heavy,
>> applications indicates that htimer may result in significant
>> application-level performance gains.
>>
>> The bitset implementation which the HWT implementation depends upon
>> seemed generic-enough and potentially useful outside the world of
>> HWTs, to justify being located in the EAL.
>>
>> This patchset is very much an RFC, and the author is yet to form an
>> opinion on many important issues.
>>
>> * If deemed a suitable replacement, should the htimer replace the
>>    current DPDK timer library in some particular (ABI-breaking)
>>    release, or should it live side-by-side with the then-legacy
>>    <rte_timer.h> API? A lot of things in and outside DPDK depend on
>>    <rte_timer.h>, so coexistence may be required to facilitate a smooth
>>    transition.
> 
> It's my immediate impression that they are totally different in both design philosophy and API.
> 
> Personal opinion: I would call it an entirely different library.
> 
>>
>> * Should the htimer and htw-related files be colocated with rte_timer.c
>>    in the timer library?
> 
> Personal opinion: No. This is an entirely different library, and should live for itself in a directory of its own.
> 
>>
>> * Would it be useful for applications using asynchronous cancel to
>>    have the option of having the timer callback run not only in case of
>>    timer expiration, but also cancellation (on the target lcore)? The
>>    timer cb signature would need to include an additional parameter in
>>    that case.
> 
> If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?
> 

Yes.

Intuitively, it seems convenient that if you hand off a timer to a 
different lcore, the timer callback will be called exactly once, 
regardless if the timer was canceled or expired.

But, as you indicate, you may still need synchronization to solve the 
resource reclamation issue.

> Is such a feature mostly "Must have" or "Nice to have"?
> 
> More thoughts in this area...
> 
> If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".
> 

Yes, or an int.

> Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.
> 

It's not obvious to me that you, in the timer callback, can determine 
what happened, if the same callback is called both in the cancel and the 
expired case.

The cost of an extra integer passed in a register (or checking a flag,
if the timer callback should be called at all at cancellation) is not
the concern for me; it's the extra bit of API complexity.

> Here's another alternative to adding a "what happened" parameter to the callback:
> 
> The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
> 

This will grow the timer struct by 16 bytes.

>>
>> * Should the rte_htimer be a nested struct, so the htw parts be separated
>>    from the htimer parts?
>>
>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>>    <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>>    be so?
>>
>> * rte_htimer struct is only supposed to be used by the application to
>>    give an indication of how much memory it needs to allocate, and
>>    its members are not supposed to be directly accessed (w/ the possible
>>    exception of the owner_lcore_id field). Should there be a dummy
>>    struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>>    function instead, serving the same purpose? Better encapsulation,
>>    but more inconvenient for applications. Run-time dynamic sizing
>>    would force application-level dynamic allocations.
>>
>> * Asynchronous cancellation is a little tricky to use for the
>>    application (primarily due to timer memory reclamation/race
>>    issues). Should this functionality be removed?
>>
>> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>>    there should be a variant which allows the user to specify the
>>    time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>>    current proposed API is an application calling rte_htimer_mgr_init()
>>    and then immediately adding a timer with a relative timeout, in
>>    which case the current absolute time used is 0, which might be a
>>    surprise.
>>
>> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>>    tick conversion? (Doesn't improve performance on Zen 3, but may
>>    do on other CPUs.) Consider <rte_reciprocal.h> as well.
>>
>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>>    used for conversion? Very minor performance gains to be found there,
>>    at least on Zen 3 cores.
>>
>> * Should it be possible to supply the time in rte_htimer_mgr_add()
>>    and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>>    as TSC? Should it be possible to also use nanoseconds?
>>    rte_htimer_mgr_manage_time() would need a flags parameter in that
>>    case.
> 
> Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.
> 
>>
>> * Would the event timer adapter be best off using <rte_htw.h>
>>    directly, or <rte_htimer.h>? In the latter case, there needs to be a
>>    way to instantiate more HWTs (similar to the "alt" functions of
>>    <rte_timer.h>)?
>>
>> * Should the PERIODICAL flag (and the complexity it brings) be
>>    removed? And leave the application with only single-shot timers, and
>>    the option to re-add them in the timer callback.
> 
> First thought: Yes, keep it lean and remove the periodical stuff.
> 
> Second thought: This needs a more detailed analysis.
> 
>  From one angle:
> 
> How many PERIODICAL versus ONESHOT timers do we expect?
> 

I suspect you should be prepared for the ratio being anything.

> Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.
> 
> If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.
> 
>  From another angle:
> 
> What is the performance gain with the PERIODICAL flag?
> 

None, pretty much. It's just there for convenience.

> Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?
> 
> And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?
> 
> Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?
> 
> 
> Another, vaguely related, idea:
> 
> The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.
> 

Do you mean rte_htw, or rte_htimer_mgr?

If you make one common callback, all the different parts of the
application need to be coordinated (in a big switch statement, or
something of that sort), or have some convention for using an 
application-specific wrapper structure (accessed via container_of()).

This is a problem if the timer service API consumer is a set of largely 
uncoordinated software modules.

Btw, the eventdev API has the same issue, and the proposed event 
dispatcher is one way to help facilitate application-internal decoupling.

For a module-private rte_htw instance your suggestion may work, but not 
for <rte_htimer_mgr.h>.
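
A minimal sketch of the wrapper-structure convention described above,
assuming a single callback shared by all users of one wheel. Every name
here (demo_timer, demo_wrapper, demo_common_cb, and so on) is
hypothetical and not part of the proposed API; the point is only that
each module has to agree on the wrapper layout and the switch statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer object; not the real struct rte_htimer. */
struct demo_timer {
	unsigned long expiry;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum demo_owner { DEMO_OWNER_TCP, DEMO_OWNER_ARP };

/* The convention every module must follow: embed the timer in this wrapper. */
struct demo_wrapper {
	enum demo_owner owner;
	struct demo_timer timer;
};

static int demo_tcp_fired;
static int demo_arp_fired;

/* The single callback shared by all users of the wheel. */
static void
demo_common_cb(struct demo_timer *timer)
{
	struct demo_wrapper *w;

	assert(timer != NULL);
	w = demo_container_of(timer, struct demo_wrapper, timer);

	/* This switch is the coordination point between otherwise unrelated modules. */
	switch (w->owner) {
	case DEMO_OWNER_TCP:
		demo_tcp_fired++;
		break;
	case DEMO_OWNER_ARP:
		demo_arp_fired++;
		break;
	}
}
```

A new module cannot be added without touching both the enum and the
switch, which is the coupling problem for largely uncoordinated modules.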

> When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.
> 
> It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.
> 
>>
>> * Should the async result codes and the sync cancel error codes be merged
>>    into one set of result codes?
>>
>> * Should the rte_htimer_mgr_async_add() have a flag which allows
>>    buffering add request messages until rte_htimer_mgr_process() is
>>    called? Or any manage function. Would reduce ring signaling overhead
>>    (i.e., burst enqueue operations instead of single-element
>>    enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>>    solving the same "problem" a different way. (The signature of such
>>    a function would not be pretty.)
>>
>> * Does the functionality provided by the rte_htimer_mgr_process()
>>    function match its use cases? Should there be a clearer
>>    separation between expiry processing and asynchronous operation
>>    processing?
>>
>> * Should the patchset be split into more commits? If so, how?
>>
>> Thanks to Erik Carrillo for his assistance.
>>
>> Mattias Rönnblom (2):
>>    eal: add bitset type
>>    eal: add high-performance timer facility


^ permalink raw reply	[relevance 0%]

* RE: [RFC 0/2] Add high-performance timer facility
  2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
@ 2023-02-28 16:01  0% ` Morten Brørup
  2023-03-01 11:18  0%   ` Mattias Rönnblom
  2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-28 16:01 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 28 February 2023 10.39

I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.

My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.

> 
> This patchset is an attempt to introduce a high-performance, highly
> scalable timer facility into DPDK.
> 
> More specifically, the goals for the htimer library are:
> 
> * Efficient handling of a handful up to hundreds of thousands of
>   concurrent timers.
> * Reduced overhead of adding and canceling timers.
> * Provide a service functionally equivalent to that of
>   <rte_timer.h>. API/ABI backward compatibility is secondary.
> 
> In the author's opinion, there are two main shortcomings with the
> current DPDK timer library (i.e., rte_timer.[ch]).
> 
> One is the synchronization overhead, where heavy-weight full-barrier
> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> lists, but any thread may add or cancel (or otherwise access) timers
> managed by another lcore (and thus resides in its timer skip list).
> 
> The other is an algorithmic shortcoming, with rte_timer.c's reliance
> on a skip list, which, seemingly, is less efficient than certain
> alternatives.
> 
> This patchset implements a hierarchical timer wheel (HWT, in

Typo: HWT or HTW?

> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> Hierarchical Timing Wheels: Data Structures for the Efficient
> Implementation of a Timer Facility". A HWT is a data structure
> purposely designed for this task, and used by many operating system
> kernel timer facilities.
> 
> To further improve the solution described by Varghese and Lauck, a
> bitset is placed in front of each of the timer wheels in the HWT,
> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
> and expiry processing).
> 
> Cycle-efficient scanning and manipulation of these bitsets are crucial
> for the HWT's performance.
> 
> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
> instance, much like rte_timer.c keeps a per-lcore skip list.
> 
> To avoid expensive synchronization overhead for thread-local timer
> management, the HWTs are accessed only from the "owning" thread.  Any
> interaction any other thread has with a particular lcore's timer
> wheel goes over a set of DPDK rings. A side-effect of this design is
> that all operations working toward a "remote" HWT must be
> asynchronous.
> 
> The <rte_htimer.h> API is available only to EAL threads and registered
> non-EAL threads.
> 
> The htimer API allows the application to supply the current time,
> useful in case it already has retrieved this for other purposes,
> saving the cost of a rdtsc instruction (or its equivalent).
> 
> A relative htimer does not retrieve a new time, but reuses the current
> time (as known via/at-the-time of the manage-call), again to shave off
> some cycles of overhead.

I have a comment on the two points above.

I agree that the application should supply the current time.

This should be the concept throughout the library. I don't understand why TSC is used in the library at all?

Please use a unit-less tick, and let the application decide what one tick means.

A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)

For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
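
A minimal sketch of what a unit-less tick could look like, assuming the
library only ever compares tick counts and the application owns the
conversion. All names (demo_timer, demo_timer_set, app_us_to_ticks,
APP_US_PER_TICK) are hypothetical, and the choice of 100 us per tick is
purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library side: knows only about ticks, never about TSC or ns. */
struct demo_timer {
	uint64_t expiry_tick;
};

static void
demo_timer_set(struct demo_timer *t, uint64_t now_tick, uint64_t timeout_ticks)
{
	assert(t != NULL);
	t->expiry_tick = now_tick + timeout_ticks;
}

static bool
demo_timer_expired(const struct demo_timer *t, uint64_t now_tick)
{
	return now_tick >= t->expiry_tick;
}

/* Hypothetical application side: this application decides 1 tick = 100 us. */
#define APP_US_PER_TICK 100

static uint64_t
app_us_to_ticks(uint64_t us)
{
	/* Round up so a timer never fires earlier than requested. */
	return (us + APP_US_PER_TICK - 1) / APP_US_PER_TICK;
}
```

Another application could pick a much finer tick with the same library
code, since nothing on the library side depends on the unit.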

> 
> A semantic improvement compared to the <rte_timer.h> API is that the
> htimer library can give a definite answer on the question if the timer
> expiry callback was called, after a timer has been canceled.
> 
> Below is performance data from DPDK's 'app/test' micro benchmarks,
> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> test_htimer_mgr_perf.c) aren't identical in their structure, but the
> numbers give some indication of the difference.
> 
> Use case               htimer  timer
> ------------------------------------
> Add timer                 28    253
> Cancel timer              10    412
> Async add (source lcore)  64
> Async add (target lcore)  13
> 
> (AMD 5900X CPU. Time in TSC.)
> 
> Prototype integration of the htimer library into real, timer-heavy,
> applications indicates that htimer may result in significant
> application-level performance gains.
> 
> The bitset implementation which the HWT implementation depends upon
> seemed generic-enough and potentially useful outside the world of
> HWTs, to justify being located in the EAL.
> 
> This patchset is very much an RFC, and the author is yet to form an
> opinion on many important issues.
> 
> * If deemed a suitable replacement, should the htimer replace the
>   current DPDK timer library in some particular (ABI-breaking)
>   release, or should it live side-by-side with the then-legacy
>   <rte_timer.h> API? A lot of things in and outside DPDK depend on
>   <rte_timer.h>, so coexistence may be required to facilitate a smooth
>   transition.

It's my immediate impression that they are totally different in both design philosophy and API.

Personal opinion: I would call it an entirely different library.

> 
> * Should the htimer and htw-related files be colocated with rte_timer.c
>   in the timer library?

Personal opinion: No. This is an entirely different library, and should live for itself in a directory of its own.

> 
> * Would it be useful for applications using asynchronous cancel to
>   have the option of having the timer callback run not only in case of
>   timer expiration, but also cancellation (on the target lcore)? The
>   timer cb signature would need to include an additional parameter in
>   that case.

If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?

Is such a feature mostly "Must have" or "Nice to have"?

More thoughts in this area...

If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".

Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.

Here's another alternative to adding a "what happened" parameter to the callback:

The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
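
To make the enum variant discussed above concrete, here is a minimal
sketch, assuming a callback that receives a "what happened" argument.
The names (demo_timer_event, demo_timer_cb_t, demo_timer_finish,
demo_conn) are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Why the callback is being invoked; more events could be added later. */
enum demo_timer_event {
	DEMO_TIMER_EVENT_EXPIRED,
	DEMO_TIMER_EVENT_CANCELED
};

typedef void (*demo_timer_cb_t)(void *cb_arg, enum demo_timer_event event);

struct demo_timer {
	demo_timer_cb_t cb;
	void *cb_arg;
	bool pending;
};

static void
demo_timer_arm(struct demo_timer *t, demo_timer_cb_t cb, void *cb_arg)
{
	assert(cb != NULL);
	t->cb = cb;
	t->cb_arg = cb_arg;
	t->pending = true;
}

/* Called on the owning thread for both expiry and cancellation. */
static void
demo_timer_finish(struct demo_timer *t, enum demo_timer_event event)
{
	if (!t->pending)
		return; /* already fired or canceled; nothing to report */
	t->pending = false;
	t->cb(t->cb_arg, event);
}

/* Example consumer: a connection object with a retransmission timer. */
struct demo_conn {
	int retransmits;
	bool timer_released;
};

static void
demo_conn_timer_cb(void *cb_arg, enum demo_timer_event event)
{
	struct demo_conn *conn = cb_arg;

	switch (event) {
	case DEMO_TIMER_EVENT_EXPIRED:
		conn->retransmits++;
		break;
	case DEMO_TIMER_EVENT_CANCELED:
		/* Safe point to reclaim the timer memory on the target thread. */
		conn->timer_released = true;
		break;
	}
}
```

The second-callback alternative would replace the enum with a separate
function pointer stored in the timer, trading the extra argument for
extra per-timer memory.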

> 
> * Should the rte_htimer be a nested struct, so the htw parts be separated
>   from the htimer parts?
> 
> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>   <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>   be so?
> 
> * rte_htimer struct is only supposed to be used by the application to
>   give an indication of how much memory it needs to allocate, and
>   its members are not supposed to be directly accessed (w/ the possible
>   exception of the owner_lcore_id field). Should there be a dummy
>   struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>   function instead, serving the same purpose? Better encapsulation,
>   but more inconvenient for applications. Run-time dynamic sizing
>   would force application-level dynamic allocations.
> 
> * Asynchronous cancellation is a little tricky to use for the
>   application (primarily due to timer memory reclamation/race
>   issues). Should this functionality be removed?
> 
> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>   there should be a variant which allows the user to specify the
>   time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>   current proposed API is an application calling rte_htimer_mgr_init()
>   and then immediately adding a timer with a relative timeout, in
>   which case the current absolute time used is 0, which might be a
>   surprise.
> 
> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>   tick conversion? (Doesn't improve performance on Zen 3, but may
>   do on other CPUs.) Consider <rte_reciprocal.h> as well.
> 
> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>   used for conversion? Very minor performance gains to be found there,
>   at least on Zen 3 cores.
> 
> * Should it be possible to supply the time in rte_htimer_mgr_add()
>   and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>   as TSC? Should it be possible to also use nanoseconds?
>   rte_htimer_mgr_manage_time() would need a flags parameter in that
>   case.

Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.

> 
> * Would the event timer adapter be best off using <rte_htw.h>
>   directly, or <rte_htimer.h>? In the latter case, there needs to be a
>   way to instantiate more HWTs (similar to the "alt" functions of
>   <rte_timer.h>)?
> 
> * Should the PERIODICAL flag (and the complexity it brings) be
>   removed? And leave the application with only single-shot timers, and
>   the option to re-add them in the timer callback.

First thought: Yes, keep it lean and remove the periodical stuff.

Second thought: This needs a more detailed analysis.

From one angle:

How many PERIODICAL versus ONESHOT timers do we expect?

Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.

If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.

From another angle:

What is the performance gain with the PERIODICAL flag?

Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?

And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?

Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?

Another, vaguely related, idea:

The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.

When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.

It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.

> 
> * Should the async result codes and the sync cancel error codes be merged
>   into one set of result codes?
> 
> * Should the rte_htimer_mgr_async_add() have a flag which allows
>   buffering add request messages until rte_htimer_mgr_process() is
>   called? Or any manage function. Would reduce ring signaling overhead
>   (i.e., burst enqueue operations instead of single-element
>   enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>   solving the same "problem" a different way. (The signature of such
>   a function would not be pretty.)
> 
> * Does the functionality provided by the rte_htimer_mgr_process()
>   function match its use cases? Should there be a clearer
>   separation between expiry processing and asynchronous operation
>   processing?
> 
> * Should the patchset be split into more commits? If so, how?
> 
> Thanks to Erik Carrillo for his assistance.
> 
> Mattias Rönnblom (2):
>   eal: add bitset type
>   eal: add high-performance timer facility

^ permalink raw reply	[relevance 0%]

* [RFC 0/2] Add high-performance timer facility
@ 2023-02-28  9:39  3% Mattias Rönnblom
  2023-02-28 16:01  0% ` Morten Brørup
  2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Mattias Rönnblom @ 2023-02-28  9:39 UTC (permalink / raw)
  To: dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark,
	Stefan Sundkvist, Mattias Rönnblom

This patchset is an attempt to introduce a high-performance, highly
scalable timer facility into DPDK.

More specifically, the goals for the htimer library are:

* Efficient handling of a handful up to hundreds of thousands of
  concurrent timers.
* Reduced overhead of adding and canceling timers.
* Provide a service functionally equivalent to that of
  <rte_timer.h>. API/ABI backward compatibility is secondary.

In the author's opinion, there are two main shortcomings with the
current DPDK timer library (i.e., rte_timer.[ch]).

One is the synchronization overhead, where heavy-weight full-barrier
type synchronization is used. rte_timer.c uses per-EAL/lcore skip
lists, but any thread may add or cancel (or otherwise access) timers
managed by another lcore (and thus resides in its timer skip list).

The other is an algorithmic shortcoming, with rte_timer.c's reliance
on a skip list, which, seemingly, is less efficient than certain
alternatives.

This patchset implements a hierarchical timer wheel (HWT, in
rte_htw.c), as per the Varghese and Lauck paper "Hashed and
Hierarchical Timing Wheels: Data Structures for the Efficient
Implementation of a Timer Facility". A HWT is a data structure
purposely designed for this task, and used by many operating system
kernel timer facilities.

To further improve the solution described by Varghese and Lauck, a
bitset is placed in front of each of the timer wheels in the HWT,
reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
and expiry processing).

Cycle-efficient scanning and manipulation of these bitsets are crucial
for the HWT's performance.
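
As an illustration of why the bitset scan matters, here is a minimal
sketch of finding the next occupied wheel slot with one word-wide test
per 64 slots. It is not the rte_bitset implementation from this
patchset: the names (demo_wheel_bitset, demo_find_first_set) and the
256-slot size are hypothetical, and __builtin_ctzll() is a GCC/Clang
builtin assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SLOTS 256
#define DEMO_WORDS (DEMO_SLOTS / 64)

/* One bit per wheel slot; a set bit means "this slot holds timers". */
struct demo_wheel_bitset {
	uint64_t words[DEMO_WORDS];
};

static void
demo_set(struct demo_wheel_bitset *b, unsigned int slot)
{
	assert(slot < DEMO_SLOTS);
	b->words[slot / 64] |= UINT64_C(1) << (slot % 64);
}

static void
demo_clear(struct demo_wheel_bitset *b, unsigned int slot)
{
	assert(slot < DEMO_SLOTS);
	b->words[slot / 64] &= ~(UINT64_C(1) << (slot % 64));
}

/* Return the first occupied slot at or after 'start', or -1 if there is none. */
static int
demo_find_first_set(const struct demo_wheel_bitset *b, unsigned int start)
{
	unsigned int word_idx;
	uint64_t word;

	if (start >= DEMO_SLOTS)
		return -1;

	word_idx = start / 64;
	/* Ignore the bits below 'start' in the first word. */
	word = b->words[word_idx] & (UINT64_MAX << (start % 64));

	for (;;) {
		if (word != 0)
			return (int)(word_idx * 64 +
				     (unsigned int)__builtin_ctzll(word));
		if (++word_idx == DEMO_WORDS)
			return -1;
		word = b->words[word_idx];
	}
}
```

With this layout, advancing time over a mostly empty wheel costs a few
word loads instead of a visit to every slot.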

The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
instance, much like rte_timer.c keeps a per-lcore skip list.

To avoid expensive synchronization overhead for thread-local timer
management, the HWTs are accessed only from the "owning" thread.  Any
interaction any other thread has with a particular lcore's timer
wheel goes over a set of DPDK rings. A side-effect of this design is
that all operations working toward a "remote" HWT must be
asynchronous.

The <rte_htimer.h> API is available only to EAL threads and registered
non-EAL threads.

The htimer API allows the application to supply the current time,
useful in case it already has retrieved this for other purposes,
saving the cost of a rdtsc instruction (or its equivalent).

A relative htimer does not retrieve a new time, but reuses the current
time (as known via/at-the-time of the manage-call), again to shave off
some cycles of overhead.

A semantic improvement compared to the <rte_timer.h> API is that the
htimer library can give a definite answer on the question if the timer
expiry callback was called, after a timer has been canceled.

Below is performance data from DPDK's 'app/test' micro benchmarks,
using 10k concurrent timers. The benchmarks (test_timer_perf.c and
test_htimer_mgr_perf.c) aren't identical in their structure, but the
numbers give some indication of the difference.

Use case               htimer  timer
------------------------------------
Add timer                 28    253
Cancel timer              10    412
Async add (source lcore)  64
Async add (target lcore)  13

(AMD 5900X CPU. Time in TSC.)

Prototype integration of the htimer library into real, timer-heavy,
applications indicates that htimer may result in significant
application-level performance gains.

The bitset implementation which the HWT implementation depends upon
seemed generic-enough and potentially useful outside the world of
HWTs, to justify being located in the EAL.

This patchset is very much an RFC, and the author is yet to form an
opinion on many important issues.

* If deemed a suitable replacement, should the htimer replace the
  current DPDK timer library in some particular (ABI-breaking)
  release, or should it live side-by-side with the then-legacy
  <rte_timer.h> API? A lot of things in and outside DPDK depend on
  <rte_timer.h>, so coexistence may be required to facilitate a smooth
  transition.

* Should the htimer and htw-related files be colocated with rte_timer.c
  in the timer library?

* Would it be useful for applications using asynchronous cancel to
  have the option of having the timer callback run not only in case of
  timer expiration, but also cancellation (on the target lcore)? The
  timer cb signature would need to include an additional parameter in
  that case.

* Should the rte_htimer be a nested struct, so the htw parts be separated
  from the htimer parts?

* <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
  <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
  be so?

* rte_htimer struct is only supposed to be used by the application to
  give an indication of how much memory it needs to allocate, and
  its members are not supposed to be directly accessed (w/ the possible
  exception of the owner_lcore_id field). Should there be a dummy
  struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
  function instead, serving the same purpose? Better encapsulation,
  but more inconvenient for applications. Run-time dynamic sizing
  would force application-level dynamic allocations.

* Asynchronous cancellation is a little tricky to use for the
  application (primarily due to timer memory reclamation/race
  issues). Should this functionality be removed?

* Should rte_htimer_mgr_init() also retrieve the current time? If so,
  there should be a variant which allows the user to specify the
  time (to match rte_htimer_mgr_manage_time()). One pitfall with the
  current proposed API is an application calling rte_htimer_mgr_init()
  and then immediately adding a timer with a relative timeout, in
  which case the current absolute time used is 0, which might be a
  surprise.

* Should libdivide (optionally) be used to avoid the div in the TSC ->
  tick conversion? (Doesn't improve performance on Zen 3, but may
  do on other CPUs.) Consider <rte_reciprocal.h> as well.

* Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
  used for conversion? Very minor performance gains to be found there,
  at least on Zen 3 cores.

* Should it be possible to supply the time in rte_htimer_mgr_add()
  and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
  as TSC? Should it be possible to also use nanoseconds?
  rte_htimer_mgr_manage_time() would need a flags parameter in that
  case.

* Would the event timer adapter be best off using <rte_htw.h>
  directly, or <rte_htimer.h>? In the latter case, there needs to be a
  way to instantiate more HWTs (similar to the "alt" functions of
  <rte_timer.h>)?

* Should the PERIODICAL flag (and the complexity it brings) be
  removed? And leave the application with only single-shot timers, and
  the option to re-add them in the timer callback.

* Should the async result codes and the sync cancel error codes be merged
  into one set of result codes?

* Should the rte_htimer_mgr_async_add() have a flag which allows
  buffering add request messages until rte_htimer_mgr_process() is
  called? Or any manage function. Would reduce ring signaling overhead
  (i.e., burst enqueue operations instead of single-element
  enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
  solving the same "problem" a different way. (The signature of such
  a function would not be pretty.)

* Does the functionality provided by the rte_htimer_mgr_process()
  function match its use cases? Should there be a clearer
  separation between expiry processing and asynchronous operation
  processing?

* Should the patchset be split into more commits? If so, how?
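
For the question above about rounding TSC-per-tick up to a power of 2,
the following is a minimal sketch of the two conversions being
compared. The helper names are hypothetical and the numbers are
illustrative only; note that rounding up changes the effective tick
length (e.g., 3000 TSC cycles become 4096).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of 2 that is >= v. */
static uint64_t
demo_round_up_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v >= 1 && v <= (UINT64_C(1) << 63));
	while (p < v)
		p <<= 1;
	return p;
}

/* Base-2 logarithm of an exact power of 2. */
static unsigned int
demo_log2_pow2(uint64_t pow2)
{
	unsigned int shift = 0;

	assert(pow2 != 0 && (pow2 & (pow2 - 1)) == 0);
	while ((pow2 >> shift) != 1)
		shift++;
	return shift;
}

/* General conversion: needs an integer division per call. */
static uint64_t
demo_tsc_to_tick_div(uint64_t tsc, uint64_t tsc_per_tick)
{
	return tsc / tsc_per_tick;
}

/* Power-of-2 conversion: a single shift per call. */
static uint64_t
demo_tsc_to_tick_shift(uint64_t tsc, unsigned int shift)
{
	return tsc >> shift;
}
```

The shift value would be computed once at init time, so only the shift
itself sits on the add/manage fast path.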

Thanks to Erik Carrillo for his assistance.

Mattias Rönnblom (2):
  eal: add bitset type
  eal: add high-performance timer facility

 app/test/meson.build             |  10 +-
 app/test/test_bitset.c           | 646 +++++++++++++++++++++++
 app/test/test_htimer_mgr.c       | 674 ++++++++++++++++++++++++
 app/test/test_htimer_mgr_perf.c  | 324 ++++++++++++
 app/test/test_htw.c              | 478 +++++++++++++++++
 app/test/test_htw_perf.c         | 181 +++++++
 doc/api/doxy-api-index.md        |   5 +-
 doc/api/doxy-api.conf.in         |   1 +
 lib/eal/common/meson.build       |   1 +
 lib/eal/common/rte_bitset.c      |  29 +
 lib/eal/include/meson.build      |   1 +
 lib/eal/include/rte_bitset.h     | 878 +++++++++++++++++++++++++++++++
 lib/eal/version.map              |   3 +
 lib/htimer/meson.build           |   7 +
 lib/htimer/rte_htimer.h          |  65 +++
 lib/htimer/rte_htimer_mgr.c      | 488 +++++++++++++++++
 lib/htimer/rte_htimer_mgr.h      | 497 +++++++++++++++++
 lib/htimer/rte_htimer_msg.h      |  44 ++
 lib/htimer/rte_htimer_msg_ring.c |  18 +
 lib/htimer/rte_htimer_msg_ring.h |  49 ++
 lib/htimer/rte_htw.c             | 437 +++++++++++++++
 lib/htimer/rte_htw.h             |  49 ++
 lib/htimer/version.map           |  17 +
 lib/meson.build                  |   1 +
 24 files changed, 4901 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_bitset.c
 create mode 100644 app/test/test_htimer_mgr.c
 create mode 100644 app/test/test_htimer_mgr_perf.c
 create mode 100644 app/test/test_htw.c
 create mode 100644 app/test/test_htw_perf.c
 create mode 100644 lib/eal/common/rte_bitset.c
 create mode 100644 lib/eal/include/rte_bitset.h
 create mode 100644 lib/htimer/meson.build
 create mode 100644 lib/htimer/rte_htimer.h
 create mode 100644 lib/htimer/rte_htimer_mgr.c
 create mode 100644 lib/htimer/rte_htimer_mgr.h
 create mode 100644 lib/htimer/rte_htimer_msg.h
 create mode 100644 lib/htimer/rte_htimer_msg_ring.c
 create mode 100644 lib/htimer/rte_htimer_msg_ring.h
 create mode 100644 lib/htimer/rte_htw.c
 create mode 100644 lib/htimer/rte_htw.h
 create mode 100644 lib/htimer/version.map

-- 
2.34.1

^ permalink raw reply	[relevance 3%]

* Re: [RFC PATCH] drivers/net: fix RSS multi-queue mode check
  @ 2023-02-28  8:23  3%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-28  8:23 UTC (permalink / raw)
  To: lihuisong (C),
	Ajit Khaparde, Somnath Kotur, Rahul Lakkireddy, Simei Su,
	Wenjun Wu, Marcin Wojtas, Michal Krawczyk, Shai Brandes,
	Evgeny Schemeilin, Igor Chauskin, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Junfeng Guo, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Dongdong Liu, Yisen Zhuang, Yuying Zhang,
	Beilei Xing, Jingjing Wu, Qiming Yang, Shijith Thotton,
	Srisivasubramanian Srinivasan, Long Li, Chaoyong He,
	Niklas Söderlund, Jiawen Wu, Rasesh Mody,
	Devendra Singh Rawat, Jerin Jacob, Maciej Czekaj, Jian Wang,
	Jochen Behrens, Andrew Rybchenko
  Cc: Thomas Monjalon, dev, stable

On 2/28/2023 1:24 AM, lihuisong (C) wrote:
> 
> On 2023/2/27 17:57, Ferruh Yigit wrote:
>> On 2/27/2023 1:34 AM, lihuisong (C) wrote:
>>> On 2023/2/24 0:04, Ferruh Yigit wrote:
>>>> 'rxmode.mq_mode' is an enum which should be an abstraction over values,
>>>> instead of mask it with 'RTE_ETH_MQ_RX_RSS_FLAG' to detect if RSS is
>>>> supported, directly compare with 'RTE_ETH_MQ_RX_RSS' enum element.
>>>>
>>>> Most of the time only 'RTE_ETH_MQ_RX_RSS' is requested by user, that is
>>>> why output is almost same, but there may be cases driver doesn't
>>>> support
>>>> RSS combinations, like 'RTE_ETH_MQ_RX_VMDQ_DCB_RSS' but that is hidden
>>>> by masking with 'RTE_ETH_MQ_RX_RSS_FLAG'.
>>> Hi Ferruh,
>>>
>>> It seems that this fully changes the usage of the mq_mode.
>>> It will cause the RSS, DCB and VMDQ functions to not work well.
>>>
>>> For example,
>>> Both user and driver enable RSS and DCB functions based on xxx_DCB_FLAG
>>> and xxx_RSS_FLAG in rxmode.mq_mode.
>>> If we directly compare with 'RTE_ETH_MQ_RX_RSS' enum element now, how do
>>> we enable RSS+DCB mode?
>>>
>> Hi Huisong,
>>
>> Technically 'RSS+DCB' mode can be set by user setting 'rxmode.mq_mode'
>> to 'RTE_ETH_MQ_RX_DCB_RSS' and PMD checking the same.
> This is not a good approach,
> because it has a great impact on users and PMDs and will increase
> the cyclomatic complexity of PMDs.
>>
>> Overall I think it is not good idea to use enum items as masked values,
> I agree with you.
> It is better to change the rxmode.mq_mode and txmode.mq_mode type from
> 'enum' to 'u32'.
> In this way, the PMD code logic doesn't need to be modified and the impact on
> PMDs and users is minimal.
> What do you think?

If the bitmask feature of mq_mode is used and needed, I agree that changing
the underlying data type causes less disturbance in the logic.

But changing the underlying data type has ABI implications, so for now I will
drop this patch, thanks for the feedback.
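
To make the behavioural difference under discussion concrete, here is
a minimal sketch comparing the mask test with the equality test. The
DEMO_* names and values are hypothetical stand-ins that only mirror
the flag-composed layout of the mq_mode enum; they are not the ethdev
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, combined to form the enum items below. */
#define DEMO_MQ_RSS_FLAG  (1 << 0)
#define DEMO_MQ_DCB_FLAG  (1 << 1)
#define DEMO_MQ_VMDQ_FLAG (1 << 2)

enum demo_mq_mode {
	DEMO_MQ_NONE = 0,
	DEMO_MQ_RSS = DEMO_MQ_RSS_FLAG,
	DEMO_MQ_DCB = DEMO_MQ_DCB_FLAG,
	DEMO_MQ_DCB_RSS = DEMO_MQ_RSS_FLAG | DEMO_MQ_DCB_FLAG,
	DEMO_MQ_VMDQ_DCB_RSS = DEMO_MQ_RSS_FLAG | DEMO_MQ_DCB_FLAG |
			       DEMO_MQ_VMDQ_FLAG
};

/* Existing style: true for every mode that includes RSS. */
static bool
demo_rss_by_mask(enum demo_mq_mode mode)
{
	assert(mode >= DEMO_MQ_NONE);
	return (mode & DEMO_MQ_RSS_FLAG) != 0;
}

/* Style proposed in the RFC: true only for plain RSS. */
static bool
demo_rss_by_equality(enum demo_mq_mode mode)
{
	return mode == DEMO_MQ_RSS;
}
```

The two checks agree for plain RSS and diverge for the combined modes,
which is why switching between them changes behaviour for RSS+DCB.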

>> but that seems done intentionally in the past:
>> Commit 4bdefaade6d1 ("ethdev: VMDQ enhancements")
> Seems it was.
>>
>> Since this can be in use already, following patch only changes where
>> 'RTE_ETH_RX_OFFLOAD_RSS_HASH' is set, rest of the usage remaining same.
>>
>> And even for 'RTE_ETH_RX_OFFLOAD_RSS_HASH', I think intention was to
>> override this offload config in PMD when explicitly RSS mode is enabled,
>> but I made the set as RFC to get feedback on this. We may keep as it is
>> if some other modes with 'RTE_ETH_MQ_RX_RSS_FLAG' uses this offload.
>>
>>>> Fixes: 73fb89dd6a00 ("drivers/net: fix RSS hash offload flag if no
>>>> RSS")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@amd.com>
>>>>
>>>> ---
>>>>
>>>> There are more usage like "rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG" in
>>>> drivers, not sure to fix all in this commit or not, feedback welcomed.
>>>> ---
>>>>    drivers/net/bnxt/bnxt_ethdev.c       | 2 +-
>>>>    drivers/net/cxgbe/cxgbe_ethdev.c     | 2 +-
>>>>    drivers/net/e1000/igb_ethdev.c       | 4 ++--
>>>>    drivers/net/ena/ena_ethdev.c         | 2 +-
>>>>    drivers/net/enic/enic_ethdev.c       | 2 +-
>>>>    drivers/net/fm10k/fm10k_ethdev.c     | 2 +-
>>>>    drivers/net/gve/gve_ethdev.c         | 2 +-
>>>>    drivers/net/hinic/hinic_pmd_ethdev.c | 2 +-
>>>>    drivers/net/hns3/hns3_ethdev.c       | 2 +-
>>>>    drivers/net/hns3/hns3_ethdev_vf.c    | 2 +-
>>>>    drivers/net/i40e/i40e_ethdev.c       | 2 +-
>>>>    drivers/net/iavf/iavf_ethdev.c       | 2 +-
>>>>    drivers/net/ice/ice_dcf_ethdev.c     | 2 +-
>>>>    drivers/net/ice/ice_ethdev.c         | 2 +-
>>>>    drivers/net/igc/igc_ethdev.c         | 2 +-
>>>>    drivers/net/ixgbe/ixgbe_ethdev.c     | 4 ++--
>>>>    drivers/net/liquidio/lio_ethdev.c    | 2 +-
>>>>    drivers/net/mana/mana.c              | 2 +-
>>>>    drivers/net/netvsc/hn_ethdev.c       | 2 +-
>>>>    drivers/net/nfp/nfp_common.c         | 2 +-
>>>>    drivers/net/ngbe/ngbe_ethdev.c       | 2 +-
>>>>    drivers/net/qede/qede_ethdev.c       | 2 +-
>>>>    drivers/net/thunderx/nicvf_ethdev.c  | 2 +-
>>>>    drivers/net/txgbe/txgbe_ethdev.c     | 2 +-
>>>>    drivers/net/txgbe/txgbe_ethdev_vf.c  | 2 +-
>>>>    drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
>>>>    26 files changed, 28 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bnxt/bnxt_ethdev.c
>>>> b/drivers/net/bnxt/bnxt_ethdev.c
>>>> index 753e86b4b2af..14c0d5f8c72b 100644
>>>> --- a/drivers/net/bnxt/bnxt_ethdev.c
>>>> +++ b/drivers/net/bnxt/bnxt_ethdev.c
>>>> @@ -1143,7 +1143,7 @@ static int bnxt_dev_configure_op(struct
>>>> rte_eth_dev *eth_dev)
>>>>        bp->rx_cp_nr_rings = bp->rx_nr_rings;
>>>>        bp->tx_cp_nr_rings = bp->tx_nr_rings;
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rx_offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>        eth_dev->data->dev_conf.rxmode.offloads = rx_offloads;
>>>>    diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> b/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> index 45bbeaef0ceb..0e9ccc0587ba 100644
>>>> --- a/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> +++ b/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> @@ -440,7 +440,7 @@ int cxgbe_dev_configure(struct rte_eth_dev
>>>> *eth_dev)
>>>>          CXGBE_FUNC_TRACE();
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/e1000/igb_ethdev.c
>>>> b/drivers/net/e1000/igb_ethdev.c
>>>> index 8858f975f8cc..8e6b43c2ff2d 100644
>>>> --- a/drivers/net/e1000/igb_ethdev.c
>>>> +++ b/drivers/net/e1000/igb_ethdev.c
>>>> @@ -1146,7 +1146,7 @@ eth_igb_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> @@ -3255,7 +3255,7 @@ igbvf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/ena/ena_ethdev.c
>>>> b/drivers/net/ena/ena_ethdev.c
>>>> index efcb163027c8..6929d7066fbd 100644
>>>> --- a/drivers/net/ena/ena_ethdev.c
>>>> +++ b/drivers/net/ena/ena_ethdev.c
>>>> @@ -2307,7 +2307,7 @@ static int ena_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>          adapter->state = ENA_ADAPTER_STATE_CONFIG;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>        dev->data->dev_conf.txmode.offloads |=
>>>> RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
>>>>    diff --git a/drivers/net/enic/enic_ethdev.c
>>>> b/drivers/net/enic/enic_ethdev.c
>>>> index cdf091559196..f3a7bc161408 100644
>>>> --- a/drivers/net/enic/enic_ethdev.c
>>>> +++ b/drivers/net/enic/enic_ethdev.c
>>>> @@ -323,7 +323,7 @@ static int enicpmd_dev_configure(struct
>>>> rte_eth_dev *eth_dev)
>>>>            return ret;
>>>>        }
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/fm10k/fm10k_ethdev.c
>>>> b/drivers/net/fm10k/fm10k_ethdev.c
>>>> index 8b83063f0a2d..49d7849ba5ea 100644
>>>> --- a/drivers/net/fm10k/fm10k_ethdev.c
>>>> +++ b/drivers/net/fm10k/fm10k_ethdev.c
>>>> @@ -450,7 +450,7 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> diff --git a/drivers/net/gve/gve_ethdev.c
>>>> b/drivers/net/gve/gve_ethdev.c
>>>> index cf28a4a3b710..f34755a369fb 100644
>>>> --- a/drivers/net/gve/gve_ethdev.c
>>>> +++ b/drivers/net/gve/gve_ethdev.c
>>>> @@ -92,7 +92,7 @@ gve_dev_configure(struct rte_eth_dev *dev)
>>>>    {
>>>>        struct gve_priv *priv = dev->data->dev_private;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->dev_conf.rxmode.offloads &
>>>> RTE_ETH_RX_OFFLOAD_TCP_LRO)
>>>> diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> b/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> index 7aa5e7d8e929..872ee97b1e97 100644
>>>> --- a/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> +++ b/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> @@ -311,7 +311,7 @@ static int hinic_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>            return -EINVAL;
>>>>        }
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* mtu size is 256~9600 */
>>>> diff --git a/drivers/net/hns3/hns3_ethdev.c
>>>> b/drivers/net/hns3/hns3_ethdev.c
>>>> index 6babf67fcec2..fd3e499a3d38 100644
>>>> --- a/drivers/net/hns3/hns3_ethdev.c
>>>> +++ b/drivers/net/hns3/hns3_ethdev.c
>>>> @@ -2016,7 +2016,7 @@ hns3_dev_configure(struct rte_eth_dev *dev)
>>>>                goto cfg_err;
>>>>        }
>>>>    -    if ((uint32_t)mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) {
>>>> +    if (mq_mode == RTE_ETH_MQ_RX_RSS) {
>>>>            conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>            rss_conf = conf->rx_adv_conf.rss_conf;
>>>>            ret = hns3_dev_rss_hash_update(dev, &rss_conf);
>>>> diff --git a/drivers/net/hns3/hns3_ethdev_vf.c
>>>> b/drivers/net/hns3/hns3_ethdev_vf.c
>>>> index d051a1357b9f..00eb22d05558 100644
>>>> --- a/drivers/net/hns3/hns3_ethdev_vf.c
>>>> +++ b/drivers/net/hns3/hns3_ethdev_vf.c
>>>> @@ -494,7 +494,7 @@ hns3vf_dev_configure(struct rte_eth_dev *dev)
>>>>        }
>>>>          /* When RSS is not configured, redirect the packet queue 0 */
>>>> -    if ((uint32_t)mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) {
>>>> +    if (mq_mode == RTE_ETH_MQ_RX_RSS) {
>>>>            conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>            rss_conf = conf->rx_adv_conf.rss_conf;
>>>>            ret = hns3_dev_rss_hash_update(dev, &rss_conf);
>>>> diff --git a/drivers/net/i40e/i40e_ethdev.c
>>>> b/drivers/net/i40e/i40e_ethdev.c
>>>> index 7726a89d99fb..3c3dbc285c96 100644
>>>> --- a/drivers/net/i40e/i40e_ethdev.c
>>>> +++ b/drivers/net/i40e/i40e_ethdev.c
>>>> @@ -1884,7 +1884,7 @@ i40e_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->tx_simple_allowed = true;
>>>>        ad->tx_vec_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          ret = i40e_dev_init_vlan(dev);
>>>> diff --git a/drivers/net/iavf/iavf_ethdev.c
>>>> b/drivers/net/iavf/iavf_ethdev.c
>>>> index 3196210f2c1d..39860c08b606 100644
>>>> --- a/drivers/net/iavf/iavf_ethdev.c
>>>> +++ b/drivers/net/iavf/iavf_ethdev.c
>>>> @@ -638,7 +638,7 @@ iavf_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_vec_allowed = true;
>>>>        ad->tx_vec_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* Large VF setting */
>>>> diff --git a/drivers/net/ice/ice_dcf_ethdev.c
>>>> b/drivers/net/ice/ice_dcf_ethdev.c
>>>> index dcbf2af5b039..f61a30716e5e 100644
>>>> --- a/drivers/net/ice/ice_dcf_ethdev.c
>>>> +++ b/drivers/net/ice/ice_dcf_ethdev.c
>>>> @@ -711,7 +711,7 @@ ice_dcf_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_bulk_alloc_allowed = true;
>>>>        ad->tx_simple_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          return 0;
>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>> b/drivers/net/ice/ice_ethdev.c
>>>> index 0d011bbffa77..96595fd7afaf 100644
>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>> @@ -3403,7 +3403,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_bulk_alloc_allowed = true;
>>>>        ad->tx_simple_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->nb_rx_queues) {
>>>> diff --git a/drivers/net/igc/igc_ethdev.c
>>>> b/drivers/net/igc/igc_ethdev.c
>>>> index fab2ab6d1ce7..49f2b3738b84 100644
>>>> --- a/drivers/net/igc/igc_ethdev.c
>>>> +++ b/drivers/net/igc/igc_ethdev.c
>>>> @@ -375,7 +375,7 @@ eth_igc_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          ret  = igc_check_mq_mode(dev);
>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> index 88118bc30560..328ccf918e86 100644
>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> @@ -2431,7 +2431,7 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> @@ -5321,7 +5321,7 @@ ixgbevf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/liquidio/lio_ethdev.c
>>>> b/drivers/net/liquidio/lio_ethdev.c
>>>> index ebcfbb1a5c0f..07fbaeda1ee6 100644
>>>> --- a/drivers/net/liquidio/lio_ethdev.c
>>>> +++ b/drivers/net/liquidio/lio_ethdev.c
>>>> @@ -1722,7 +1722,7 @@ lio_dev_configure(struct rte_eth_dev *eth_dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
>>>> index 43221e743e87..76de691a8252 100644
>>>> --- a/drivers/net/mana/mana.c
>>>> +++ b/drivers/net/mana/mana.c
>>>> @@ -78,7 +78,7 @@ mana_dev_configure(struct rte_eth_dev *dev)
>>>>        struct mana_priv *priv = dev->data->dev_private;
>>>>        struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
>>>>    -    if (dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev_conf->rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev_conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) {
>>>> diff --git a/drivers/net/netvsc/hn_ethdev.c
>>>> b/drivers/net/netvsc/hn_ethdev.c
>>>> index d0bbc0a4c0c0..4950b061799c 100644
>>>> --- a/drivers/net/netvsc/hn_ethdev.c
>>>> +++ b/drivers/net/netvsc/hn_ethdev.c
>>>> @@ -721,7 +721,7 @@ static int hn_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev_conf->rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev_conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          unsupported = txmode->offloads & ~HN_TX_OFFLOAD_CAPS;
>>>> diff --git a/drivers/net/nfp/nfp_common.c
>>>> b/drivers/net/nfp/nfp_common.c
>>>> index 907777a9e44d..a774fad3fba2 100644
>>>> --- a/drivers/net/nfp/nfp_common.c
>>>> +++ b/drivers/net/nfp/nfp_common.c
>>>> @@ -161,7 +161,7 @@ nfp_net_configure(struct rte_eth_dev *dev)
>>>>        rxmode = &dev_conf->rxmode;
>>>>        txmode = &dev_conf->txmode;
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* Checking TX mode */
>>>> diff --git a/drivers/net/ngbe/ngbe_ethdev.c
>>>> b/drivers/net/ngbe/ngbe_ethdev.c
>>>> index c32d954769b0..5b53781c4aaf 100644
>>>> --- a/drivers/net/ngbe/ngbe_ethdev.c
>>>> +++ b/drivers/net/ngbe/ngbe_ethdev.c
>>>> @@ -918,7 +918,7 @@ ngbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* set flag to update link status after init */
>>>> diff --git a/drivers/net/qede/qede_ethdev.c
>>>> b/drivers/net/qede/qede_ethdev.c
>>>> index a4923670d6ba..11ddd8abf16a 100644
>>>> --- a/drivers/net/qede/qede_ethdev.c
>>>> +++ b/drivers/net/qede/qede_ethdev.c
>>>> @@ -1272,7 +1272,7 @@ static int qede_dev_configure(struct rte_eth_dev
>>>> *eth_dev)
>>>>          PMD_INIT_FUNC_TRACE(edev);
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* We need to have min 1 RX queue.There is no min check in
>>>> diff --git a/drivers/net/thunderx/nicvf_ethdev.c
>>>> b/drivers/net/thunderx/nicvf_ethdev.c
>>>> index ab1e714d9767..b9cd09332510 100644
>>>> --- a/drivers/net/thunderx/nicvf_ethdev.c
>>>> +++ b/drivers/net/thunderx/nicvf_ethdev.c
>>>> @@ -1984,7 +1984,7 @@ nicvf_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (!rte_eal_has_hugepages()) {
>>>> diff --git a/drivers/net/txgbe/txgbe_ethdev.c
>>>> b/drivers/net/txgbe/txgbe_ethdev.c
>>>> index a502618bc5a2..08ad5a087e23 100644
>>>> --- a/drivers/net/txgbe/txgbe_ethdev.c
>>>> +++ b/drivers/net/txgbe/txgbe_ethdev.c
>>>> @@ -1508,7 +1508,7 @@ txgbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> b/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> index 3b1f7c913b7b..02a59fc696e5 100644
>>>> --- a/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> +++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> @@ -577,7 +577,7 @@ txgbevf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> b/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> index fd946dec5c80..8efde46ae0ad 100644
>>>> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> @@ -531,7 +531,7 @@ vmxnet3_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (!VMXNET3_VERSION_GE_6(hw)) {
>> .
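
For readers following the quoted diff, here is a minimal sketch of why the flag test and the equality test are not interchangeable. The names below are hypothetical stand-ins, and the bit layout (composite modes built by OR-ing single-bit flags) is an assumption modeled on the RTE_ETH_MQ_RX_* definitions; it is not copied from the DPDK headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-bit flags; the layout is an assumption. */
#define MQ_RX_RSS_FLAG  (UINT32_C(1) << 0)
#define MQ_RX_DCB_FLAG  (UINT32_C(1) << 1)
#define MQ_RX_VMDQ_FLAG (UINT32_C(1) << 2)

/* Hypothetical composite modes, each an OR of the flags above. */
enum mq_rx_mode {
	MQ_RX_NONE     = 0,
	MQ_RX_RSS      = MQ_RX_RSS_FLAG,
	MQ_RX_DCB      = MQ_RX_DCB_FLAG,
	MQ_RX_DCB_RSS  = MQ_RX_RSS_FLAG | MQ_RX_DCB_FLAG,
	MQ_RX_VMDQ_RSS = MQ_RX_RSS_FLAG | MQ_RX_VMDQ_FLAG,
};

/* Old form of the check: true for every mode that has the RSS bit set. */
static bool
rss_by_flag(enum mq_rx_mode mode)
{
	return ((uint32_t)mode & MQ_RX_RSS_FLAG) != 0;
}

/* New form of the check: true only for the plain RSS mode. */
static bool
rss_by_equality(enum mq_rx_mode mode)
{
	return mode == MQ_RX_RSS;
}
```

Under these assumptions, a combined mode such as MQ_RX_VMDQ_RSS passes the flag test but fails the equality test, so the two forms select different sets of modes.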


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 01/21] net/cpfl: support device initialization
  @ 2023-02-27 23:38  3%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-27 23:38 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko, Jerin Jacob Kollanukkaran,
	Qi Z Zhang, David Marchand
  Cc: dev, Mingxia Liu, yuying.zhang, beilei.xing, techboard

On 2/27/2023 3:45 PM, Thomas Monjalon wrote:
> 27/02/2023 14:46, Ferruh Yigit:
>> On 2/16/2023 12:29 AM, Mingxia Liu wrote:
>>> +static int
>>> +cpfl_dev_configure(struct rte_eth_dev *dev)
>>> +{
>>> +	struct rte_eth_conf *conf = &dev->data->dev_conf;
>>> +
>>> +	if (conf->link_speeds & RTE_ETH_LINK_SPEED_FIXED) {
>>> +		PMD_INIT_LOG(ERR, "Setting link speed is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->txmode.mq_mode != RTE_ETH_MQ_TX_NONE) {
>>> +		PMD_INIT_LOG(ERR, "Multi-queue TX mode %d is not supported",
>>> +			     conf->txmode.mq_mode);
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->lpbk_mode != 0) {
>>> +		PMD_INIT_LOG(ERR, "Loopback operation mode %d is not supported",
>>> +			     conf->lpbk_mode);
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->dcb_capability_en != 0) {
>>> +		PMD_INIT_LOG(ERR, "Priority Flow Control(PFC) is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.lsc != 0) {
>>> +		PMD_INIT_LOG(ERR, "LSC interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.rxq != 0) {
>>> +		PMD_INIT_LOG(ERR, "RXQ interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.rmv != 0) {
>>> +		PMD_INIT_LOG(ERR, "RMV interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	return 0;
>>
>> This is the '.dev_configure()' dev op of a driver. There is nothing wrong
>> with the function, but it is a good example to highlight a point.
>>
>>
>> 'rte_eth_dev_configure()' can fail for various reasons. What can an
>> application do in this case?
>> It is not clear why the configuration failed, and there is no way to figure
>> out the failed config option dynamically.
> 
> There are some capabilities to read before calling "configure".
> 

Yes, but there are some PMD-specific cases as well, such as SPEED_FIXED
not being supported above. How can an app manage this?

"struct rte_eth_dev_info" is mainly used for capabilities (although it
is a mixed bag), but it is not symmetric with the config/setup functions:
for a given config/setup function there is no clear matching capability
struct/function.

>> An application developer can read the log and find out what caused the
>> failure, but what can they do next? Put a conditional check for the
>> particular device, assuming the application supports multiple devices,
>> before configuration?
> 
> Which failures cannot be guessed with capability flags?
> 

At least for the above sample, as far as I can see, some capabilities are missing:
- txmode.mq_mode
- rxmode.mq_mode
- lpbk_mode
- intr_conf.rxq

We can go through the whole list to detect gaps if we plan to take action.

>> I think we need a better error value, to help the application detect what
>> went wrong and adapt dynamically, perhaps a bitmask of errors with one bit
>> per config option. What do you think?
> 
> I am not sure we can change such an old API.
> 

Yes, that is hard, but if we keep the return value negative, that can
still be backward compatible.

Or the API can keep the same interface but set a global 'reason' variable,
similar to 'errno', so new application code can optionally get it with a
new API and investigate it.
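
To make the idea concrete, here is a rough sketch of how a 'reason' bitmask could sit next to an unchanged negative return value. Every name in it (the CFG_REASON_* bits, struct demo_conf, demo_configure(), demo_configure_reason()) is hypothetical and exists only for illustration; nothing like this is part of ethdev today.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reason bits, one per rejected config option. */
#define CFG_REASON_LINK_SPEED (UINT64_C(1) << 0)
#define CFG_REASON_TX_MQ_MODE (UINT64_C(1) << 1)
#define CFG_REASON_LPBK_MODE  (UINT64_C(1) << 2)
#define CFG_REASON_LSC_INTR   (UINT64_C(1) << 3)

/* Hypothetical, heavily reduced stand-in for a device configuration. */
struct demo_conf {
	int link_speed_fixed;
	int tx_mq_mode;
	uint32_t lpbk_mode;
	int lsc_intr;
};

/*
 * Last failure reason. A plain static keeps the sketch short; a real
 * design would more likely keep this per thread (like errno) or per port.
 */
static uint64_t cfg_reason;

/* Same contract as before: 0 on success, negative errno on failure. */
static int
demo_configure(const struct demo_conf *conf)
{
	uint64_t reason = 0;

	/* Collect every unsupported option instead of stopping at the first. */
	if (conf->link_speed_fixed)
		reason |= CFG_REASON_LINK_SPEED;
	if (conf->tx_mq_mode != 0)
		reason |= CFG_REASON_TX_MQ_MODE;
	if (conf->lpbk_mode != 0)
		reason |= CFG_REASON_LPBK_MODE;
	if (conf->lsc_intr != 0)
		reason |= CFG_REASON_LSC_INTR;

	cfg_reason = reason;
	return reason != 0 ? -ENOTSUP : 0;
}

/* Optional new call: existing applications never need to use it. */
static uint64_t
demo_configure_reason(void)
{
	return cfg_reason;
}
```

An application that gets a negative return could then read the mask, clear the offending options (for example drop the LSC interrupt request) and retry, rather than special-casing a particular device.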

>> And I think this is another reason why we should not make a single API
>> too overloaded and complex.
> 
> Right, and I would support a work to have some of those "configure" features
> available as small functions.
> 

If there is enough appetite, we can put something in a deprecation notice
for the next ABI release.


^ permalink raw reply	[relevance 3%]

* RE: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
  2023-02-21  0:48  3%                     ` Konstantin Ananyev
@ 2023-02-27  8:12  0%                       ` Tomasz Duszynski
  0 siblings, 0 replies; 200+ results
From: Tomasz Duszynski @ 2023-02-27  8:12 UTC (permalink / raw)
  To: Konstantin Ananyev, Konstantin Ananyev, dev



>-----Original Message-----
>From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>Sent: Tuesday, February 21, 2023 1:48 AM
>To: Tomasz Duszynski <tduszynski@marvell.com>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
>dev@dpdk.org
>Subject: Re: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
>
>
>>>>>>>>>> diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h
>>>>>>>>>> new file mode 100644
>>>>>>>>>> index 0000000000..6b664c3336
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/lib/pmu/rte_pmu.h
>>>>>>>>>> @@ -0,0 +1,212 @@
>>>>>>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>>>>>>> + * Copyright(c) 2023 Marvell
>>>>>>>>>> + */
>>>>>>>>>> +
>>>>>>>>>> +#ifndef _RTE_PMU_H_
>>>>>>>>>> +#define _RTE_PMU_H_
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @file
>>>>>>>>>> + *
>>>>>>>>>> + * PMU event tracing operations
>>>>>>>>>> + *
>>>>>>>>>> + * This file defines generic API and types necessary to setup PMU and
>>>>>>>>>> + * read selected counters in runtime.
>>>>>>>>>> + */
>>>>>>>>>> +
>>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>>> +extern "C" {
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>> +#include <linux/perf_event.h>
>>>>>>>>>> +
>>>>>>>>>> +#include <rte_atomic.h>
>>>>>>>>>> +#include <rte_branch_prediction.h>
>>>>>>>>>> +#include <rte_common.h>
>>>>>>>>>> +#include <rte_compat.h>
>>>>>>>>>> +#include <rte_spinlock.h>
>>>>>>>>>> +
>>>>>>>>>> +/** Maximum number of events in a group */
>>>>>>>>>> +#define MAX_NUM_GROUP_EVENTS 8
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A structure describing a group of events.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu_event_group {
>>>>>>>>>> +	struct perf_event_mmap_page *mmap_pages[MAX_NUM_GROUP_EVENTS]; /**< array of user pages */
>>>>>>>>>> +	int fds[MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */
>>>>>>>>>> +	bool enabled; /**< true if group was enabled on particular lcore */
>>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */
>>>>>>>>>> +} __rte_cache_aligned;
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A structure describing an event.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu_event {
>>>>>>>>>> +	char *name; /**< name of an event */
>>>>>>>>>> +	unsigned int index; /**< event index into fds/mmap_pages */
>>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event) next; /**< list entry */
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A PMU state container.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu {
>>>>>>>>>> +	char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */
>>>>>>>>>> +	rte_spinlock_t lock; /**< serialize access to event group list */
>>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event_group) event_group_list; /**< list of event groups */
>>>>>>>>>> +	unsigned int num_group_events; /**< number of events in a group */
>>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */
>>>>>>>>>> +	unsigned int initialized; /**< initialization counter */
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/** lcore event group */
>>>>>>>>>> +RTE_DECLARE_PER_LCORE(struct rte_pmu_event_group, _event_group);
>>>>>>>>>> +
>>>>>>>>>> +/** PMU state container */
>>>>>>>>>> +extern struct rte_pmu rte_pmu;
>>>>>>>>>> +
>>>>>>>>>> +/** Each architecture supporting PMU needs to provide its own version */
>>>>>>>>>> +#ifndef rte_pmu_pmc_read
>>>>>>>>>> +#define rte_pmu_pmc_read(index) ({ 0; })
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Read PMU counter.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should not be called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @param pc
>>>>>>>>>> + *   Pointer to the mmapped user page.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Counter value read from hardware.
>>>>>>>>>> + */
>>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>>> +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc)
>>>>>>>>>> +{
>>>>>>>>>> +	uint64_t width, offset;
>>>>>>>>>> +	uint32_t seq, index;
>>>>>>>>>> +	int64_t pmc;
>>>>>>>>>> +
>>>>>>>>>> +	for (;;) {
>>>>>>>>>> +		seq = pc->lock;
>>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>>
>>>>>>>>> Are you sure that compiler_barrier() is enough here?
>>>>>>>>> On some archs CPU itself has freedom to re-order reads.
>>>>>>>>> Or I am missing something obvious here?
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's a matter of not keeping old stuff cached in registers and
>>>>>>>> making sure that we have two reads of lock. CPU reordering won't
>>>>>>>> do any harm here.
>>>>>>>
>>>>>>> Sorry, I didn't get you here:
>>>>>>> Suppose CPU will re-order reads and will read lock *after* index or offset value.
>>>>>>> Wouldn't it mean that in that case index and/or offset can contain old/invalid values?
>>>>>>>
>>>>>>
>>>>>> This number is just an indicator whether kernel did change something or not.
>>>>>
>>>>> You are talking about pc->lock, right?
>>>>> Yes, I do understand that it is sort of seqlock.
>>>>> That's why I am puzzled why we do not care about possible cpu read-reordering.
>>>>> Manual for perf_event_open() also has a code snippet with compiler barrier only...
>>>>>
>>>>>> If cpu reordering will come into play then this will not change
>>>>>> anything from pov of this loop.
>>>>>> All we want is fresh data when needed and no involvement of
>>>>>> compiler when it comes to reordering code.
>>>>>
>>>>> Ok, can you probably explain to me why the following could not happen:
>>>>> T0:
>>>>> pc->seqlock==0; pc->index==I1; pc->offset==O1;
>>>>> T1:
>>>>>       cpu #0 read pmu (due to cpu read reorder, we get index value before seqlock):
>>>>>        index=pc->index;  //index==I1;
>>>>> T2:
>>>>>       cpu #1 kernel perf_event_update_userpage:
>>>>>       pc->lock++; // pc->lock==1
>>>>>       pc->index=I2;
>>>>>       pc->offset=O2;
>>>>>       ...
>>>>>       pc->lock++; //pc->lock==2
>>>>> T3:
>>>>>       cpu #0 continue with read pmu:
>>>>>       seq=pc->lock; //seq == 2
>>>>>        offset=pc->offset; // offset == O2
>>>>>        ....
>>>>>        pmc = rte_pmu_pmc_read(index - 1);  // Note that we read at I1, not I2
>>>>>        offset += pmc; //offset == O2 + pmcread(I1-1);
>>>>>        if (pc->lock == seq) // they are equal, return
>>>>>              return offset;
>>>>>
>>>>> Or, it can happen, but by some reason we don't care much?
>>>>>
>>>>
>>>> This code does self-monitoring and user page (whole group actually)
>>>> is per thread running on current cpu. Hence I am not sure what are
>>>> you trying to prove with that example.
>>>
>>> I am not trying to prove anything so far.
>>> I am asking is such situation possible or not, and if not, why?
>>> My current understanding (possibly wrong) is that after you mmaped
>>> these pages, kernel still can asynchronously update them.
>>> So, when reading the data from these pages you have to check 'lock'
>>> value before and after accessing other data.
>>> If so, why possible cpu read-reordering doesn't matter?
>>>
>>
>> Look. I'll reiterate that.
>>
>> 1. That user page/group/PMU config is per process. Other processes do not access that.
>
>Ok, that's clear.
>
>
>>     All this happens on the very same CPU where current thread is running.
>
>Ok... but can't this page be updated by kernel thread running simultaneously on different CPU?
>

I already pointed out that event/counter configuration is bound to the current cpu. How could
another cpu possibly update that configuration? This cannot work.


If you think that there's some problem with the code (or that it is simply broken on your setup),
that the logic has an obvious flaw, and you can provide meaningful evidence of that, then I'd be
more than happy to apply a fix. Otherwise this discussion will get us nowhere.

>
>> 2. Suppose you've already read seq. Now for some reason kernel updates data in page seq was read from.
>> 3. Kernel will enter critical section during update. seq changes along with other data without app knowing about it.
>>     If you want nitty gritty details consult kernel sources.
>
>Look, I don't have to beg you to answer these questions.
>In fact, I expect the library author to document all such subtle points
>clearly, either in the PG or in source code comments (ideally in both).
>If not, then from my perspective the patch is not ready and
>shouldn't be accepted.
>I don't know whether a compiler barrier is enough here or not, but I think it
>is definitely worth a clear explanation in the docs.
>I suppose I wouldn't be the only one to get confused here.
>So please make an effort and document clearly why you believe there
>is no race condition.
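
For reference, the reordering concern raised in this thread can be sketched with C11 atomics. This is a generic seqlock-style reader, not the code from the patch: struct demo_page and demo_read_page() are hypothetical stand-ins for the few perf_event_mmap_page fields involved, and the retry on an odd sequence number follows the usual seqlock convention rather than anything stated above. Whether the stronger ordering is actually needed depends on the point being debated here, namely whether the page can ever be updated from a different CPU than the one reading it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the user page fields used by the reader. */
struct demo_page {
	_Atomic uint32_t lock;   /* sequence counter bumped around each update */
	_Atomic uint32_t index;  /* counter index, 0 means "not usable" */
	_Atomic int64_t offset;  /* base value to add to the hardware count */
};

/*
 * Return a consistent (index, offset) pair. The acquire load and the
 * acquire fence stop both the compiler and the CPU from moving the data
 * loads outside the two reads of the sequence counter.
 */
static int64_t
demo_read_page(struct demo_page *pc, uint32_t *index_out)
{
	uint32_t seq, index;
	int64_t offset;

	for (;;) {
		seq = atomic_load_explicit(&pc->lock, memory_order_acquire);
		if (seq & 1)
			continue; /* writer in progress, try again */

		index = atomic_load_explicit(&pc->index, memory_order_relaxed);
		offset = atomic_load_explicit(&pc->offset, memory_order_relaxed);

		/* Order the data loads before the second read of the counter. */
		atomic_thread_fence(memory_order_acquire);

		if (atomic_load_explicit(&pc->lock, memory_order_relaxed) == seq) {
			*index_out = index;
			return offset;
		}
	}
}
```

If updates are guaranteed to happen only on the reading CPU, a compiler barrier gives the same result at lower cost; the sketch only shows what the reader would look like without that guarantee.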
>
>> 4. app resumes and has some stale data but *WILL* read new seq. Code loops again because values do not match.
>
>If the kernel always executes the update for this page in the same
>thread context, then yes, user code will always notice the difference
>after resume.
>But why can't it happen that your user thread reads this page on one
>CPU, while some kernel code on another CPU updates it simultaneously?
>
>
>> 5. Otherwise seq values match and data is valid.
>>
>>> Also there was another question below, which you probably missed, so I copied it here:
>>> Another question - do we really need to have __rte_pmu_read_userpage() and rte_pmu_read() as
>>> static inline functions in public header?
>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>> definitions also public.
>>>
>>
>> These functions need to be inlined otherwise performance takes a hit.
>
>I understand that performance might be affected, but how big is the hit?
>I expect the actual PMU read will not be free anyway, right?
>If the difference is small, it might be worth making such a change;
>removing unneeded structures from public headers would help a lot in
>the future in terms of ABI/API stability.
>
>
>
>>>>
>>>>>>>>
>>>>>>>>>> +		index = pc->index;
>>>>>>>>>> +		offset = pc->offset;
>>>>>>>>>> +		width = pc->pmc_width;
>>>>>>>>>> +
>>>>>>>>>> +		/* index set to 0 means that particular counter cannot be used */
>>>>>>>>>> +		if (likely(pc->cap_user_rdpmc && index)) {
>>>>>>>>>> +			pmc = rte_pmu_pmc_read(index - 1);
>>>>>>>>>> +			pmc <<= 64 - width;
>>>>>>>>>> +			pmc >>= 64 - width;
>>>>>>>>>> +			offset += pmc;
>>>>>>>>>> +		}
>>>>>>>>>> +
>>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>>> +
>>>>>>>>>> +		if (likely(pc->lock == seq))
>>>>>>>>>> +			return offset;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Enable group of events on the calling lcore.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should not be called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @return
>>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +__rte_pmu_enable_group(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Initialize PMU library.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @return
>>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +rte_pmu_init(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Finalize PMU library. This should be called after PMU counters are no longer being read.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +void
>>>>>>>>>> +rte_pmu_fini(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Add event to the group of enabled events.
>>>>>>>>>> + *
>>>>>>>>>> + * @param name
>>>>>>>>>> + *   Name of an event listed under /sys/bus/event_source/devices/pmu/events.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Event index in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +rte_pmu_add_event(const char *name);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Read hardware counter configured to count occurrences of an event.
>>>>>>>>>> + *
>>>>>>>>>> + * @param index
>>>>>>>>>> + *   Index of an event to be read.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Event value read from register. In case of errors or lack of support
>>>>>>>>>> + *   0 is returned. In other words, stream of zeros in a trace file
>>>>>>>>>> + *   indicates problem with reading particular PMU event register.
>>>>>>>>>> + */
>>>>>
>>>>> Another question - do we really need  to have
>>>>> __rte_pmu_read_userpage() and rte_pmu_read() as static inline functions in public header?
>>>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>>>> definitions also public.
>>>>>
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>>> +rte_pmu_read(unsigned int index)
>>>>>>>>>> +{
>>>>>>>>>> +	struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
>>>>>>>>>> +	int ret;
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(!rte_pmu.initialized))
>>>>>>>>>> +		return 0;
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(!group->enabled)) {
>>>>>>>>>> +		ret = __rte_pmu_enable_group();
>>>>>>>>>> +		if (ret)
>>>>>>>>>> +			return 0;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(index >= rte_pmu.num_group_events))
>>>>>>>>>> +		return 0;
>>>>>>>>>> +
>>>>>>>>>> +	return __rte_pmu_read_userpage(group->mmap_pages[index]);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>>> +}
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>
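
Editor's sketch of the sequence-counter read pattern debated above, where a reader retries whenever a writer on another CPU changed the page during the read. This is a minimal illustration in C11 atomics, not the code from the patch: struct sample_page and the sample_* functions are hypothetical names, the layout is not the kernel's, and unlike the quoted loop (which gives up and returns 0) this reader spins until it sees a stable snapshot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; not the kernel's layout. */
struct sample_page {
	_Atomic uint32_t seq;    /* bumped before and after every update */
	_Atomic uint64_t offset; /* accumulated count */
	_Atomic uint64_t raw;    /* raw counter bits */
	_Atomic uint32_t width;  /* number of valid bits in raw */
};

/* Keep only the low 'width' bits, like the shift pair in the quoted patch. */
static uint64_t
sample_mask(uint64_t raw, uint32_t width)
{
	if (width == 0)
		return 0;
	if (width >= 64)
		return raw; /* shifting by 0 would be fine, by 64 would not */
	return (raw << (64 - width)) >> (64 - width);
}

/* Writer side: odd sequence value means "update in progress". */
static void
sample_write(struct sample_page *p, uint64_t offset, uint64_t raw,
	     uint32_t width)
{
	uint32_t s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&p->raw, raw, memory_order_relaxed);
	atomic_store_explicit(&p->width, width, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader side: retry until the sequence is even and unchanged. */
static uint64_t
sample_read(struct sample_page *p)
{
	uint32_t before, after, width;
	uint64_t offset, raw;

	do {
		before = atomic_load_explicit(&p->seq, memory_order_acquire);
		offset = atomic_load_explicit(&p->offset, memory_order_relaxed);
		raw = atomic_load_explicit(&p->raw, memory_order_relaxed);
		width = atomic_load_explicit(&p->width, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return offset + sample_mask(raw, width);
}
```

The point relevant to the thread is that the retry check only helps if the sequence value is re-read after the data, with ordering that keeps the data loads between the two sequence loads.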


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-24  6:31  0%     ` Yan, Zhirun
@ 2023-02-26 22:23  0%       ` Jerin Jacob
  2023-03-02  8:38  0%         ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-26 22:23 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 20, 2023 9:51 PM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> > >
> > > Add new get/set APIs to configure graph worker model which is used to
> > > determine which model will be chosen.
> > >
> > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > ---
> > >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > >  lib/graph/version.map               |  3 ++
> > >  3 files changed, 67 insertions(+)
> > >
> > > diff --git a/lib/graph/rte_graph_worker.h
> > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > > --- a/lib/graph/rte_graph_worker.h
> > > +++ b/lib/graph/rte_graph_worker.h
> > > @@ -1,5 +1,56 @@
> > >  #include "rte_graph_model_rtc.h"
> > >
> > > +static enum rte_graph_worker_model worker_model =
> > > +RTE_GRAPH_MODEL_DEFAULT;
> >
> > This will break the multiprocess.
>
> Thanks. I will use TLS for per-thread local storage.

If it needs to be used from a secondary process, then it needs to come from a memzone.



>
> >
> > > +
> > > +/** Graph worker models */
> > > +enum rte_graph_worker_model {
> > > +#define WORKER_MODEL_DEFAULT "default"
> >
> > Why need strings?
> > Also, every symbol in a public header file should start with RTE_ to avoid
> > namespace conflict.
>
> It was used to config the model in app. I can put the string into example.

OK

>
> >
> > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > +#define WORKER_MODEL_RTC "rtc"
> > > +       RTE_GRAPH_MODEL_RTC,
> >
> > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> > itself.
> Yes, will do in next version.
>
> >
> > > +#define WORKER_MODEL_GENERIC "generic"
> >
> > Generic is a very overloaded term. Use pipeline here i.e
> > RTE_GRAPH_MODEL_PIPELINE
>
> Actually, it's not a purely pipeline mode. I prefer to change to hybrid.

Hybrid is a very overloaded term, and it will be confusing (considering
there will be new models in the future).
Please pick a word that really expresses how the model works.

> >
> >
> > > +       RTE_GRAPH_MODEL_GENERIC,
> > > +       RTE_GRAPH_MODEL_MAX,
> >
> > No need for MAX, it will break the ABI for future. See other subsystem such as
> > cryptodev.
>
> Thanks, I will change it.
> >
> > > +};
> >
> > >
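
Editor's sketch collecting the enum-related review points from this message: the default aliases the RTC value, there is no _MAX terminator to become part of the ABI, and the model strings live in application code rather than in the public header. All names below (app_graph_model, APP_GRAPH_MODEL_*, app_graph_model_parse) are hypothetical and are not the DPDK API; the second model's name is a placeholder because the thread has not settled on one.

```c
#include <string.h>

/* Hypothetical enum; in a public DPDK header the symbols would carry RTE_. */
enum app_graph_model {
	APP_GRAPH_MODEL_RTC = 0,
	APP_GRAPH_MODEL_DEFAULT = APP_GRAPH_MODEL_RTC,
	APP_GRAPH_MODEL_SECOND, /* placeholder name for the non-RTC model */
	/* deliberately no _MAX terminator: adding a model later changes nothing */
};

/*
 * Application-side parsing of the model name, so that no string macros
 * are needed in the library header.
 * Returns 0 and fills *out on success, -1 on an unknown name.
 */
static int
app_graph_model_parse(const char *name, enum app_graph_model *out)
{
	if (name == NULL || out == NULL)
		return -1;
	if (strcmp(name, "default") == 0 || strcmp(name, "rtc") == 0) {
		*out = APP_GRAPH_MODEL_RTC;
		return 0;
	}
	if (strcmp(name, "second") == 0) {
		*out = APP_GRAPH_MODEL_SECOND;
		return 0;
	}
	return -1;
}
```

Note that this says nothing about where the selected value is stored; the multiprocess concern raised above (static variable versus memzone) is a separate question.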

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  2023-02-23 16:57  0%     ` Mike Pattrick
@ 2023-02-24 15:05  4%       ` Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2023-02-24 15:05 UTC (permalink / raw)
  To: Mike Pattrick; +Cc: Maxime Coquelin, dev, david.marchand, chenbo.xia


UNH CI detected an ABI failure for this patch, but the result was not
reported because of a bug on our end, so I'm reporting it manually now.
Maxime, I see you already predicted the issue, though!

07:58:32  1 function with some indirect sub-type change:
07:58:32
07:58:32    [C] 'function int rte_vhost_get_mem_table(int, rte_vhost_memory**)' at vhost.c:922:1 has some indirect sub-type changes:
07:58:32      parameter 2 of type 'rte_vhost_memory**' has sub-type changes:
07:58:32        in pointed to type 'rte_vhost_memory*':
07:58:32          in pointed to type 'struct rte_vhost_memory' at rte_vhost.h:145:1:
07:58:32            type size hasn't changed
07:58:32            1 data member change:
07:58:32              type of 'rte_vhost_mem_region regions[]' changed:
07:58:32                array element type 'struct rte_vhost_mem_region' changed:
07:58:32                  type size changed from 448 to 512 (in bits)
07:58:32                  1 data member insertion:
07:58:32                    'uint64_t alignment', at offset 448 (in bits) at rte_vhost.h:139:1
07:58:32                type size hasn't changed
07:58:32
07:58:32  Error: ABI issue reported for abidiff --suppr dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1 reference/include --headers-dir2 build_install/include reference/dump/librte_vhost.dump build_install/dump/librte_vhost.dump
07:58:32  ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue).
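
Editor's sketch of why the report above counts as an ABI break even though the containing struct's own size is unchanged: the element type of a public array grew, so code built against the old header and code built against the new one disagree on where every element after the first begins. The struct names region_v1/region_v2 are hypothetical, and the field list is an assumption modeled on the members visible in the quoted diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed old layout of the array element (illustrative field names). */
struct region_v1 {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
};

/* Same layout with the member the patch appends. */
struct region_v2 {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
	uint64_t alignment;
};

/* Byte offset of element i as computed by code built against each header. */
static size_t
elem_offset_v1(size_t i)
{
	return i * sizeof(struct region_v1);
}

static size_t
elem_offset_v2(size_t i)
{
	return i * sizeof(struct region_v2);
}
```

On a typical 64-bit target the two element sizes should come out as 56 and 64 bytes, which corresponds to the 448 and 512 bits in the report. An application compiled against the old header would therefore read regions[1] at the wrong address when the library hands it an array laid out with the new stride.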


On Thu, Feb 23, 2023 at 11:57 AM Mike Pattrick <mkp@redhat.com> wrote:

> On Thu, Feb 23, 2023 at 11:12 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
> >
> > Hi Mike,
> >
> > Thanks for  looking into this issue.
> >
> > On 2/23/23 05:35, Mike Pattrick wrote:
> > > > The arguments passed to madvise should be aligned to the alignment of
> > > > the backing memory. Now we keep track of each region's alignment and use
> > > > it when setting coredump preferences. To facilitate this, a new member
> > > > was added to rte_vhost_mem_region. A new function was added to easily
> > > > translate a memory address back to its region's alignment. Unneeded calls
> > > > to madvise were reduced, as the cache removal case should already be
> > > > covered by the cache insertion case. The previously inline function
> > > > mem_set_dump was removed from a header file and made not inline.
> > >
> > > Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> > >
> > > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > > ---
> > > Since v1:
> > >   - Corrected a cast for 32bit compiles
> > > ---
> > >   lib/vhost/iotlb.c      |  9 +++---
> > >   lib/vhost/rte_vhost.h  |  1 +
> > >   lib/vhost/vhost.h      | 12 ++------
> > >   lib/vhost/vhost_user.c | 63
> +++++++++++++++++++++++++++++++++++-------
> > >   4 files changed, 60 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> > > index a0b8fd7302..5293507b63 100644
> > > --- a/lib/vhost/iotlb.c
> > > +++ b/lib/vhost/iotlb.c
> > > @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct
> vhost_virtqueue *vq)
> > >       rte_rwlock_write_lock(&vq->iotlb_lock);
> > >
> > >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > > -             mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
> true);
> >
> > Hmm, it should have been called with enable=false here since we are
> > removing the entry from the IOTLB cache. It should be kept in order to
> > "DONTDUMP" pages evicted from the cache.
>
> Here I was thinking that if we add an entry and then remove a
> different entry, they could be in the same page. But I should have
> kept an enable=false in remove_all().
>
> And now that I think about it again, I could just check if there are
> any active cache entries in the page on every evict/remove, they're
> sorted so that should be an easy check. Unless there are any
> objections I'll go forward with that.
>
> >
> > >               TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >               vhost_user_iotlb_pool_put(vq, node);
> > >       }
> > > @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct
> vhost_virtqueue *vq)
> > >
> > >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > >               if (!entry_idx) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> >
> > Same here.
> >
> > >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >                       vhost_user_iotlb_pool_put(vq, node);
> > >                       vq->iotlb_cache_nr--;
> > > @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net
> *dev, struct vhost_virtqueue *vq
> > >                       vhost_user_iotlb_pool_put(vq, new_node);
> > >                       goto unlock;
> > >               } else if (node->iova > new_node->iova) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> > > +                     mem_set_dump((void *)(uintptr_t)new_node->uaddr,
> new_node->size, true,
> > > +                             hua_to_alignment(dev->mem, (void
> *)(uintptr_t)node->uaddr));
> > >                       TAILQ_INSERT_BEFORE(node, new_node, next);
> > >                       vq->iotlb_cache_nr++;
> > >                       goto unlock;
> > >               }
> > >       }
> > >
> > > -     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > > +     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size,
> true,
> > > +             hua_to_alignment(dev->mem, (void
> *)(uintptr_t)new_node->uaddr));
> > >       TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
> > >       vq->iotlb_cache_nr++;
> > >
> > > @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct
> vhost_virtqueue *vq,
> > >                       break;
> > >
> > >               if (iova < node->iova + node->size) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> > >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >                       vhost_user_iotlb_pool_put(vq, node);
> > >                       vq->iotlb_cache_nr--;
> > > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > > index a395843fe9..c5c97ea67e 100644
> > > --- a/lib/vhost/rte_vhost.h
> > > +++ b/lib/vhost/rte_vhost.h
> > > @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
> > >       void     *mmap_addr;
> > >       uint64_t mmap_size;
> > >       int fd;
> > > +     uint64_t alignment;
> >
> > It is not possible to do this as it breaks the ABI.
> > You have to store the information somewhere else, or simply call
> > get_blk_size() in hua_to_alignment() since the fd is not closed.
> >
>
> Sorry about that! You're right, checking the fd per operation should
> be easy enough.
>
> Thanks for the review,
>
> M
>
> > >   };
> > >
> > >   /**
> > > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > > index 5750f0c005..a2467ba509 100644
> > > --- a/lib/vhost/vhost.h
> > > +++ b/lib/vhost/vhost.h
> > > @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
> > >       return true;
> > >   }
> > >
> > > -static __rte_always_inline void
> > > -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size,
> __rte_unused bool enable)
> > > -{
> > > -#ifdef MADV_DONTDUMP
> > > -     if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) ==
> -1) {
> > > -             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > > -                     "VHOST_CONFIG: could not set coredump preference
> (%s).\n", strerror(errno));
> > > -     }
> > > -#endif
> > > -}
> > > +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> > > +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t
> alignment);
> > >   #endif /* _VHOST_NET_CDEV_H_ */
> > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > > index d702d082dd..6d09597fbe 100644
> > > --- a/lib/vhost/vhost_user.c
> > > +++ b/lib/vhost/vhost_user.c
> > > @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
> > >       return log_gpa;
> > >   }
> > >
> > > +uint64_t
> > > +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> > > +{
> > > +     struct rte_vhost_mem_region *r;
> > > +     uint32_t i;
> > > +     uintptr_t hua = (uintptr_t)ptr;
> > > +
> > > +     for (i = 0; i < mem->nregions; i++) {
> > > +             r = &mem->regions[i];
> > > +             if (hua >= r->host_user_addr &&
> > > +                     hua < r->host_user_addr + r->size) {
> > > +                     return r->alignment;
> > > +             }
> > > +     }
> > > +
> > > +     /* If region isn't found, don't align at all */
> > > +     return 1;
> > > +}
> > > +
> > > +void
> > > +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> > > +{
> > > +#ifdef MADV_DONTDUMP
> > > +     void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> > > +     uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> > > +     size_t len = end - (uintptr_t)start;
> > > +
> > > +     if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) ==
> -1) {
> > > +             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > > +                     "VHOST_CONFIG: could not set coredump preference
> (%s).\n", strerror(errno));
> > > +     }
> > > +#endif
> > > +}
> > > +
> > >   static void
> > >   translate_ring_addresses(struct virtio_net **pdev, struct
> vhost_virtqueue **pvq)
> > >   {
> > > @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > +             mem_set_dump(vq->desc_packed, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->desc_packed));
> > >               numa_realloc(&dev, &vq);
> > >               *pdev = dev;
> > >               *pvq = vq;
> > > @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > +             mem_set_dump(vq->driver_event, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->driver_event));
> > >               len = sizeof(struct vring_packed_desc_event);
> > >               vq->device_event = (struct vring_packed_desc_event *)
> > >                                       (uintptr_t)ring_addr_to_vva(dev,
> > > @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > -             mem_set_dump(vq->desc_packed, len, true);
> > > -             mem_set_dump(vq->driver_event, len, true);
> > > -             mem_set_dump(vq->device_event, len, true);
> > > +             mem_set_dump(vq->device_event, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->device_event));
> > >               vq->access_ok = true;
> > >               return;
> > >       }
> > > @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem,
> vq->desc));
> > >       numa_realloc(&dev, &vq);
> > >       *pdev = dev;
> > >       *pvq = vq;
> > > @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem,
> vq->avail));
> > >       len = sizeof(struct vring_used) +
> > >               sizeof(struct vring_used_elem) * vq->size;
> > >       if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> > > @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem,
> vq->used));
> > > +
> > >       if (vq->last_used_idx != vq->used->idx) {
> > >               VHOST_LOG_CONFIG(dev->ifname, WARNING,
> > >                       "last_used_idx (%u) and vq->used->idx (%u)
> mismatches;\n",
> > > @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       "some packets maybe resent for Tx and dropped
> for Rx\n");
> > >       }
> > >
> > > -     mem_set_dump(vq->desc, len, true);
> > > -     mem_set_dump(vq->avail, len, true);
> > > -     mem_set_dump(vq->used, len, true);
> > >       vq->access_ok = true;
> > >
> > >       VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc:
> %p\n", vq->desc);
> > > @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
> > >       region->mmap_addr = mmap_addr;
> > >       region->mmap_size = mmap_size;
> > >       region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
> mmap_offset;
> > > -     mem_set_dump(mmap_addr, mmap_size, false);
> > > +     region->alignment = alignment;
> > > +     mem_set_dump(mmap_addr, mmap_size, false, alignment);
> > >
> > >       if (dev->async_copy) {
> > >               if (add_guest_pages(dev, region, alignment) < 0) {
> > > @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const
> char *name, size_t size, int *f
> > >               return NULL;
> > >       }
> > >
> > > -     mem_set_dump(ptr, size, false);
> > >       *fd = mfd;
> > >       return ptr;
> > >   }
> > > @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net
> **pdev,
> > >       uint64_t pervq_inflight_size, mmap_size;
> > >       uint16_t num_queues, queue_size;
> > >       struct virtio_net *dev = *pdev;
> > > +     uint64_t alignment;
> > >       int fd, i, j;
> > >       int numa_node = SOCKET_ID_ANY;
> > >       void *addr;
> > > @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net
> **pdev,
> > >               dev->inflight_info->fd = -1;
> > >       }
> > >
> > > +     alignment = get_blk_size(fd);
> > > +     mem_set_dump(addr, mmap_size, false, alignment);
> > >       dev->inflight_info->addr = addr;
> > >       dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size =
> mmap_size;
> > >       dev->inflight_info->fd = ctx->fds[0] = fd;
> > > @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net
> **pdev,
> > >               dev->inflight_info->fd = -1;
> > >       }
> > >
> > > -     mem_set_dump(addr, mmap_size, false);
> > >       dev->inflight_info->fd = fd;
> > >       dev->inflight_info->addr = addr;
> > >       dev->inflight_info->size = mmap_size;
> > > +     mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
> > >
> > >       for (i = 0; i < num_queues; i++) {
> > >               vq = dev->virtqueue[i];
> > > @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >       struct virtio_net *dev = *pdev;
> > >       int fd = ctx->fds[0];
> > >       uint64_t size, off;
> > > +     uint64_t alignment;
> > >       void *addr;
> > >       uint32_t i;
> > >
> > > @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >        * fail when offset is not page size aligned.
> > >        */
> > >       addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED,
> fd, 0);
> > > +     alignment = get_blk_size(fd);
> > >       close(fd);
> > >       if (addr == MAP_FAILED) {
> > >               VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base
> failed!\n");
> > > @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >       dev->log_addr = (uint64_t)(uintptr_t)addr;
> > >       dev->log_base = dev->log_addr + off;
> > >       dev->log_size = size;
> > > -     mem_set_dump(addr, size, false);
> > > +     mem_set_dump(addr, size + off, false, alignment);
> > >
> > >       for (i = 0; i < dev->nr_vring; i++) {
> > >               struct vhost_virtqueue *vq = dev->virtqueue[i];
> >
>
>
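
Editor's sketch of the range computation that the new mem_set_dump() in the quoted diff performs before calling madvise: the start is rounded down and the end rounded up to the backing page size, so the advised range covers whole pages. The names dump_range and dump_range_align are hypothetical; the sketch assumes pagesz is a non-zero power of two, ignores address overflow, and does not call madvise itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Page-aligned range that would be handed to madvise(). */
struct dump_range {
	uintptr_t start;
	size_t len;
};

/*
 * Expand [addr, addr + size) to page boundaries.
 * Assumes pagesz is a non-zero power of two.
 */
static struct dump_range
dump_range_align(uintptr_t addr, size_t size, uint64_t pagesz)
{
	uintptr_t mask = (uintptr_t)pagesz - 1;
	struct dump_range r;
	uintptr_t end;

	r.start = addr & ~mask;              /* round start down */
	end = (addr + size + mask) & ~mask;  /* round end up */
	r.len = (size_t)(end - r.start);
	return r;
}
```

For example, with 4 KiB pages a 0x20-byte object at 0x1ff0 straddles a page boundary, so the aligned range starts at 0x1000 and spans two pages. This also shows why enabling and disabling dumps per cache entry can interfere when two entries share a page, which is the concern discussed earlier in this message.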

-- 

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu


^ permalink raw reply	[relevance 4%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-23  7:11  0%     ` Ruifeng Wang
@ 2023-02-24  9:45  0%     ` Ruifeng Wang
  1 sibling, 0 replies; 200+ results
From: Ruifeng Wang @ 2023-02-24  9:45 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, February 23, 2023 5:56 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang <yipeng1.wang@intel.com>;
> Sameh Gobriel <sameh.gobriel@intel.com>; Bruce Richardson <bruce.richardson@intel.com>;
> Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> The code for setting algorithm for hash is not at all perf sensitive, and doing it inline
> has a couple of problems. First, it means that if multiple files include the header, then
> the initialization gets done multiple times. But also, it makes it harder to fix usage of
> RTE_LOG().
> 
> Despite what the checking script says, this is not an ABI change: the previous version
> inlined the same code; therefore both old and new code will work the same.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/hash/meson.build     |  1 +
>  lib/hash/rte_crc_arm64.h |  8 ++---
>  lib/hash/rte_crc_x86.h   | 10 +++---
>  lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
>  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
>  lib/hash/version.map     |  7 +++++
>  6 files changed, 88 insertions(+), 54 deletions(-)
>  create mode 100644 lib/hash/rte_hash_crc.c
> 
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 1/3] ethdev: enable direct rearm with separate API
  @ 2023-02-24  8:55  0%           ` Feifei Wang
  0 siblings, 0 replies; 200+ results
From: Feifei Wang @ 2023-02-24  8:55 UTC (permalink / raw)
  To: Morten Brørup, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, konstantin.v.ananyev, nd, Honnappa Nagarahalli,
	Ruifeng Wang, nd, nd

Sorry for my delayed reply.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, January 4, 2023 6:11 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; konstantin.v.ananyev@yandex.ru; nd <nd@arm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 1/3] ethdev: enable direct rearm with separate API
> 
> > From: Feifei Wang [mailto:Feifei.Wang2@arm.com]
> > Sent: Wednesday, 4 January 2023 09.51
> >
> > Hi, Morten
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, January 4, 2023 4:22 PM
> > >
> > > > From: Feifei Wang [mailto:feifei.wang2@arm.com]
> > > > Sent: Wednesday, 4 January 2023 08.31
> > > >
> > > > Add 'tx_fill_sw_ring' and 'rx_flush_descriptor' API into direct
> > rearm
> > > > mode for separate Rx and Tx Operation. And this can support
> > different
> > > > multiple sources in direct rearm mode. For examples, Rx driver is
> > > > ixgbe, and Tx driver is i40e.
> > > >
> > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > >
> > > This feature looks very promising for performance. I am pleased to
> > see
> > > progress on it.
> > >
> > Thanks very much for your reviewing.
> >
> > > Please confirm that the fast path functions are still thread safe,
> > i.e. one EAL
> > > thread may be calling rte_eth_rx_burst() while another EAL thread is
> > calling
> > > rte_eth_tx_burst().
> > >
> > For multi-thread safety, as we say in the cover letter, direct rearm
> > currently supports Rx and Tx in the same thread only. Supporting
> > multiple threads, as in the 'pipeline model', would require a lock in
> > the data path, which can decrease performance.
> > Thus, the first step is to enable direct rearm in a single thread; we
> > will then consider enabling it across multiple threads and improving
> > the performance.
> 
> OK, doing it in steps is a good idea for a feature like this - makes it easier to
> understand and review.
> 
> When proceeding to add support for the "pipeline model", perhaps the
> lockless principles from the rte_ring can be used in this feature too.
> 
> From a high level perspective, I'm somewhat worried that releasing a "work-
> in-progress" version of this feature in some DPDK version will cause API/ABI
> breakage discussions when progressing to the next steps of the
> implementation to make the feature more complete. Not only support for
> thread safety across simultaneous RX and TX, but also support for multiple
> mbuf pools per RX queue [1]. Marking the functions experimental should
> alleviate such discussions, but there is a risk of pushback to not break the
> API/ABI anyway.
> 
> [1]:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/ethdev/rte_ethdev.h#L1105
> 

[Feifei] I think the subsequent upgrades will not significantly affect the
stability of the API we currently define.

For thread safety across simultaneous RX and TX, the future lockless changes
will happen in the PMD layer, such as CAS load/store on the PMD's rxq queue index.
Thus, they should not affect the stability of the upper API.

For multiple mbuf pools per RX queue, direct rearm just puts Tx buffers into the
Rx ring, and it does not care which mempool a buffer comes from.
Buffers from different mempools are eventually freed back to their respective
pools in the no-FAST_FREE path.
I think this was a mistake in the cover letter. The previous direct-rearm version
supported only FAST_FREE, so it required all buffers to come from the same mempool.
The latest version also supports the no-FAST_FREE path, but we forgot to update
the cover letter.
> [...]
> 
> > > > --- a/lib/ethdev/ethdev_driver.h
> > > > +++ b/lib/ethdev/ethdev_driver.h
> > > > @@ -59,6 +59,10 @@ struct rte_eth_dev {
> > > >  	eth_rx_descriptor_status_t rx_descriptor_status;
> > > >  	/** Check the status of a Tx descriptor */
> > > >  	eth_tx_descriptor_status_t tx_descriptor_status;
> > > > +	/** Fill Rx sw-ring with Tx buffers in direct rearm mode */
> > > > +	eth_tx_fill_sw_ring_t tx_fill_sw_ring;
> > >
> > > What is "Rx sw-ring"? Please confirm that this is not an Intel PMD
> > specific
> > > term and/or implementation detail, e.g. by providing a conceptual
> > > implementation for a non-Intel PMD, e.g. mlx5.
> > > Rx sw_ring is used to store mbufs in the Intel PMDs. It is the same as
> > > 'rxq->elts' in mlx5.
> 
> Sounds good.
> 
> Then all we need is consensus on a generic name for this, unless "Rx sw-ring"
> already is the generic name. (I'm not a PMD developer, so I might be
> completely off track here.) Naming is often debatable, so I'll stop talking
> about it now - I only wanted to highlight that we should avoid vendor-
> specific terms in public APIs intended to be implemented by multiple vendors.
> On the other hand... if no other vendors raise their voices before merging
> into the DPDK main repository, they forfeit their right to complain about it. ;-)
> 
> > Agreed, we need to provide a conceptual implementation for
> > all PMDs.
> 
> My main point is that we should ensure that the feature is not too tightly
> coupled with the way Intel PMDs implement mbuf handling. Providing a
> conceptual implementation for a non-Intel PMD is one way of checking this.
> 
> The actual implementation in other PMDs could be left up to the various NIC
> vendors.

Yes. And we will rename our API to make it suitable for all vendors:
rte_eth_direct_rearm  ->  rte_eth_buf_cycle   (upper API for direct rearm)
rte_eth_tx_fill_sw_ring  -> rte_eth_tx_buf_stash   (Tx queue fill Rx ring buffer )
rte_eth_rx_flush_descriptor -> rte_eth_rx_descriptors_refill (Rx queue flush its descriptors)

struct rte_eth_rxq_rearm_data {
	void *rx_sw_ring;
	uint16_t *rearm_start;
	uint16_t *rearm_nb;
};

->

struct rxq_recycle_info {
	struct rte_mbuf **buf_ring;
	uint16_t *offset;	/* e.g. (uint16_t *)&rq->ci */
	uint16_t *end;
	uint16_t ring_size;
};

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-20 13:50  3%   ` Jerin Jacob
@ 2023-02-24  6:31  0%     ` Yan, Zhirun
  2023-02-26 22:23  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Yan, Zhirun @ 2023-02-24  6:31 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:51 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add new get/set APIs to configure graph worker model which is used to
> > determine which model will be chosen.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h
> > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model =
> > +RTE_GRAPH_MODEL_DEFAULT;
> 
> This will break the multiprocess.

Thanks. I will use TLS for per-thread local storage.
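
A minimal sketch of what a per-thread model variable could look like, assuming
plain C11 `_Thread_local` rather than any DPDK macro. All `demo_*` names are
hypothetical. It only shows the storage class and accessors; whether this
answers the multiprocess concern is left to the thread.

```c
#include <stdbool.h>

/* Hypothetical model identifiers, not the names from the patch. */
enum demo_worker_model {
	DEMO_MODEL_RTC = 0,
	DEMO_MODEL_HYBRID,
};

/* One copy per thread, defined in a single .c file instead of a header. */
static _Thread_local enum demo_worker_model demo_model = DEMO_MODEL_RTC;

/* Set the calling thread's model; reject values outside the enum. */
static bool
demo_worker_model_set(enum demo_worker_model model)
{
	if (model != DEMO_MODEL_RTC && model != DEMO_MODEL_HYBRID)
		return false;
	demo_model = model;
	return true;
}

/* Read the calling thread's model. */
static enum demo_worker_model
demo_worker_model_get(void)
{
	return demo_model;
}
```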

> 
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> 
> Why need strings?
> Also, every symbol in a public header file should start with RTE_ to avoid
> namespace conflict.

It was used to configure the model in the app. I can move the strings into the example.

> 
> > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +       RTE_GRAPH_MODEL_RTC,
> 
> Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> itself.
Yes, will do in next version.

> 
> > +#define WORKER_MODEL_GENERIC "generic"
> 
> Generic is a very overloaded term. Use pipeline here i.e
> RTE_GRAPH_MODEL_PIPELINE

Actually, it's not purely a pipeline mode. I would prefer to rename it to hybrid.
> 
> 
> > +       RTE_GRAPH_MODEL_GENERIC,
> > +       RTE_GRAPH_MODEL_MAX,
> 
> No need for MAX, it will break the ABI for future. See other subsystem such as
> cryptodev.

Thanks, I will change it.
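
Taken together, the review comments on the enum could lead to something like
the following sketch. The `DEMO_` names are placeholders, not the names of a
later patch version, and `demo_graph_model_is_valid()` is a hypothetical
helper showing how range checks can work without a MAX value.

```c
#include <stdbool.h>

enum demo_graph_worker_model {
	DEMO_GRAPH_MODEL_RTC = 0,
	/* Alias rather than a separate value, as suggested in the review. */
	DEMO_GRAPH_MODEL_DEFAULT = DEMO_GRAPH_MODEL_RTC,
	DEMO_GRAPH_MODEL_HYBRID,
	/* No *_MAX terminator: appending a model changes no exported constant. */
};

/* Validate a model value by listing the known cases explicitly. */
static bool
demo_graph_model_is_valid(int model)
{
	switch (model) {
	case DEMO_GRAPH_MODEL_RTC:
	case DEMO_GRAPH_MODEL_HYBRID:
		return true;
	default:
		return false;
	}
}
```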
> 
> > +};
> 
> >

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  2023-02-23 16:12  3%   ` Maxime Coquelin
@ 2023-02-23 16:57  0%     ` Mike Pattrick
  2023-02-24 15:05  4%       ` Patrick Robb
  0 siblings, 1 reply; 200+ results
From: Mike Pattrick @ 2023-02-23 16:57 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, david.marchand, chenbo.xia

On Thu, Feb 23, 2023 at 11:12 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi Mike,
>
> Thanks for looking into this issue.
>
> On 2/23/23 05:35, Mike Pattrick wrote:
> > The arguments passed to madvise should be aligned to the alignment of
> > the backing memory. Now we keep track of each region's alignment and use
> > it when setting coredump preferences. To facilitate this, a new member
> > was added to rte_vhost_mem_region. A new function was added to easily
> > translate memory address back to region alignment. Unneeded calls to
> > madvise were reduced, as the cache removal case should already be
> > covered by the cache insertion case. The previously inline function
> > mem_set_dump was removed from a header file and made not inline.
> >
> > Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> >
> > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > ---
> > Since v1:
> >   - Corrected a cast for 32bit compiles
> > ---
> >   lib/vhost/iotlb.c      |  9 +++---
> >   lib/vhost/rte_vhost.h  |  1 +
> >   lib/vhost/vhost.h      | 12 ++------
> >   lib/vhost/vhost_user.c | 63 +++++++++++++++++++++++++++++++++++-------
> >   4 files changed, 60 insertions(+), 25 deletions(-)
> >
> > diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> > index a0b8fd7302..5293507b63 100644
> > --- a/lib/vhost/iotlb.c
> > +++ b/lib/vhost/iotlb.c
> > @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
> >       rte_rwlock_write_lock(&vq->iotlb_lock);
> >
> >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > -             mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>
> Hmm, it should have been called with enable=false here since we are
> removing the entry from the IOTLB cache. It should be kept in order to
> "DONTDUMP" pages evicted from the cache.

Here I was thinking that if we add an entry and then remove a
different entry, they could be in the same page. But I should have
kept an enable=false call in remove_all().

And now that I think about it again, I could just check whether there are
any active cache entries in the page on every evict/remove; they're
sorted, so that should be an easy check. Unless there are any
objections I'll go forward with that.
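
The neighbour check described above can be sketched as follows. This is an
illustration under stated assumptions, not the vhost code: entries are held
in an array sorted by `uaddr` and do not overlap (the real IOTLB list is a
TAILQ, and the quoted patch orders it by iova), and every name here is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: host address and length only. */
struct demo_entry {
	uint64_t uaddr;
	uint64_t size;
};

static uint64_t
align_floor(uint64_t v, uint64_t pagesz)
{
	return v & ~(pagesz - 1);
}

static uint64_t
align_ceil(uint64_t v, uint64_t pagesz)
{
	return (v + pagesz - 1) & ~(pagesz - 1);
}

/*
 * Compute the pages covered by entry idx that no neighbour touches.
 * Returns false when every page of the entry is shared with a neighbour.
 */
static bool
demo_exclusive_pages(const struct demo_entry *e, size_t n, size_t idx,
		uint64_t pagesz, uint64_t *start, uint64_t *len)
{
	uint64_t lo, hi;

	assert(idx < n && pagesz != 0 && (pagesz & (pagesz - 1)) == 0);

	lo = align_floor(e[idx].uaddr, pagesz);
	hi = align_ceil(e[idx].uaddr + e[idx].size, pagesz);

	if (idx > 0) {
		/* The previous entry may end inside our first page. */
		uint64_t prev_hi = align_ceil(e[idx - 1].uaddr + e[idx - 1].size, pagesz);
		if (prev_hi > lo)
			lo = prev_hi;
	}
	if (idx + 1 < n) {
		/* The next entry may start inside our last page. */
		uint64_t next_lo = align_floor(e[idx + 1].uaddr, pagesz);
		if (next_lo < hi)
			hi = next_lo;
	}
	if (lo >= hi)
		return false;
	*start = lo;
	*len = hi - lo;
	return true;
}
```

With sorted, non-overlapping entries only the two immediate neighbours need
to be inspected, since any entry further away ends before the previous one
starts or starts after the next one ends.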

>
> >               TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >               vhost_user_iotlb_pool_put(vq, node);
> >       }
> > @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
> >
> >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> >               if (!entry_idx) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>
> Same here.
>
> >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >                       vhost_user_iotlb_pool_put(vq, node);
> >                       vq->iotlb_cache_nr--;
> > @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
> >                       vhost_user_iotlb_pool_put(vq, new_node);
> >                       goto unlock;
> >               } else if (node->iova > new_node->iova) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > +                     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> > +                             hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
> >                       TAILQ_INSERT_BEFORE(node, new_node, next);
> >                       vq->iotlb_cache_nr++;
> >                       goto unlock;
> >               }
> >       }
> >
> > -     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > +     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> > +             hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
> >       TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
> >       vq->iotlb_cache_nr++;
> >
> > @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
> >                       break;
> >
> >               if (iova < node->iova + node->size) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >                       vhost_user_iotlb_pool_put(vq, node);
> >                       vq->iotlb_cache_nr--;
> > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > index a395843fe9..c5c97ea67e 100644
> > --- a/lib/vhost/rte_vhost.h
> > +++ b/lib/vhost/rte_vhost.h
> > @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
> >       void     *mmap_addr;
> >       uint64_t mmap_size;
> >       int fd;
> > +     uint64_t alignment;
>
> > It is not possible to do this, as it breaks the ABI.
> You have to store the information somewhere else, or simply call
> get_blk_size() in hua_to_alignment() since the fd is not closed.
>

Sorry about that! You're right, checking the fd per operation should
be easy enough.
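
For reference, querying the block size from a file descriptor can be done
with fstat(). The helper below is a sketch with a hypothetical name and a
hypothetical "0 on failure" convention; it is not the get_blk_size()
implementation from lib/vhost. On Linux, hugetlbfs-backed files are
understood to report the huge page size in st_blksize.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Return the preferred block size of the file behind fd, or 0 on failure. */
static uint64_t
demo_fd_blk_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return 0;
	return (uint64_t)st.st_blksize;
}
```

This costs one system call per lookup, which may matter if it ends up on a
frequently taken path.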

Thanks for the review,

M

> >   };
> >
> >   /**
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > index 5750f0c005..a2467ba509 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
> >       return true;
> >   }
> >
> > -static __rte_always_inline void
> > -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size, __rte_unused bool enable)
> > -{
> > -#ifdef MADV_DONTDUMP
> > -     if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> > -             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > -                     "VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> > -     }
> > -#endif
> > -}
> > +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> > +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
> >   #endif /* _VHOST_NET_CDEV_H_ */
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > index d702d082dd..6d09597fbe 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
> >       return log_gpa;
> >   }
> >
> > +uint64_t
> > +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> > +{
> > +     struct rte_vhost_mem_region *r;
> > +     uint32_t i;
> > +     uintptr_t hua = (uintptr_t)ptr;
> > +
> > +     for (i = 0; i < mem->nregions; i++) {
> > +             r = &mem->regions[i];
> > +             if (hua >= r->host_user_addr &&
> > +                     hua < r->host_user_addr + r->size) {
> > +                     return r->alignment;
> > +             }
> > +     }
> > +
> > +     /* If region isn't found, don't align at all */
> > +     return 1;
> > +}
> > +
> > +void
> > +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> > +{
> > +#ifdef MADV_DONTDUMP
> > +     void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> > +     uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> > +     size_t len = end - (uintptr_t)start;
> > +
> > +     if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> > +             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > +                     "VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> > +     }
> > +#endif
> > +}
> > +
> >   static void
> >   translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >   {
> > @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > +             mem_set_dump(vq->desc_packed, len, true,
> > +                     hua_to_alignment(dev->mem, vq->desc_packed));
> >               numa_realloc(&dev, &vq);
> >               *pdev = dev;
> >               *pvq = vq;
> > @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > +             mem_set_dump(vq->driver_event, len, true,
> > +                     hua_to_alignment(dev->mem, vq->driver_event));
> >               len = sizeof(struct vring_packed_desc_event);
> >               vq->device_event = (struct vring_packed_desc_event *)
> >                                       (uintptr_t)ring_addr_to_vva(dev,
> > @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > -             mem_set_dump(vq->desc_packed, len, true);
> > -             mem_set_dump(vq->driver_event, len, true);
> > -             mem_set_dump(vq->device_event, len, true);
> > +             mem_set_dump(vq->device_event, len, true,
> > +                     hua_to_alignment(dev->mem, vq->device_event));
> >               vq->access_ok = true;
> >               return;
> >       }
> > @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem, vq->desc));
> >       numa_realloc(&dev, &vq);
> >       *pdev = dev;
> >       *pvq = vq;
> > @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem, vq->avail));
> >       len = sizeof(struct vring_used) +
> >               sizeof(struct vring_used_elem) * vq->size;
> >       if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> > @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem, vq->used));
> > +
> >       if (vq->last_used_idx != vq->used->idx) {
> >               VHOST_LOG_CONFIG(dev->ifname, WARNING,
> >                       "last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
> > @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       "some packets maybe resent for Tx and dropped for Rx\n");
> >       }
> >
> > -     mem_set_dump(vq->desc, len, true);
> > -     mem_set_dump(vq->avail, len, true);
> > -     mem_set_dump(vq->used, len, true);
> >       vq->access_ok = true;
> >
> >       VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc: %p\n", vq->desc);
> > @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
> >       region->mmap_addr = mmap_addr;
> >       region->mmap_size = mmap_size;
> >       region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
> > -     mem_set_dump(mmap_addr, mmap_size, false);
> > +     region->alignment = alignment;
> > +     mem_set_dump(mmap_addr, mmap_size, false, alignment);
> >
> >       if (dev->async_copy) {
> >               if (add_guest_pages(dev, region, alignment) < 0) {
> > @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const char *name, size_t size, int *f
> >               return NULL;
> >       }
> >
> > -     mem_set_dump(ptr, size, false);
> >       *fd = mfd;
> >       return ptr;
> >   }
> > @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
> >       uint64_t pervq_inflight_size, mmap_size;
> >       uint16_t num_queues, queue_size;
> >       struct virtio_net *dev = *pdev;
> > +     uint64_t alignment;
> >       int fd, i, j;
> >       int numa_node = SOCKET_ID_ANY;
> >       void *addr;
> > @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
> >               dev->inflight_info->fd = -1;
> >       }
> >
> > +     alignment = get_blk_size(fd);
> > +     mem_set_dump(addr, mmap_size, false, alignment);
> >       dev->inflight_info->addr = addr;
> >       dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size = mmap_size;
> >       dev->inflight_info->fd = ctx->fds[0] = fd;
> > @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
> >               dev->inflight_info->fd = -1;
> >       }
> >
> > -     mem_set_dump(addr, mmap_size, false);
> >       dev->inflight_info->fd = fd;
> >       dev->inflight_info->addr = addr;
> >       dev->inflight_info->size = mmap_size;
> > +     mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
> >
> >       for (i = 0; i < num_queues; i++) {
> >               vq = dev->virtqueue[i];
> > @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >       struct virtio_net *dev = *pdev;
> >       int fd = ctx->fds[0];
> >       uint64_t size, off;
> > +     uint64_t alignment;
> >       void *addr;
> >       uint32_t i;
> >
> > @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >        * fail when offset is not page size aligned.
> >        */
> >       addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +     alignment = get_blk_size(fd);
> >       close(fd);
> >       if (addr == MAP_FAILED) {
> >               VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base failed!\n");
> > @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >       dev->log_addr = (uint64_t)(uintptr_t)addr;
> >       dev->log_base = dev->log_addr + off;
> >       dev->log_size = size;
> > -     mem_set_dump(addr, size, false);
> > +     mem_set_dump(addr, size + off, false, alignment);
> >
> >       for (i = 0; i < dev->nr_vring; i++) {
> >               struct vhost_virtqueue *vq = dev->virtqueue[i];
>


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  @ 2023-02-23 16:12  3%   ` Maxime Coquelin
  2023-02-23 16:57  0%     ` Mike Pattrick
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2023-02-23 16:12 UTC (permalink / raw)
  To: Mike Pattrick, dev; +Cc: david.marchand, chenbo.xia

Hi Mike,

Thanks for looking into this issue.

On 2/23/23 05:35, Mike Pattrick wrote:
> The arguments passed to madvise should be aligned to the alignment of
> the backing memory. Now we keep track of each region's alignment and use
> it when setting coredump preferences. To facilitate this, a new member
> was added to rte_vhost_mem_region. A new function was added to easily
> translate memory address back to region alignment. Unneeded calls to
> madvise were reduced, as the cache removal case should already be
> covered by the cache insertion case. The previously inline function
> mem_set_dump was removed from a header file and made not inline.
> 
> Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> 
> Signed-off-by: Mike Pattrick <mkp@redhat.com>
> ---
> Since v1:
>   - Corrected a cast for 32bit compiles
> ---
>   lib/vhost/iotlb.c      |  9 +++---
>   lib/vhost/rte_vhost.h  |  1 +
>   lib/vhost/vhost.h      | 12 ++------
>   lib/vhost/vhost_user.c | 63 +++++++++++++++++++++++++++++++++++-------
>   4 files changed, 60 insertions(+), 25 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index a0b8fd7302..5293507b63 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
>   	rte_rwlock_write_lock(&vq->iotlb_lock);
>   
>   	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> -		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Hmm, it should have been called with enable=false here since we are
removing the entry from the IOTLB cache. It should be kept in order to
"DONTDUMP" pages evicted from the cache.

>   		TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   		vhost_user_iotlb_pool_put(vq, node);
>   	}
> @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
>   
>   	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
>   		if (!entry_idx) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Same here.
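
The behaviour requested in the two comments above (mark evicted pages as
excluded from core dumps) combined with the alignment handling from the patch
could look roughly like this. It is a sketch with hypothetical `demo_*`
names; it mirrors the floor/ceil arithmetic of the quoted mem_set_dump() but
is not that function.

```c
#define _DEFAULT_SOURCE /* for madvise() and MADV_* under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct demo_range {
	uintptr_t start;
	size_t len;
};

/* Expand [addr, addr + size) outwards to pagesz boundaries. */
static struct demo_range
demo_align_range(uintptr_t addr, size_t size, uint64_t pagesz)
{
	struct demo_range r;
	uintptr_t mask, end;

	assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
	mask = (uintptr_t)pagesz - 1;
	r.start = addr & ~mask;
	end = (addr + size + mask) & ~mask;
	r.len = end - r.start;
	return r;
}

/* enable != 0 includes the pages in core dumps, 0 excludes them. */
static int
demo_set_dump(void *ptr, size_t size, int enable, uint64_t pagesz)
{
#ifdef MADV_DONTDUMP
	struct demo_range r = demo_align_range((uintptr_t)ptr, size, pagesz);

	return madvise((void *)r.start, r.len,
			enable ? MADV_DODUMP : MADV_DONTDUMP);
#else
	(void)ptr; (void)size; (void)enable; (void)pagesz;
	return 0;
#endif
}
```

On eviction, a caller would pass enable = 0 for the entry being removed,
provided no other live entry still uses the same pages.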

>   			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   			vhost_user_iotlb_pool_put(vq, node);
>   			vq->iotlb_cache_nr--;
> @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
>   			vhost_user_iotlb_pool_put(vq, new_node);
>   			goto unlock;
>   		} else if (node->iova > new_node->iova) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +			mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +				hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
>   			TAILQ_INSERT_BEFORE(node, new_node, next);
>   			vq->iotlb_cache_nr++;
>   			goto unlock;
>   		}
>   	}
>   
> -	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
>   	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
>   	vq->iotlb_cache_nr++;
>   
> @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
>   			break;
>   
>   		if (iova < node->iova + node->size) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>   			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   			vhost_user_iotlb_pool_put(vq, node);
>   			vq->iotlb_cache_nr--;
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index a395843fe9..c5c97ea67e 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
>   	void	 *mmap_addr;
>   	uint64_t mmap_size;
>   	int fd;
> +	uint64_t alignment;

It is not possible to do this, as it breaks the ABI.
You have to store the information somewhere else, or simply call
get_blk_size() in hua_to_alignment() since the fd is not closed.
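
To illustrate why the extra member is an ABI break: the struct is public, and
appending a field changes its size, so an application built against the old
header would walk an array of regions with the wrong stride. The two structs
below are simplified stand-ins; the field list beyond what the diff shows is
written from memory and should be treated as illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the public struct before the patch. */
struct demo_mem_region_v1 {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
};

/* Stand-in for the struct with the member added by the patch. */
struct demo_mem_region_v2 {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
	uint64_t alignment;
};

/* Byte offset of element index in an array whose elements are elem_size. */
static size_t
demo_region_offset(size_t elem_size, size_t index)
{
	return elem_size * index;
}
```

Comparing demo_region_offset(sizeof(struct demo_mem_region_v1), 1) with the
v2 equivalent shows that the second region sits at a different offset in the
two layouts.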

>   };
>   
>   /**
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 5750f0c005..a2467ba509 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
>   	return true;
>   }
>   
> -static __rte_always_inline void
> -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size, __rte_unused bool enable)
> -{
> -#ifdef MADV_DONTDUMP
> -	if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> -		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> -			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> -	}
> -#endif
> -}
> +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
>   #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d702d082dd..6d09597fbe 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
>   	return log_gpa;
>   }
>   
> +uint64_t
> +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> +{
> +	struct rte_vhost_mem_region *r;
> +	uint32_t i;
> +	uintptr_t hua = (uintptr_t)ptr;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		r = &mem->regions[i];
> +		if (hua >= r->host_user_addr &&
> +			hua < r->host_user_addr + r->size) {
> +			return r->alignment;
> +		}
> +	}
> +
> +	/* If region isn't found, don't align at all */
> +	return 1;
> +}
> +
> +void
> +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> +{
> +#ifdef MADV_DONTDUMP
> +	void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> +	uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> +	size_t len = end - (uintptr_t)start;
> +
> +	if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> +		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> +			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> +	}
> +#endif
> +}
> +
>   static void
>   translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   {
> @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> +		mem_set_dump(vq->desc_packed, len, true,
> +			hua_to_alignment(dev->mem, vq->desc_packed));
>   		numa_realloc(&dev, &vq);
>   		*pdev = dev;
>   		*pvq = vq;
> @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> +		mem_set_dump(vq->driver_event, len, true,
> +			hua_to_alignment(dev->mem, vq->driver_event));
>   		len = sizeof(struct vring_packed_desc_event);
>   		vq->device_event = (struct vring_packed_desc_event *)
>   					(uintptr_t)ring_addr_to_vva(dev,
> @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> -		mem_set_dump(vq->desc_packed, len, true);
> -		mem_set_dump(vq->driver_event, len, true);
> -		mem_set_dump(vq->device_event, len, true);
> +		mem_set_dump(vq->device_event, len, true,
> +			hua_to_alignment(dev->mem, vq->device_event));
>   		vq->access_ok = true;
>   		return;
>   	}
> @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem, vq->desc));
>   	numa_realloc(&dev, &vq);
>   	*pdev = dev;
>   	*pvq = vq;
> @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem, vq->avail));
>   	len = sizeof(struct vring_used) +
>   		sizeof(struct vring_used_elem) * vq->size;
>   	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem, vq->used));
> +
>   	if (vq->last_used_idx != vq->used->idx) {
>   		VHOST_LOG_CONFIG(dev->ifname, WARNING,
>   			"last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
> @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			"some packets maybe resent for Tx and dropped for Rx\n");
>   	}
>   
> -	mem_set_dump(vq->desc, len, true);
> -	mem_set_dump(vq->avail, len, true);
> -	mem_set_dump(vq->used, len, true);
>   	vq->access_ok = true;
>   
>   	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc: %p\n", vq->desc);
> @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
>   	region->mmap_addr = mmap_addr;
>   	region->mmap_size = mmap_size;
>   	region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
> -	mem_set_dump(mmap_addr, mmap_size, false);
> +	region->alignment = alignment;
> +	mem_set_dump(mmap_addr, mmap_size, false, alignment);
>   
>   	if (dev->async_copy) {
>   		if (add_guest_pages(dev, region, alignment) < 0) {
> @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const char *name, size_t size, int *f
>   		return NULL;
>   	}
>   
> -	mem_set_dump(ptr, size, false);
>   	*fd = mfd;
>   	return ptr;
>   }
> @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>   	uint64_t pervq_inflight_size, mmap_size;
>   	uint16_t num_queues, queue_size;
>   	struct virtio_net *dev = *pdev;
> +	uint64_t alignment;
>   	int fd, i, j;
>   	int numa_node = SOCKET_ID_ANY;
>   	void *addr;
> @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>   		dev->inflight_info->fd = -1;
>   	}
>   
> +	alignment = get_blk_size(fd);
> +	mem_set_dump(addr, mmap_size, false, alignment);
>   	dev->inflight_info->addr = addr;
>   	dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size = mmap_size;
>   	dev->inflight_info->fd = ctx->fds[0] = fd;
> @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
>   		dev->inflight_info->fd = -1;
>   	}
>   
> -	mem_set_dump(addr, mmap_size, false);
>   	dev->inflight_info->fd = fd;
>   	dev->inflight_info->addr = addr;
>   	dev->inflight_info->size = mmap_size;
> +	mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
>   
>   	for (i = 0; i < num_queues; i++) {
>   		vq = dev->virtqueue[i];
> @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	struct virtio_net *dev = *pdev;
>   	int fd = ctx->fds[0];
>   	uint64_t size, off;
> +	uint64_t alignment;
>   	void *addr;
>   	uint32_t i;
>   
> @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	 * fail when offset is not page size aligned.
>   	 */
>   	addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	alignment = get_blk_size(fd);
>   	close(fd);
>   	if (addr == MAP_FAILED) {
>   		VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base failed!\n");
> @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	dev->log_addr = (uint64_t)(uintptr_t)addr;
>   	dev->log_base = dev->log_addr + off;
>   	dev->log_size = size;
> -	mem_set_dump(addr, size, false);
> +	mem_set_dump(addr, size + off, false, alignment);
>   
>   	for (i = 0; i < dev->nr_vring; i++) {
>   		struct vhost_virtqueue *vq = dev->virtqueue[i];


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-23  7:11  0%     ` Ruifeng Wang
@ 2023-02-23  7:27  0%       ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2023-02-23  7:27 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd, nd

> -----Original Message-----
> From: Ruifeng Wang
> Sent: Thursday, February 23, 2023 3:11 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org
> Cc: Yipeng Wang <yipeng1.wang@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin <vladimir.medvedkin@intel.com>;
> nd <nd@arm.com>
> Subject: RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, February 23, 2023 5:56 AM
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang
> > <yipeng1.wang@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>;
> > Bruce Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin
> > <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> >
> > The code for setting algorithm for hash is not at all perf sensitive,
> > and doing it inline has a couple of problems. First, it means that if
> > multiple files include the header, then the initialization gets done
> > multiple times. But also, it makes it harder to fix usage of RTE_LOG().
> >
> > Despite what the checking script says, this is not an ABI change: the
> > previous version inlined the same code, so both old and new code will
> > work the same.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/hash/meson.build     |  1 +
> >  lib/hash/rte_crc_arm64.h |  8 ++---
> >  lib/hash/rte_crc_x86.h   | 10 +++---
> >  lib/hash/rte_hash_crc.c  | 68
> > ++++++++++++++++++++++++++++++++++++++++
> >  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
> >  lib/hash/version.map     |  7 +++++
> >  6 files changed, 88 insertions(+), 54 deletions(-)  create mode
> > 100644 lib/hash/rte_hash_crc.c
> >
> > diff --git a/lib/hash/meson.build b/lib/hash/meson.build index
> > e56ee8572564..c345c6f561fc
> > 100644
> > --- a/lib/hash/meson.build
> > +++ b/lib/hash/meson.build
> > @@ -19,6 +19,7 @@ indirect_headers += files(
> >
> >  sources = files(
> >      'rte_cuckoo_hash.c',
> > +    'rte_hash_crc.c',
> 
> I suppose this list is alphabetically ordered.
> 
> >      'rte_fbk_hash.c',
> >      'rte_thash.c',
> >      'rte_thash_gfni.c'
> <snip>
> > diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h index
> > 0249ad16c5b6..e8145ee44204 100644
> > --- a/lib/hash/rte_hash_crc.h
> > +++ b/lib/hash/rte_hash_crc.h
> > @@ -20,8 +20,6 @@ extern "C" {
> >  #include <rte_branch_prediction.h>
> >  #include <rte_common.h>
> >  #include <rte_config.h>
> > -#include <rte_cpuflags.h>
> 
> A couple of files need update with this change.
> rte_cpuflags.h should be included in rte_fbk_hash.c (for ARM) and rte_efd.c.

OK, I see the changes already there in other patches in the same series.
Please ignore this comment.
Thanks.

> 
> > -#include <rte_log.h>
> >
> >  #include "rte_crc_sw.h"
> >
> <snip>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
@ 2023-02-23  7:11  0%     ` Ruifeng Wang
  2023-02-23  7:27  0%       ` Ruifeng Wang
  2023-02-24  9:45  0%     ` Ruifeng Wang
  1 sibling, 1 reply; 200+ results
From: Ruifeng Wang @ 2023-02-23  7:11 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, February 23, 2023 5:56 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang <yipeng1.wang@intel.com>;
> Sameh Gobriel <sameh.gobriel@intel.com>; Bruce Richardson <bruce.richardson@intel.com>;
> Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> The code that sets the CRC algorithm for hash is not at all performance sensitive, and
> doing it inline has a couple of problems. First, if multiple files include the header,
> the initialization is done multiple times. Second, it makes it harder to fix the usage
> of RTE_LOG().
> 
> Despite what the checking script says, this is not an ABI change: the previous version
> inlined the same code, so both old and new code will work the same.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/hash/meson.build     |  1 +
>  lib/hash/rte_crc_arm64.h |  8 ++---
>  lib/hash/rte_crc_x86.h   | 10 +++---
>  lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
>  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
>  lib/hash/version.map     |  7 +++++
>  6 files changed, 88 insertions(+), 54 deletions(-)
>  create mode 100644 lib/hash/rte_hash_crc.c
> 
> diff --git a/lib/hash/meson.build b/lib/hash/meson.build
> index e56ee8572564..c345c6f561fc 100644
> --- a/lib/hash/meson.build
> +++ b/lib/hash/meson.build
> @@ -19,6 +19,7 @@ indirect_headers += files(
> 
>  sources = files(
>      'rte_cuckoo_hash.c',
> +    'rte_hash_crc.c',

I suppose this list is alphabetically ordered.

>      'rte_fbk_hash.c',
>      'rte_thash.c',
>      'rte_thash_gfni.c'
<snip>
> diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
> index 0249ad16c5b6..e8145ee44204 100644
> --- a/lib/hash/rte_hash_crc.h
> +++ b/lib/hash/rte_hash_crc.h
> @@ -20,8 +20,6 @@ extern "C" {
>  #include <rte_branch_prediction.h>
>  #include <rte_common.h>
>  #include <rte_config.h>
> -#include <rte_cpuflags.h>

A couple of files need to be updated with this change.
rte_cpuflags.h should be included in rte_fbk_hash.c (for ARM) and rte_efd.c.

> -#include <rte_log.h>
> 
>  #include "rte_crc_sw.h"
> 
<snip>

^ permalink raw reply	[relevance 0%]

* [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 21:55  2%   ` Stephen Hemminger
  2023-02-23  7:11  0%     ` Ruifeng Wang
  2023-02-24  9:45  0%     ` Ruifeng Wang
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2023-02-22 21:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code that sets the CRC algorithm for hash is not at all performance
sensitive, and doing it inline has a couple of problems. First, if
multiple files include the header, the initialization is done
multiple times. Second, it makes it harder to fix the usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.
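
For illustration only, a minimal single-file sketch of the pattern this
patch moves to: one definition of the algorithm flag, an out-of-line
setter, and an inline fast path that only reads the flag. Every demo_
and DEMO_ name is hypothetical. The values of the SW, SSE42 and x64
flags are assumed (the diff only shows CRC32_SSE42_x64 and CRC32_ARM64),
and the demo_cpu_has_64bit variable is a stand-in for the real
rte_cpu_get_flag_enabled() check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits. The first three values are assumptions of this sketch. */
#define DEMO_CRC32_SW        (1U << 0)
#define DEMO_CRC32_SSE42     (1U << 1)
#define DEMO_CRC32_x64       (1U << 2)
#define DEMO_CRC32_SSE42_x64 (DEMO_CRC32_x64 | DEMO_CRC32_SSE42)

/*
 * Single definition with external linkage. In a real library this would
 * sit in one .c file and the header would only carry an extern
 * declaration, so every includer shares the same variable.
 */
uint8_t demo_crc32_alg = DEMO_CRC32_SW;

/* Hypothetical stand-in for a runtime CPU feature query. */
static bool demo_cpu_has_64bit = true;

/* Out-of-line setter: not on the fast path, so no need to inline it. */
void
demo_crc_set_alg(uint8_t alg)
{
	demo_crc32_alg = DEMO_CRC32_SW;

	if (alg == DEMO_CRC32_SW)
		return;

	if (!demo_cpu_has_64bit || alg == DEMO_CRC32_SSE42)
		demo_crc32_alg = DEMO_CRC32_SSE42;
	else
		demo_crc32_alg = DEMO_CRC32_SSE42_x64;
}

/*
 * Inline fast path: only reads the shared flag.
 * Returns 2 for the 64-bit path, 1 for the 32-bit path, 0 for software.
 */
static inline int
demo_crc_path_8byte(void)
{
	if (demo_crc32_alg == DEMO_CRC32_SSE42_x64)
		return 2;
	if (demo_crc32_alg & DEMO_CRC32_SSE42)
		return 1;
	return 0;
}
```

With a static variable in the header instead, each file including it
would get its own private copy of the flag and its own constructor,
which is the duplication described above.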

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v11 00/22] Convert static log type values in libraries
                     ` (6 preceding siblings ...)
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 21:55  2% ` Stephen Hemminger
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-22 21:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark as deprecated, or remove them.
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v11 - fix include check on arm cross build

v10 - add necessary rte_compat.h in thash_gfni stub for arm

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 30 ++++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 409 insertions(+), 177 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v10 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 16:08  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-22 16:08 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code that sets the CRC algorithm for hash is not at all performance
sensitive, and doing it inline has a couple of problems. First, if
multiple files include the header, the initialization is done
multiple times. Second, it makes it harder to fix the usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v10 00/22] Convert static log type values in libraries
                     ` (5 preceding siblings ...)
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-22 16:07  2% ` Stephen Hemminger
  2023-02-22 16:08  2%   ` [PATCH v10 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-22 16:07 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark as deprecated, or remove them.
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v10 - add necessary rte_compat.h in thash_gfni stub for arm

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 29 +++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 406 insertions(+), 179 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v2] mem: fix displaying heap ID failed for heap info command
  @ 2023-02-22  7:49  4% ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2023-02-22  7:49 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, mb, hkalra, huangdaode, fengchengwen, lihuisong

The telemetry library now restricts the set of characters allowed in
dictionary names. See commit 2537fb0c5f34 ("telemetry: limit characters
allowed in dictionary names").

The space character is not in this set, so the heap ID in /eal/heap_info
cannot be displayed. In addition, 'Heap' is misspelled as 'Head'. Replace
'Head id' with 'Heap_id'.
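
As an illustration only, the name check can be sketched as below. The helper
name demo_dict_name_is_valid() is hypothetical, and the allowed set
(ASCII letters, digits, '_' and '/') is an assumption of this sketch; the
commit message above only states that the space character is rejected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the telemetry library's own code.
 * Assumption: a valid name is non-empty and contains only ASCII
 * letters, digits, '_' and '/'.
 */
bool
demo_dict_name_is_valid(const char *name)
{
	if (name == NULL || *name == '\0')
		return false;

	for (; *name != '\0'; name++) {
		unsigned char c = (unsigned char)*name;

		/* Anything outside the allowed set, such as ' ', is rejected. */
		if (!isalnum(c) && c != '_' && c != '/')
			return false;
	}

	return true;
}
```

Under these assumptions "Head id" is rejected because of the space, while
"Heap_id" and "Heap_size" are accepted.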

Fixes: e6732d0d6e26 ("mem: add telemetry infos")
Fixes: 2537fb0c5f34 ("telemetry: limit characters allowed in dictionary names")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
 -v2: add announcement in rel_notes.
---
 doc/guides/rel_notes/release_23_03.rst | 2 ++
 lib/eal/common/eal_common_memory.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 49c18617a5..bdee535046 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -237,6 +237,8 @@ API Changes
 * The experimental structures ``struct rte_graph_param``, ``struct rte_graph``
   and ``struct graph`` were updated to support pcap trace in the graph library.
 
+* The ``Head id`` key in the output of the ``/eal/heap_info`` telemetry command
+  is renamed to ``Heap_id`` to ensure that it can be printed.
 
 ABI Changes
 -----------
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c917b981bc..c2a4c8f9e7 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -1139,7 +1139,7 @@ handle_eal_heap_info_request(const char *cmd __rte_unused, const char *params,
 	malloc_heap_get_stats(heap, &sock_stats);
 
 	rte_tel_data_start_dict(d);
-	rte_tel_data_add_dict_uint(d, "Head id", heap_id);
+	rte_tel_data_add_dict_uint(d, "Heap_id", heap_id);
 	rte_tel_data_add_dict_string(d, "Name", heap->name);
 	rte_tel_data_add_dict_uint(d, "Heap_size",
 				   sock_stats.heap_totalsz_bytes);
-- 
2.33.0


^ permalink raw reply	[relevance 4%]

* [PATCH v9 21/22] hash: move rte_hash_set_alg out header
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-21 19:02  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-21 19:02 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code for setting the hash algorithm is not performance sensitive,
and doing it inline has a couple of problems. First, if multiple files
include the header, the initialization is done multiple times. Second,
it makes it harder to fix the usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.
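
A minimal sketch of the pattern, not the patch itself: one definition of the
state in a .c file, with the header reduced to an extern declaration and a
prototype. All demo_* and DEMO_* names, the flag values, and the
cpu_has_64bit parameter (standing in for a runtime CPU flag query) are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch. */
#define DEMO_CRC32_SW        (1U << 0)
#define DEMO_CRC32_SSE42     (1U << 1)
#define DEMO_CRC32_x64       (1U << 2)
#define DEMO_CRC32_SSE42_x64 (DEMO_CRC32_x64 | DEMO_CRC32_SSE42)

/*
 * Single definition of the selected algorithm. A header would only carry
 * "extern uint8_t demo_crc32_alg;", so every file that includes it shares
 * this one variable instead of getting a private static copy.
 */
uint8_t demo_crc32_alg = DEMO_CRC32_SW;

/* Out-of-line setter; mirrors only the x86 selection branch. */
void
demo_crc_set_alg(uint8_t alg, bool cpu_has_64bit)
{
	demo_crc32_alg = DEMO_CRC32_SW;

	if (alg == DEMO_CRC32_SW)
		return;

	/* Fall back to the 32-bit variant when 64-bit is unavailable. */
	if (!cpu_has_64bit || alg == DEMO_CRC32_SSE42)
		demo_crc32_alg = DEMO_CRC32_SSE42;
	else
		demo_crc32_alg = DEMO_CRC32_SSE42_x64;
}
```

With a static variable in the header, each translation unit would hold and
initialize its own copy; here the selection happens once and is visible to
every caller.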

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v9 00/22] Convert static logtypes in libraries
                     ` (4 preceding siblings ...)
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-21 19:01  2% ` Stephen Hemminger
  2023-02-21 19:02  2%   ` [PATCH v9 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-21 19:01 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark them as deprecated, or remove them.
This version removes them, since current DPDK policies do not
guarantee that they will be kept.
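
To illustrate the difference between a static and a dynamic log type, here
is a self-contained sketch of a runtime registry. It is not DPDK's
implementation; demo_log_register() and the other demo_* names are
hypothetical. The point is that a library obtains its type id when it
registers, instead of relying on a number fixed at compile time in a
shared header.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_LOGTYPES 32

struct demo_logtype {
	const char *name;	/* caller keeps the string alive */
	int level;
};

static struct demo_logtype demo_logtypes[DEMO_MAX_LOGTYPES];
static int demo_logtype_count;

/*
 * Register a log type by name and return its id.
 * Registering an existing name returns the id already assigned.
 * Returns -1 on a NULL name or when the table is full.
 */
int
demo_log_register(const char *name, int level)
{
	int i;

	if (name == NULL)
		return -1;

	for (i = 0; i < demo_logtype_count; i++)
		if (strcmp(demo_logtypes[i].name, name) == 0)
			return i;

	if (demo_logtype_count == DEMO_MAX_LOGTYPES)
		return -1;

	demo_logtypes[demo_logtype_count].name = name;
	demo_logtypes[demo_logtype_count].level = level;
	return demo_logtype_count++;
}
```

Each library would call the register function once at startup and keep the
returned id in a private variable, so removing a static type from the
shared header no longer affects it.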

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 405 insertions(+), 179 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v8 21/22] hash: move rte_hash_set_alg out header
  2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
@ 2023-02-21 15:02  0%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-02-21 15:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Tue, Feb 21, 2023 at 12:38 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The code for setting algorithm for hash is not at all perf sensitive,
> and doing it inline has a couple of problems. First, it means that if
> multiple files include the header, then the initialization gets done
> multiple times. But also, it makes it harder to fix usage of RTE_LOG().
>
> Despite what the checking script say. This is not an ABI change, the
> previous version inlined the same code; therefore both old and new code
> will work the same.

I suppose you are referring to:
http://mails.dpdk.org/archives/test-report/2023-February/356872.html
ERROR: symbol rte_hash_crc_set_alg is added in the DPDK_23 section,
but is expected to be added in the EXPERIMENTAL section of the version
map

I agree that this is irrelevant and can be ignored in this particular case.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [PATCH v2 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:55  3%     ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:55 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports the single type of metadata; this
commit adds support for the chained type of metadata. It also
clarifies the relation between the RSS capability (v1/v2) and these
two types of metadata.
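
The two layouts can be sketched as follows. This is a simplified model
under stated assumptions, not the driver code: the demo_* names, the
4-bit field width and the field codes DEMO_META_HASH and DEMO_META_MARK
are all hypothetical. Single metadata is modelled as a hash type word
followed by a hash word; chained metadata is modelled as a header word of
4-bit field codes followed by one 32-bit value per code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout constants and field codes, for illustration only. */
#define DEMO_META_FIELD_SIZE 4
#define DEMO_META_FIELD_MASK 0xfU
#define DEMO_META_HASH       1U
#define DEMO_META_MARK       2U

struct demo_meta {
	uint32_t hash_type;
	uint32_t hash;
	uint32_t mark;
};

/* Read a big-endian 32-bit word without alignment requirements. */
static uint32_t
demo_read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Single format: word 0 is the hash type, word 1 is the hash. */
void
demo_parse_single(const uint8_t *base, struct demo_meta *m)
{
	m->hash_type = demo_read_be32(base);
	m->hash = demo_read_be32(base + 4);
}

/*
 * Chained format: walk the header one 4-bit code at a time; each code
 * consumes the next 32-bit value. An unknown code aborts the parse.
 */
bool
demo_parse_chained(const uint8_t *base, struct demo_meta *m)
{
	uint32_t info = demo_read_be32(base);
	const uint8_t *field = base + 4;

	for (; info != 0; info >>= DEMO_META_FIELD_SIZE, field += 4) {
		switch (info & DEMO_META_FIELD_MASK) {
		case DEMO_META_HASH:
			m->hash = demo_read_be32(field);
			break;
		case DEMO_META_MARK:
			m->mark = demo_read_be32(field);
			break;
		default:
			return false;
		}
	}

	return true;
}
```

Selecting the parser once from a per-device format field, rather than
re-deriving it from capability bits for every packet, keeps the receive
path to a single switch.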

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases, single
+	 * metadata is used if only RSS(v1) is supported by the hw capability, while
+	 * RSS(v2) also indicates that chained metadata is in use.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+		/*
+		 * RSS is incompatible with chained metadata. hw->cap just represents
+		 * the firmware's capability rather than its configuration. Clear the
+		 * RSS bit here so that hw->cap can be used to identify RSS later.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHAINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get it from metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHAINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* [PATCH v2 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:29  3%   ` Chaoyong He
    1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:29 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports the single type of metadata, and this
commit adds support for the chained type of metadata. This commit also
makes the relation between the RSS capability (v1/v2) and these
two types of metadata clearer.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)
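
Note: the format selection added in nfp_common.c below can be read as a
three-way decision. The following is only a minimal standalone sketch of
that decision, not the driver code: the demo_* names and struct are
hypothetical, the RSS2 and CSUM_COMPLETE bit positions are taken from this
patch, and the RSS(v1) bit position is an assumption made for illustration.

```c
#include <stdint.h>

/* RSS2 and CSUM_COMPLETE positions follow the patch; RSS(v1) is assumed. */
#define DEMO_CTRL_RSS           (0x1U << 17) /* assumed bit position */
#define DEMO_CTRL_RSS2          (0x1U << 29)
#define DEMO_CTRL_CSUM_COMPLETE (0x1U << 30)
#define DEMO_CTRL_CHAIN_META    (DEMO_CTRL_RSS2 | DEMO_CTRL_CSUM_COMPLETE)

enum demo_meta_format {
	DEMO_META_SINGLE,
	DEMO_META_CHAINED,
};

/* Hypothetical stand-in for the few nfp_net_hw fields used here. */
struct demo_hw {
	uint32_t major;                    /* firmware ABI major version */
	uint32_t cap;                      /* firmware capability bits */
	enum demo_meta_format meta_format;
};

static void
demo_init_metadata_format(struct demo_hw *hw)
{
	if (hw->major == 4) {
		/* ABI 4.x always uses chained metadata. */
		hw->meta_format = DEMO_META_CHAINED;
	} else if ((hw->cap & DEMO_CTRL_CHAIN_META) != 0) {
		/* RSS(v2) or checksum-complete implies chained metadata. */
		hw->meta_format = DEMO_META_CHAINED;
		/* Drop RSS(v1) so later capability checks only see RSS(v2). */
		hw->cap &= ~DEMO_CTRL_RSS;
	} else {
		hw->meta_format = DEMO_META_SINGLE;
	}
}
```

In this sketch only the middle branch modifies the capability word; the
ABI 4.x branch leaves it untouched, as in the patch.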

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases we allow
+	 * single metadata if only RSS(v1) is supported by the hw capability, while
+	 * RSS(v2) also indicates that we are using chained metadata.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+		/*
+		 * RSS is incompatible with chained metadata. hw->cap just represents
+		 * the firmware's ability rather than the firmware's configuration. We
+		 * clear the RSS bit here so that hw->cap can be used to identify RSS later.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHANINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get them from the metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHANINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata does not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* [PATCH 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:10  3% ` Chaoyong He
    1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:10 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports the single type of metadata, and this
commit adds support for the chained type of metadata. This commit also
makes the relation between the RSS capability (v1/v2) and these
two types of metadata clearer.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)
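
Note: in the single metadata format handled by nfp_net_parse_single_meta()
below, the hash type is the first big-endian 32-bit word of the metadata
area and the hash value is the word that follows it. A minimal standalone
sketch of that layout, assuming nothing beyond those two words; the demo_*
names are hypothetical and the byte-wise read stands in for
rte_be_to_cpu_32():

```c
#include <stdint.h>

/* Hypothetical stand-in for the two nfp_meta_parsed fields used here. */
struct demo_meta_parsed {
	uint32_t hash_type;
	uint32_t hash;
};

/* Read a big-endian 32-bit word byte by byte, so alignment does not matter. */
static uint32_t
demo_read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Word 0 of the metadata area is the hash type, word 1 is the hash value. */
static void
demo_parse_single_meta(const uint8_t *meta_base, struct demo_meta_parsed *meta)
{
	meta->hash_type = demo_read_be32(meta_base);
	meta->hash = demo_read_be32(meta_base + 4);
}
```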

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases we allow
+	 * single metadata if only RSS(v1) is supported by the hw capability, while
+	 * RSS(v2) also indicates that we are using chained metadata.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+		/*
+		 * RSS is incompatible with chained metadata. hw->cap just represents
+		 * the firmware's ability rather than the firmware's configuration. We
+		 * clear the RSS bit here so that hw->cap can be used to identify RSS later.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHANINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get them from the metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHANINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata does not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* Re: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
  @ 2023-02-21  0:48  3%                     ` Konstantin Ananyev
  2023-02-27  8:12  0%                       ` Tomasz Duszynski
  0 siblings, 1 reply; 200+ results
From: Konstantin Ananyev @ 2023-02-21  0:48 UTC (permalink / raw)
  To: Tomasz Duszynski, Konstantin Ananyev, dev


>>>>>>>>> diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000000..6b664c3336
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/lib/pmu/rte_pmu.h
>>>>>>>>> @@ -0,0 +1,212 @@
>>>>>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>>>>>> + * Copyright(c) 2023 Marvell
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +#ifndef _RTE_PMU_H_
>>>>>>>>> +#define _RTE_PMU_H_
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @file
>>>>>>>>> + *
>>>>>>>>> + * PMU event tracing operations
>>>>>>>>> + *
>>>>>>>>> + * This file defines generic API and types necessary to setup PMU and
>>>>>>>>> + * read selected counters in runtime.
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>> +extern "C" {
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>> +#include <linux/perf_event.h>
>>>>>>>>> +
>>>>>>>>> +#include <rte_atomic.h>
>>>>>>>>> +#include <rte_branch_prediction.h>
>>>>>>>>> +#include <rte_common.h>
>>>>>>>>> +#include <rte_compat.h>
>>>>>>>>> +#include <rte_spinlock.h>
>>>>>>>>> +
>>>>>>>>> +/** Maximum number of events in a group */
>>>>>>>>> +#define MAX_NUM_GROUP_EVENTS 8
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A structure describing a group of events.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu_event_group {
>>>>>>>>> +	struct perf_event_mmap_page *mmap_pages[MAX_NUM_GROUP_EVENTS]; /**< array of user pages */
>>>>>>>>> +	int fds[MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */
>>>>>>>>> +	bool enabled; /**< true if group was enabled on particular lcore */
>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */
>>>>>>>>> +} __rte_cache_aligned;
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A structure describing an event.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu_event {
>>>>>>>>> +	char *name; /**< name of an event */
>>>>>>>>> +	unsigned int index; /**< event index into fds/mmap_pages */
>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event) next; /**< list entry */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A PMU state container.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu {
>>>>>>>>> +	char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */
>>>>>>>>> +	rte_spinlock_t lock; /**< serialize access to event group list */
>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event_group) event_group_list; /**< list of event groups */
>>>>>>>>> +	unsigned int num_group_events; /**< number of events in a group */
>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */
>>>>>>>>> +	unsigned int initialized; /**< initialization counter */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/** lcore event group */
>>>>>>>>> +RTE_DECLARE_PER_LCORE(struct rte_pmu_event_group, _event_group);
>>>>>>>>> +
>>>>>>>>> +/** PMU state container */
>>>>>>>>> +extern struct rte_pmu rte_pmu;
>>>>>>>>> +
>>>>>>>>> +/** Each architecture supporting PMU needs to provide its own version */
>>>>>>>>> +#ifndef rte_pmu_pmc_read
>>>>>>>>> +#define rte_pmu_pmc_read(index) ({ 0; })
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Read PMU counter.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should not be called directly.
>>>>>>>>> + *
>>>>>>>>> + * @param pc
>>>>>>>>> + *   Pointer to the mmapped user page.
>>>>>>>>> + * @return
>>>>>>>>> + *   Counter value read from hardware.
>>>>>>>>> + */
>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>> +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc)
>>>>>>>>> +{
>>>>>>>>> +	uint64_t width, offset;
>>>>>>>>> +	uint32_t seq, index;
>>>>>>>>> +	int64_t pmc;
>>>>>>>>> +
>>>>>>>>> +	for (;;) {
>>>>>>>>> +		seq = pc->lock;
>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>
>>>>>>>> Are you sure that compiler_barrier() is enough here?
>>>>>>>> On some archs CPU itself has freedom to re-order reads.
>>>>>>>> Or I am missing something obvious here?
>>>>>>>>
>>>>>>>
>>>>>>> It's a matter of not keeping old stuff cached in registers and
>>>>>>> making sure that we have two reads of lock. CPU reordering won't
>>>>>>> do any harm here.
>>>>>>
>>>>>> Sorry, I didn't get you here:
>>>>>> Suppose CPU will re-order reads and will read lock *after* index or offset value.
>>>>>> Wouldn't it mean that in that case index and/or offset can contain old/invalid values?
>>>>>>
>>>>>
>>>>> This number is just an indicator whether kernel did change something or not.
>>>>
>>>> You are talking about pc->lock, right?
>>>> Yes, I do understand that it is sort of seqlock.
>>>> That's why I am puzzled why we do not care about possible cpu read-reordering.
>>>> Manual for perf_event_open() also has a code snippet with compiler barrier only...
>>>>
>>>>> If cpu reordering will come into play then this will not change anything from pov of this loop.
>>>>> All we want is fresh data when needed and no involvement of
>>>>> compiler when it comes to reordering code.
>>>>
>>>> Ok, can you probably explain to me why the following could not happen:
>>>> T0:
>>>> pc->seqlock==0; pc->index==I1; pc->offset==O1;
>>>> T1:
>>>>       cpu #0 read pmu (due to cpu read reorder, we get index value before seqlock):
>>>>        index=pc->index;  //index==I1;
>>>> T2:
>>>>       cpu #1 kernel perf_event_update_userpage:
>>>>       pc->lock++; // pc->lock==1
>>>>       pc->index=I2;
>>>>       pc->offset=O2;
>>>>       ...
>>>>       pc->lock++; //pc->lock==2
>>>> T3:
>>>>       cpu #0 continue with read pmu:
>>>>       seq=pc->lock; //seq == 2
>>>>        offset=pc->offset; // offset == O2
>>>>        ....
>>>>        pmc = rte_pmu_pmc_read(index - 1);  // Note that we read at I1, not I2
>>>>        offset += pmc; //offset == O2 + pmcread(I1-1);
>>>>        if (pc->lock == seq) // they are equal, return
>>>>              return offset;
>>>>
>>>> Or, it can happen, but for some reason we don't care much?
>>>>
>>>
>>> This code does self-monitoring and user page (whole group actually) is
>>> per thread running on current cpu. Hence I am not sure what are you trying to prove with that example.
>>
>> I am not trying to prove anything so far.
>> I am asking is such situation possible or not, and if not, why?
>> My current understanding (possibly wrong) is that after you mmaped these pages, kernel still can
>> asynchronously update them.
>> So, when reading the data from these pages you have to check 'lock' value before and after
>> accessing other data.
>> If so, why possible cpu read-reordering doesn't matter?
>>
> 
> Look. I'll reiterate that.
> 
> 1. That user page/group/PMU config is per process. Other processes do not access that.

Ok, that's clear.


>     All this happens on the very same CPU where current thread is running.

Ok... but can't this page be updated by a kernel thread running
simultaneously on a different CPU?


> 2. Suppose you've already read seq. Now for some reason kernel updates data in page seq was read from.
> 3. Kernel will enter critical section during update. seq changes along with other data without app knowing about it.
>     If you want nitty gritty details consult kernel sources.

Look, I don't have to beg you to answer these questions.
In fact, I expect the library author to document all such subtle points
clearly, either in the PG or in source code comments (ideally in both).
If not, then from my perspective the patch is not ready at this stage and
shouldn't be accepted.
I don't know whether a compiler barrier is enough here or not, but I think it
is definitely worth a clear explanation in the docs.
I suppose I wouldn't be the only one who gets confused here.
So please make an effort and document clearly why you believe there
is no race condition.
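
To make the concern concrete: if the page could be updated from another
CPU, a conservative reader would order the loads in hardware as well, not
only in the compiler. The following is just a sketch of that pattern using
C11 atomics, not what the patch does and not the kernel's definition of the
page; struct demo_page and the demo_* names are hypothetical, and the check
for an odd counter assumes the writer increments 'lock' once before and
once after each update.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the few user page fields discussed here. */
struct demo_page {
	_Atomic uint32_t lock;   /* sequence counter: odd while an update is in flight */
	_Atomic uint32_t index;
	_Atomic uint64_t offset;
};

struct demo_snapshot {
	uint32_t index;
	uint64_t offset;
};

static struct demo_snapshot
demo_read_page(struct demo_page *pc)
{
	struct demo_snapshot s;
	uint32_t seq;

	for (;;) {
		/* Acquire: the data loads below cannot be hoisted above this load. */
		seq = atomic_load_explicit(&pc->lock, memory_order_acquire);
		s.index = atomic_load_explicit(&pc->index, memory_order_relaxed);
		s.offset = atomic_load_explicit(&pc->offset, memory_order_relaxed);
		/* Acquire fence: the data loads above complete before the re-check. */
		atomic_thread_fence(memory_order_acquire);
		/* Accept only if no update was in flight and none happened meanwhile. */
		if ((seq & 1) == 0 &&
		    atomic_load_explicit(&pc->lock, memory_order_relaxed) == seq)
			return s;
	}
}
```

If the kernel only ever updates the page from the CPU the reading thread
runs on, the hardware ordering above is not needed and the compiler barrier
would be sufficient; that is exactly the assumption I would like to see
written down.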

> 4. app resumes and has some stale data but *WILL* read new seq. Code loops again because values do not match.

If the kernel always executes the update for this page in the same
thread context, then yes, user code will always notice the difference
after resume.
But why can't it happen that your user thread reads this page on one
CPU while some kernel code on another CPU updates it simultaneously?
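
For reference, the cross-CPU case asked about above is the one a generic
seqlock reader has to handle. A minimal sketch in C11 atomics follows. The
names demo_page, demo_read and demo_write are hypothetical, the struct is not
the real perf mmap page layout, and this is not the code from the patch,
which uses a compiler barrier only:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a page shared with a concurrent writer. */
struct demo_page {
	_Atomic uint32_t lock;   /* sequence counter, odd while an update is in flight */
	_Atomic uint64_t offset; /* payload published under the sequence counter */
};

/* Reader: retry until the counter is even and unchanged around the payload read. */
static uint64_t
demo_read(struct demo_page *pg)
{
	uint32_t seq;
	uint64_t val;

	for (;;) {
		seq = atomic_load_explicit(&pg->lock, memory_order_acquire);
		if (seq & 1)
			continue; /* writer in progress, try again */

		val = atomic_load_explicit(&pg->offset, memory_order_relaxed);

		/* Keep the payload load from being reordered after the re-check. */
		atomic_thread_fence(memory_order_acquire);

		if (atomic_load_explicit(&pg->lock, memory_order_relaxed) == seq)
			return val;
	}
}

/* Writer: bump the counter to odd, update the payload, bump it back to even. */
static void
demo_write(struct demo_page *pg, uint64_t val)
{
	uint32_t seq = atomic_load_explicit(&pg->lock, memory_order_relaxed);

	atomic_store_explicit(&pg->lock, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&pg->offset, val, memory_order_relaxed);
	atomic_store_explicit(&pg->lock, seq + 2, memory_order_release);
}
```

The acquire fence before the second read of the counter is what stops a
weakly ordered CPU from moving the payload load past the re-check. Whether the
PMU user page needs it depends on whether the kernel can ever update the page
from another CPU, which is exactly the open question here.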


> 5. Otherwise seq values match and data is valid.
> 
>> Also there was another question below, which you probably  missed, so I copied it here:
>> Another question - do we really need  to have __rte_pmu_read_userpage() and rte_pmu_read() as
>> static inline functions in public header?
>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>> definitions also public.
>>
> 
> These functions need to be inlined otherwise performance takes a hit.

I understand that performance might be affected, but how big is the hit?
I expect the actual PMU read will not be free anyway, right?
If the difference is small, it might be worth making such a change;
removing unneeded structures from public headers would help a lot in
the future in terms of ABI/API stability.
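
A minimal sketch of the alternative being suggested, using hypothetical names
(demo_pmu, demo_pmu_read) rather than the patch's actual definitions: the
public header only forward-declares the structure and the read function is an
ordinary exported symbol, so the layout can change without affecting the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* --- what the public header would expose (hypothetical) --- */
struct demo_pmu; /* opaque: applications never see the layout */
uint64_t demo_pmu_read(const struct demo_pmu *pmu, unsigned int index);

/* --- private to the library's .c file (hypothetical) --- */
struct demo_pmu {
	unsigned int num_events; /* number of valid entries in values[] */
	uint64_t values[4];      /* stand-in for per-event counter state */
};

/* Out-of-line read: returns 0 on a bad handle or index, mirroring the
 * "0 on error" convention described in the patch's doxygen comment. */
uint64_t
demo_pmu_read(const struct demo_pmu *pmu, unsigned int index)
{
	if (pmu == NULL || index >= pmu->num_events)
		return 0;
	return pmu->values[index];
}
```

The cost is one function call per read, which is the overhead that would need
to be measured against the inline version.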



>>>
>>>>>>>
>>>>>>>>> +		index = pc->index;
>>>>>>>>> +		offset = pc->offset;
>>>>>>>>> +		width = pc->pmc_width;
>>>>>>>>> +
>>>>>>>>> +		/* index set to 0 means that particular counter cannot be used */
>>>>>>>>> +		if (likely(pc->cap_user_rdpmc && index)) {
>>>>>>>>> +			pmc = rte_pmu_pmc_read(index - 1);
>>>>>>>>> +			pmc <<= 64 - width;
>>>>>>>>> +			pmc >>= 64 - width;
>>>>>>>>> +			offset += pmc;
>>>>>>>>> +		}
>>>>>>>>> +
>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>> +
>>>>>>>>> +		if (likely(pc->lock == seq))
>>>>>>>>> +			return offset;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Enable group of events on the calling lcore.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +__rte_pmu_enable_group(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Initialize PMU library.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +rte_pmu_init(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Finalize PMU library. This should be called after PMU counters are no longer being read.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +void
>>>>>>>>> +rte_pmu_fini(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Add event to the group of enabled events.
>>>>>>>>> + *
>>>>>>>>> + * @param name
>>>>>>>>> + *   Name of an event listed under /sys/bus/event_source/devices/pmu/events.
>>>>>>>>> + * @return
>>>>>>>>> + *   Event index in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +rte_pmu_add_event(const char *name);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Read hardware counter configured to count occurrences of an event.
>>>>>>>>> + *
>>>>>>>>> + * @param index
>>>>>>>>> + *   Index of an event to be read.
>>>>>>>>> + * @return
>>>>>>>>> + *   Event value read from register. In case of errors or lack of support
>>>>>>>>> + *   0 is returned. In other words, stream of zeros in a trace file
>>>>>>>>> + *   indicates problem with reading particular PMU event register.
>>>>>>>>> + */
>>>>
>>>> Another question - do we really need  to have
>>>> __rte_pmu_read_userpage() and rte_pmu_read() as static inline functions in public header?
>>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>>> definitions also public.
>>>>
>>>>>>>>> +__rte_experimental
>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>> +rte_pmu_read(unsigned int index)
>>>>>>>>> +{
>>>>>>>>> +	struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(!rte_pmu.initialized))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(!group->enabled)) {
>>>>>>>>> +		ret = __rte_pmu_enable_group();
>>>>>>>>> +		if (ret)
>>>>>>>>> +			return 0;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(index >= rte_pmu.num_group_events))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	return __rte_pmu_read_userpage(group->mmap_pages[index]);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>> +}
>>>>>>>>> +#endif
>>>>>>>>> +
> 


^ permalink raw reply	[relevance 3%]

* [PATCH v8 21/22] hash: move rte_hash_set_alg out header
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-20 23:35  3%   ` Stephen Hemminger
  2023-02-21 15:02  0%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-20 23:35 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The code for setting algorithm for hash is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. But also, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code; therefore both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v8 00/22] Convert static logtypes in libraries
                     ` (3 preceding siblings ...)
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-20 23:35  3% ` Stephen Hemminger
  2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-20 23:35 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static logtypes in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
	- leave them there
	- mark the definitions as deprecated
	- remove them
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v8 - rebase and fix CI issues on Arm
     simplify the mempool logtype patch

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 383 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  @ 2023-02-20 13:50  3%   ` Jerin Jacob
  2023-02-24  6:31  0%     ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-20 13:50 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add new get/set APIs to configure graph worker model which is used to
> determine which model will be chosen.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
>
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
>
> +static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;

This will break multiprocess support.

> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"

Why are strings needed?
Also, every symbol in a public header file should start with RTE_ to
avoid namespace conflicts.

> +       RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +       RTE_GRAPH_MODEL_RTC,

Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum itself?

> +#define WORKER_MODEL_GENERIC "generic"

Generic is a very overloaded term. Use pipeline here, i.e.
RTE_GRAPH_MODEL_PIPELINE


> +       RTE_GRAPH_MODEL_GENERIC,
> +       RTE_GRAPH_MODEL_MAX,

No need for MAX; it will break the ABI in the future. See other subsystems
such as cryptodev.
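
As an illustration of the MAX concern, here is a sketch with hypothetical
enums and structs (none of these names exist in the graph library). Once an
application sizes an array with the MAX enumerator, adding a model in a later
release changes the structure size and the offset of every field after the
array:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "release N" enum with a trailing MAX enumerator. */
enum demo_model_v1 {
	DEMO_V1_MODEL_RTC,
	DEMO_V1_MODEL_PIPELINE,
	DEMO_V1_MODEL_MAX,
};

/* Hypothetical "release N+1": one model added, so MAX silently moves. */
enum demo_model_v2 {
	DEMO_V2_MODEL_RTC,
	DEMO_V2_MODEL_PIPELINE,
	DEMO_V2_MODEL_NEW,
	DEMO_V2_MODEL_MAX,
};

/* The same structure as compiled against each release. */
struct demo_stats_v1 {
	uint64_t calls[DEMO_V1_MODEL_MAX];
	uint64_t total;
};

struct demo_stats_v2 {
	uint64_t calls[DEMO_V2_MODEL_MAX];
	uint64_t total;
};
```

An application built against the first layout and a library built against the
second would disagree on where `total` lives, which is the kind of break the
ABI checker flags.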

> +};

>

^ permalink raw reply	[relevance 3%]

* [PATCH v2 3/3] doc: add Corigine information to nfp documentation
  @ 2023-02-20  8:41  8%   ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-20  8:41 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Walter Heymans, Chaoyong He

From: Walter Heymans <walter.heymans@corigine.com>

Add Corigine information to the nfp documentation. The Network Flow
Processor (NFP) PMD is used by products from both Netronome and
Corigine.

Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
---
 doc/guides/nics/nfp.rst | 78 +++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index d133b6385c..f102238a28 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -1,19 +1,18 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
-    All rights reserved.
+    Copyright(c) 2021 Corigine, Inc. All rights reserved.
 
 NFP poll mode driver library
 ============================
 
-Netronome's sixth generation of flow processors pack 216 programmable
-cores and over 100 hardware accelerators that uniquely combine packet,
-flow, security and content processing in a single device that scales
+Netronome and Corigine's sixth generation of flow processors pack 216
+programmable cores and over 100 hardware accelerators that uniquely combine
+packet, flow, security and content processing in a single device that scales
 up to 400-Gb/s.
 
-This document explains how to use DPDK with the Netronome Poll Mode
-Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
-(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
-Netronome's Network Flow Processor 38xx (NFP-38xx).
+This document explains how to use DPDK with the Network Flow Processor (NFP)
+Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
+and NFP-38xx product lines.
 
 NFP is a SR-IOV capable device and the PMD supports the physical
 function (PF) and the virtual functions (VFs).
@@ -21,15 +20,16 @@ function (PF) and the virtual functions (VFs).
 Dependencies
 ------------
 
-Before using the Netronome's DPDK PMD some NFP configuration,
+Before using the NFP DPDK PMD some NFP configuration,
 which is not related to DPDK, is required. The system requires
-installation of **Netronome's BSP (Board Support Package)** along
-with a specific NFP firmware application. Netronome's NSP ABI
+installation of the **nfp-bsp (Board Support Package)** along
+with a specific NFP firmware application. The NSP ABI
 version should be 0.20 or higher.
 
-If you have a NFP device you should already have the code and
-documentation for this configuration. Contact
-**support@netronome.com** to obtain the latest available firmware.
+If you have a NFP device you should already have the documentation to perform
+this configuration. Contact **support@netronome.com** (for Netronome products)
+or **smartnic-support@corigine.com** (for Corigine products) to obtain the
+latest available firmware.
 
 The NFP Linux netdev kernel driver for VFs has been a part of the
 vanilla kernel since kernel version 4.5, and support for the PF
@@ -44,9 +44,9 @@ Linux kernel driver.
 Building the software
 ---------------------
 
-Netronome's PMD code is provided in the **drivers/net/nfp** directory.
-Although NFP PMD has Netronome´s BSP dependencies, it is possible to
-compile it along with other DPDK PMDs even if no BSP was installed previously.
+The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
+NFP PMD has BSP dependencies, it is possible to compile it along with other
+DPDK PMDs even if no BSP was installed previously.
 Of course, a DPDK app will require such a BSP installed for using the
 NFP PMD, along with a specific NFP firmware application.
 
@@ -68,9 +68,9 @@ like uploading the firmware and configure the Link state properly when starting
 or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
 a PF is initialized, which was not always true with older DPDK versions.
 
-Depending on the Netronome product installed in the system, firmware files
-should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
-PF looks for a firmware file in this order:
+Depending on the product installed in the system, firmware files should be
+available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
+for a firmware file in this order:
 
 	1) First try to find a firmware image specific for this device using the
 	   NFP serial number:
@@ -85,19 +85,22 @@ PF looks for a firmware file in this order:
 
 		nic_AMDA0099-0001_2x25.nffw
 
-Netronome's software packages install firmware files under
-``/lib/firmware/netronome`` to support all the Netronome's SmartNICs and
-different firmware applications. This is usually done using file names based on
-SmartNIC type and media and with a directory per firmware application. Options
-1 and 2 for firmware filenames allow more than one SmartNIC, same type of
-SmartNIC or different ones, and to upload a different firmware to each
+Netronome and Corigine's software packages install firmware files under
+``/lib/firmware/netronome`` to support all the Netronome and Corigine SmartNICs
+and different firmware applications. This is usually done using file names
+based on SmartNIC type and media and with a directory per firmware application.
+Options 1 and 2 for firmware filenames allow more than one SmartNIC, same type
+of SmartNIC or different ones, and to upload a different firmware to each
 SmartNIC.
 
    .. Note::
       Currently the NFP PMD supports using the PF with Agilio Firmware with
       NFD3 and Agilio Firmware with NFDk. See
-      https://help.netronome.com/support/solutions for more information on the
-      various firmwares supported by the Netronome Agilio CX smartNIC.
+      `Netronome Support <https://help.netronome.com/support/solutions>`_.
+      for more information on the various firmwares supported by the Netronome
+      Agilio SmartNIC range, or
+      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
+      for more information about Corigine's range.
 
 PF multiport support
 --------------------
@@ -164,6 +167,12 @@ System configuration
 
       lspci -d 19ee:
 
+   and on Corigine SmartNICs using:
+
+   .. code-block:: console
+
+      lspci -d 1da8:
+
    Now, for example, to configure two virtual functions on a NFP device
    whose PCI system identity is "0000:03:00.0":
 
@@ -171,12 +180,19 @@ System configuration
 
       echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
 
-   The result of this command may be shown using lspci again:
+   The result of this command may be shown using lspci again on Netronome
+   SmartNICs:
 
    .. code-block:: console
 
       lspci -kd 19ee:
 
+   and on Corigine SmartNICs:
+
+   .. code-block:: console
+
+      lspci -kd 1da8:
+
    Two new PCI devices should appear in the output of the above command. The
    -k option shows the device driver, if any, that the devices are bound to.
    Depending on the modules loaded, at this point the new PCI devices may be
@@ -186,8 +202,8 @@ System configuration
 Flow offload
 ------------
 
-Use the flower firmware application, some type of Netronome's SmartNICs can
-offload the flow into cards.
+Using the flower firmware application, some types of Netronome or Corigine
+SmartNICs can offload the flows onto the cards.
 
 The flower firmware application requires the PMD running two services:
 
-- 
2.29.3


^ permalink raw reply	[relevance 8%]

* Re: [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-16 11:09  3%   ` [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
@ 2023-02-16 11:42  0%     ` fengchengwen
  0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2023-02-16 11:42 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Kevin Laatz

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2023/2/16 19:09, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
> 
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Kevin Laatz <kevin.laatz@intel.com>

...

^ permalink raw reply	[relevance 0%]

* [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev
  @ 2023-02-16 11:09  3%   ` Bruce Richardson
  2023-02-16 11:42  0%     ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-16 11:09 UTC (permalink / raw)
  To: dev; +Cc: fengchengwen, Bruce Richardson, Kevin Laatz

Validate device operation when a device is stopped or restarted.

The only complication - and gap in the dmadev ABI specification - is
what happens to the job ids on restart. Some drivers reset them to 0,
while others continue where things left off. Take account of both
possibilities in the test case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
---
 app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
index 0296c52d2a..0736ff2a18 100644
--- a/app/test/test_dmadev.c
+++ b/app/test/test_dmadev.c
@@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
 			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
 }
 
+static int
+test_stop_start(int16_t dev_id, uint16_t vchan)
+{
+	/* device is already started on input, should be (re)started on output */
+
+	uint16_t id = 0;
+	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
+
+	/* - test stopping a device works ok,
+	 * - then do a start-stop without doing a copy
+	 * - finally restart the device
+	 * checking for errors at each stage, and validating we can still copy at the end.
+	 */
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device\n");
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device after multiple stop-starts\n");
+
+	/* before doing a copy, we need to know what the next id will be it should
+	 * either be:
+	 * - the last completed job before start if driver does not reset id on stop
+	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
+	 */
+	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
+		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
+	id += 1; /* id_count is next job id */
+	if (id != id_count && id != 0)
+		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
+				id, id_count);
+
+	id_count = id;
+	if (test_single_copy(dev_id, vchan) < 0)
+		ERR_RETURN("Error performing copy after device restart\n");
+	return 0;
+}
+
 /* Failure handling test cases - global macros and variables for those tests*/
 #define COMP_BURST_SZ	16
 #define OPT_FENCE(idx) ((fence && idx == 8) ? RTE_DMA_OP_FLAG_FENCE : 0)
@@ -819,6 +861,10 @@ test_dmadev_instance(int16_t dev_id)
 	if (runtest("copy", test_enqueue_copies, 640, dev_id, vchan, CHECK_ERRS) < 0)
 		goto err;
 
+	/* run tests stopping/starting devices and check jobs still work after restart */
+	if (runtest("stop-start", test_stop_start, 1, dev_id, vchan, CHECK_ERRS) < 0)
+		goto err;
+
 	/* run some burst capacity tests */
 	if (rte_dma_burst_capacity(dev_id, vchan) < 64)
 		printf("DMA Dev %u: insufficient burst capacity (64 required), skipping tests\n",
-- 
2.37.2


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-16  1:24  0%         ` fengchengwen
@ 2023-02-16  9:24  0%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-02-16  9:24 UTC (permalink / raw)
  To: fengchengwen; +Cc: dev, Kevin Laatz

On Thu, Feb 16, 2023 at 09:24:38AM +0800, fengchengwen wrote:
> On 2023/2/15 19:57, Bruce Richardson wrote:
> > On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> >> On 2023/1/17 1:37, Bruce Richardson wrote:
> >>> Validate device operation when a device is stopped or restarted.
> >>>
> >>> The only complication - and gap in the dmadev ABI specification - is
> >>> what happens to the job ids on restart. Some drivers reset them to 0,
> >>> while others continue where things left off. Take account of both
> >>> possibilities in the test case.
> >>>
> >>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>> ---
> >>>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 46 insertions(+)
> >>>
> >>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> >>> index de787c14e2..8fb73a41e2 100644
> >>> --- a/app/test/test_dmadev.c
> >>> +++ b/app/test/test_dmadev.c
> >>> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
> >>>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
> >>>  }
> >>>  
> >>> +static int
> >>> +test_stop_start(int16_t dev_id, uint16_t vchan)
> >>> +{
> >>> +	/* device is already started on input, should be (re)started on output */
> >>> +
> >>> +	uint16_t id = 0;
> >>> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> >>> +
> >>> +	/* - test stopping a device works ok,
> >>> +	 * - then do a start-stop without doing a copy
> >>> +	 * - finally restart the device
> >>> +	 * checking for errors at each stage, and validating we can still copy at the end.
> >>> +	 */
> >>> +	if (rte_dma_stop(dev_id) < 0)
> >>> +		ERR_RETURN("Error stopping device\n");
> >>> +
> >>> +	if (rte_dma_start(dev_id) < 0)
> >>> +		ERR_RETURN("Error restarting device\n");
> >>> +	if (rte_dma_stop(dev_id) < 0)
> >>> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> >>> +
> >>> +	if (rte_dma_start(dev_id) < 0)
> >>> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> >>> +
> >>> +	/* before doing a copy, we need to know what the next id will be it should
> >>> +	 * either be:
> >>> +	 * - the last completed job before start if driver does not reset id on stop
> >>> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> >>> +	 */
> >>> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> >>> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> >>> +	id += 1; /* id_count is next job id */
> >>> +	if (id != id_count && id != 0)
> >>> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> >>> +				id, id_count);
> >>
> >> Hi Bruce,
> >>
> >> Suggest adding a warning log to flag when the id was not reset to zero, so
> >> that a new driver can detect that it breaks the ABI specification.
> >>
> > Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
> > specify what happens to the values on doing stop and then start. My
> > thinking was that it should continue numbering as it would be equivalent to
> > suspend and resume, but other drivers appear to treat it as a "reset". I
> > suspect there are advantages and disadvantages to both schemes. Until we
> > decide on what the correct behaviour should be - or decide to allow both -
> > I don't think warning is the right thing to do here.
> 
> On this point, I agree to upstream this patch first, and then discuss what the
> correct behavior should be for the restart scenario.
> 
+1. Thanks.

With this patch in place we will also be better able to help drivers
enforce the correct behaviour once we define it.

I'll do v3 keeping this as-is for now.
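
For reference, the acceptance rule the test applies can be sketched in
isolation. This is only an illustration of the check discussed above, not code
from the patch: the helper name next_id_is_valid() and its parameter names are
hypothetical, and it assumes 16-bit job ids as in the test.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK or of the patch): decide whether the
 * last-completed job id reported right after a stop/start is acceptable.
 *
 * last_completed: id reported by the driver when no job has run since restart
 * expected_next:  id the test expects to hand out next (id_count in the patch)
 */
static bool
next_id_is_valid(uint16_t last_completed, uint16_t expected_next)
{
	/* uint16_t arithmetic wraps, so 0xFFFF + 1 becomes 0 (the "reset" case) */
	uint16_t next = (uint16_t)(last_completed + 1);

	/* accept both schemes: numbering continues, or numbering restarts at 0 */
	return next == expected_next || next == 0;
}
```

A driver that continues numbering reports expected_next - 1, while a driver
that resets reports 0xFFFF (i.e. -1), and both pass. Any other value fails.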

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-15 11:57  3%       ` Bruce Richardson
@ 2023-02-16  1:24  0%         ` fengchengwen
  2023-02-16  9:24  0%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-02-16  1:24 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Kevin Laatz

On 2023/2/15 19:57, Bruce Richardson wrote:
> On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
>> On 2023/1/17 1:37, Bruce Richardson wrote:
>>> Validate device operation when a device is stopped or restarted.
>>>
>>> The only complication - and gap in the dmadev ABI specification - is
>>> what happens to the job ids on restart. Some drivers reset them to 0,
>>> while others continue where things left off. Take account of both
>>> possibilities in the test case.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 46 insertions(+)
>>>
>>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
>>> index de787c14e2..8fb73a41e2 100644
>>> --- a/app/test/test_dmadev.c
>>> +++ b/app/test/test_dmadev.c
>>> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
>>>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
>>>  }
>>>  
>>> +static int
>>> +test_stop_start(int16_t dev_id, uint16_t vchan)
>>> +{
>>> +	/* device is already started on input, should be (re)started on output */
>>> +
>>> +	uint16_t id = 0;
>>> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
>>> +
>>> +	/* - test stopping a device works ok,
>>> +	 * - then do a start-stop without doing a copy
>>> +	 * - finally restart the device
>>> +	 * checking for errors at each stage, and validating we can still copy at the end.
>>> +	 */
>>> +	if (rte_dma_stop(dev_id) < 0)
>>> +		ERR_RETURN("Error stopping device\n");
>>> +
>>> +	if (rte_dma_start(dev_id) < 0)
>>> +		ERR_RETURN("Error restarting device\n");
>>> +	if (rte_dma_stop(dev_id) < 0)
>>> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
>>> +
>>> +	if (rte_dma_start(dev_id) < 0)
>>> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
>>> +
>>> +	/* before doing a copy, we need to know what the next id will be it should
>>> +	 * either be:
>>> +	 * - the last completed job before start if driver does not reset id on stop
>>> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
>>> +	 */
>>> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
>>> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
>>> +	id += 1; /* id_count is next job id */
>>> +	if (id != id_count && id != 0)
>>> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
>>> +				id, id_count);
>>
>> Hi Bruce,
>>
>> Suggest adding a warning log to flag when the id was not reset to zero, so
>> that a new driver can detect that it breaks the ABI specification.
>>
> Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
> specify what happens to the values on doing stop and then start. My
> thinking was that it should continue numbering as it would be equivalent to
> suspend and resume, but other drivers appear to treat it as a "reset". I
> suspect there are advantages and disadvantages to both schemes. Until we
> decide on what the correct behaviour should be - or decide to allow both -
> I don't think warning is the right thing to do here.

On this point, I agree to upstream this patch first, and then discuss what the
correct behavior should be for the restart scenario.

> 
> /Bruce
> 
> .
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: update NFP documentation with Corigine information
  2023-02-15 13:37  0% ` Ferruh Yigit
@ 2023-02-15 17:58  0%   ` Niklas Söderlund
  0 siblings, 0 replies; 200+ results
From: Niklas Söderlund @ 2023-02-15 17:58 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Chaoyong He, dev, oss-drivers, Walter Heymans

Hello Ferruh,

Thanks for your feedback.

On 2023-02-15 13:37:05 +0000, Ferruh Yigit wrote:
> On 2/3/2023 8:08 AM, Chaoyong He wrote:
> > From: Walter Heymans <walter.heymans@corigine.com>
> > 
> > The NFP PMD documentation is updated to include information about
> > Corigine and their new vendor device ID.
> > 
> > Outdated information regarding the use of the PMD is also updated.
> > 
> > While making major changes to the document, the maximum number of
> > characters per line is updated to 80 characters to improve the
> > readability in raw format.
> > 
> 
> There are three groups of changes made to the documentation, as explained in
> the three paragraphs above.
> 
> To help review, is it possible to separate this patch into three
> patches? Later they can be squashed and merged as a single patch.
> As it is, it is easy to miss content changes among formatting changes.
> 
> (You can include simple grammar updates (ones that change neither the
> content nor Corigine-related information) in the formatting update patch.)

We will break this patch into three as you suggest, address the
comments below and post a v2.

> 
> 
> > Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
> > Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
> > Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> > ---
> >  doc/guides/nics/nfp.rst | 168 +++++++++++++++++++++-------------------
> >  1 file changed, 90 insertions(+), 78 deletions(-)
> > 
> > diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
> > index a085d7d9ae..6fea280411 100644
> > --- a/doc/guides/nics/nfp.rst
> > +++ b/doc/guides/nics/nfp.rst
> > @@ -1,35 +1,34 @@
> >  ..  SPDX-License-Identifier: BSD-3-Clause
> >      Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
> > -    All rights reserved.
> > +    Copyright(c) 2021 Corigine, Inc. All rights reserved.
> >  
> >  NFP poll mode driver library
> >  ============================
> >  
> > -Netronome's sixth generation of flow processors pack 216 programmable
> > -cores and over 100 hardware accelerators that uniquely combine packet,
> > -flow, security and content processing in a single device that scales
> > +Netronome and Corigine's sixth generation of flow processors pack 216
> > +programmable cores and over 100 hardware accelerators that uniquely combine
> > +packet, flow, security and content processing in a single device that scales
> >  up to 400-Gb/s.
> >  
> > -This document explains how to use DPDK with the Netronome Poll Mode
> > -Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
> > -(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
> > -Netronome's Network Flow Processor 38xx (NFP-38xx).
> > +This document explains how to use DPDK with the Network Flow Processor (NFP)
> > +Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
> > +and NFP-38xx product lines.
> >  
> > -NFP is a SRIOV capable device and the PMD supports the physical
> > -function (PF) and the virtual functions (VFs).
> > +NFP is a SR-IOV capable device and the PMD supports the physical function (PF)
> > +and the virtual functions (VFs).
> >  
> >  Dependencies
> >  ------------
> >  
> > -Before using the Netronome's DPDK PMD some NFP configuration,
> > -which is not related to DPDK, is required. The system requires
> > -installation of **Netronome's BSP (Board Support Package)** along
> > -with a specific NFP firmware application. Netronome's NSP ABI
> > -version should be 0.20 or higher.
> > +Before using the NFP DPDK PMD some NFP configuration, which is not related to
> > +DPDK, is required. The system requires installation of
> > +**NFP-BSP (Board Support Package)** along with a specific NFP firmware
> > +application. The NSP ABI version should be 0.20 or higher.
> >  
> > -If you have a NFP device you should already have the code and
> > -documentation for this configuration. Contact
> > -**support@netronome.com** to obtain the latest available firmware.
> > +If you have a NFP device you should already have the documentation to perform
> > +this configuration. Contact **support@netronome.com** (for Netronome products)
> > +or **smartnic-support@corigine.com** (for Corigine products) to obtain the
> > +latest available firmware.
> >  
> >  The NFP Linux netdev kernel driver for VFs has been a part of the
> >  vanilla kernel since kernel version 4.5, and support for the PF
> > @@ -44,11 +43,11 @@ Linux kernel driver.
> >  Building the software
> >  ---------------------
> >  
> > -Netronome's PMD code is provided in the **drivers/net/nfp** directory.
> > -Although NFP PMD has Netronome´s BSP dependencies, it is possible to
> > -compile it along with other DPDK PMDs even if no BSP was installed previously.
> > -Of course, a DPDK app will require such a BSP installed for using the
> > -NFP PMD, along with a specific NFP firmware application.
> > +The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
> > +NFP PMD has BSP dependencies, it is possible to compile it along with other
> > +DPDK PMDs even if no BSP was installed previously. Of course, a DPDK app will
> > +require such a BSP installed for using the NFP PMD, along with a specific NFP
> > +firmware application.
> >  
> >  Once the DPDK is built all the DPDK apps and examples include support for
> >  the NFP PMD.
> > @@ -57,27 +56,20 @@ the NFP PMD.
> >  Driver compilation and testing
> >  ------------------------------
> >  
> > -Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
> > -for details.
> > +Refer to the document
> > +:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` for details.
> >  
> >  Using the PF
> >  ------------
> >  
> > -NFP PMD supports using the NFP PF as another DPDK port, but it does not
> > -have any functionality for controlling VFs. In fact, it is not possible to use
> > -the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
> > -bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
> > -have a PMD able to work with the PF and VFs at the same time and with the PF
> > -implementing VF management along with other PF-only functionalities/offloads.
> > -
> 
> Why is this paragraph removed? Is it because it is not correct anymore,
> or just because of a change in document organization?
> 
> >  The PMD PF has extra work to do which will delay the DPDK app initialization
> > -like uploading the firmware and configure the Link state properly when starting or
> > -stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> > +like uploading the firmware and configure the Link state properly when starting
> > +or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> >  a PF is initialized, which was not always true with older DPDK versions.
> >  
> > -Depending on the Netronome product installed in the system, firmware files
> > -should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
> > -PF looks for a firmware file in this order:
> > +Depending on the product installed in the system, firmware files should be
> > +available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
> > +for a firmware file in this order:
> >  
> >  	1) First try to find a firmware image specific for this device using the
> >  	   NFP serial number:
> > @@ -92,18 +84,21 @@ PF looks for a firmware file in this order:
> >  
> >  		nic_AMDA0099-0001_2x25.nffw
> >  
> > -Netronome's software packages install firmware files under ``/lib/firmware/netronome``
> > -to support all the Netronome's SmartNICs and different firmware applications.
> > -This is usually done using file names based on SmartNIC type and media and with a
> > -directory per firmware application. Options 1 and 2 for firmware filenames allow
> > -more than one SmartNIC, same type of SmartNIC or different ones, and to upload a
> > -different firmware to each SmartNIC.
> > +Netronome and Corigine's software packages install firmware files under
> > +``/lib/firmware/netronome`` to support all the SmartNICs and different firmware
> > +applications. This is usually done using file names based on SmartNIC type and
> > +media and with a directory per firmware application. Options 1 and 2 for
> > +firmware filenames allow more than one SmartNIC, same type of SmartNIC or
> > +different ones, and to upload a different firmware to each SmartNIC.
> >  
> >     .. Note::
> > -      Currently the NFP PMD supports using the PF with Agilio Firmware with NFD3
> > -      and Agilio Firmware with NFDk. See https://help.netronome.com/support/solutions
> > +      Currently the NFP PMD supports using the PF with Agilio Firmware with
> > +      NFD3 and Agilio Firmware with NFDk. See
> > +      `Netronome Support <https://help.netronome.com/support/solutions>`_.
> >        for more information on the various firmwares supported by the Netronome
> > -      Agilio CX smartNIC.
> > +      Agilio SmartNICs range, or
> > +      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
> > +      for more information about Corigine's range.
> >  
> >  PF multiport support
> >  --------------------
> > @@ -118,7 +113,7 @@ this particular configuration requires the PMD to create ports in a special way,
> >  although once they are created, DPDK apps should be able to use them as normal
> >  PCI ports.
> >  
> > -NFP ports belonging to same PF can be seen inside PMD initialization with a
> > +NFP ports belonging to the same PF can be seen inside PMD initialization with a
> >  suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
> >  0000:03:00.0 and four ports is seen by the PMD code as:
> >  
> > @@ -137,50 +132,67 @@ suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
> >  PF multiprocess support
> >  -----------------------
> >  
> > -Due to how the driver needs to access the NFP through a CPP interface, which implies
> > -to use specific registers inside the chip, the number of secondary processes with PF
> > -ports is limited to only one.
> > +Due to how the driver needs to access the NFP through a CPP interface, which
> > +implies to use specific registers inside the chip, the number of secondary
> > +processes with PF ports is limited to only one.
> >  
> > -This limitation will be solved in future versions but having basic multiprocess support
> > -is important for allowing development and debugging through the PF using a secondary
> > -process which will create a CPP bridge for user space tools accessing the NFP.
> > +This limitation will be solved in future versions, but having basic
> > +multiprocess support is important for allowing development and debugging
> > +through the PF using a secondary process, which will create a CPP bridge
> > +for user space tools accessing the NFP.
> >  
> >  
> >  System configuration
> >  --------------------
> >  
> >  #. **Enable SR-IOV on the NFP device:** The current NFP PMD supports the PF and
> > -   the VFs on a NFP device. However, it is not possible to work with both at the
> > -   same time because the VFs require the PF being bound to the NFP PF Linux
> > -   netdev driver.  Make sure you are working with a kernel with NFP PF support or
> > -   get the drivers from the above Github repository and follow the instructions
> > -   for building and installing it.
> > +   the VFs on a NFP device. However, it is not possible to work with both at
> > +   the same time when using the netdev NFP Linux netdev driver.
> 
> > The old and new text don't say the same thing.
> > The old one says: "For DPDK to support VFs, the PF needs to be bound to the kernel driver."
> > 
> > Has this changed, or is it just a wording mistake?
> 
> 
> >     It is possible
> > +   to bind the PF to the ``vfio-pci`` kernel module, and create VFs afterwards.
> > +   This requires loading the ``vfio-pci`` module with the following parameters:
> > +
> > +   .. code-block:: console
> > +
> > +      modprobe vfio-pci enable_sriov=1 disable_idle_d3=1
> > +
> > +   VFs need to be enabled before they can be used with the PMD. Before enabling
> > +   the VFs it is useful to obtain information about the current NFP PCI device
> > +   detected by the system. This can be done on Netronome SmartNICs using:
> > +
> > +   .. code-block:: console
> > +
> > +      lspci -d 19ee:
> >  
> 
> > What I understand is that, for DPDK to support VFs, two things are required:
> > 1) The ability to create VFs; this can be done either by using the device's
> > kernel driver or 'vfio-pci'.
> > 2) The PF driver should support managing VFs.
> > 
> > The lines above document item (1) and how 'vfio-pci' is used for it.
> > 
> > But the old documentation's mention of item (2) is missing. Why was that part
> > removed, isn't it valid anymore? I mean, is the "PF -> kernel, VF -> DPDK"
> > combination supported now?
> 
> 
> > -   VFs need to be enabled before they can be used with the PMD.
> > -   Before enabling the VFs it is useful to obtain information about the
> > -   current NFP PCI device detected by the system:
> > +   and on Corigine SmartNICs using:
> >  
> >     .. code-block:: console
> >  
> > -      lspci -d19ee:
> > +      lspci -d 1da8:
> >  
> > -   Now, for example, configure two virtual functions on a NFP-6xxx device
> > +   Now, for example, to configure two virtual functions on a NFP device
> >     whose PCI system identity is "0000:03:00.0":
> >  
> >     .. code-block:: console
> >  
> >        echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
> >  
> > -   The result of this command may be shown using lspci again:
> > +   The result of this command may be shown using lspci again on Netronome
> > +   SmartNICs:
> > +
> > +   .. code-block:: console
> > +
> > +      lspci -d 19ee: -k
> > +
> > +   and on Corigine SmartNICs:
> >  
> >     .. code-block:: console
> >  
> > -      lspci -d19ee: -k
> > +      lspci -d 1da8: -k
> >  
> >     Two new PCI devices should appear in the output of the above command. The
> > -   -k option shows the device driver, if any, that devices are bound to.
> > -   Depending on the modules loaded at this point the new PCI devices may be
> > -   bound to nfp_netvf driver.
> > +   -k option shows the device driver, if any, that the devices are bound to.
> > +   Depending on the modules loaded, at this point the new PCI devices may be
> > +   bound to the ``nfp`` kernel driver or ``vfio-pci``.
> >  
> >  
> >  Flow offload
> > @@ -193,13 +205,13 @@ The flower firmware application requires the PMD running two services:
> >  
> >  	* PF vNIC service: handling the feedback traffic.
> >  	* ctrl vNIC service: communicate between PMD and firmware through
> > -	  control message.
> > +	  control messages.
> >  
> >  To achieve the offload of flow, the representor ports are exposed to OVS.
> > -The flower firmware application support representor port for VF and physical
> > -port. There will always exist a representor port for each physical port,
> > -and the number of the representor port for VF is specified by the user through
> > -parameter.
> > +The flower firmware application supports VF, PF, and physical port representor
> > +ports. 
> 
> > Again, the old document and the new one do not say the same thing; is it intentional?
> 
> Old one says: "Having representor ports for both VF and PF is supported."
> 
> New one says: "FW supports representor port, VF and PF."
> 
> > There will always exist a representor port for a PF and each physical
> > +port. The number of the representor ports for VFs are specified by the user
> > +through a parameter.
> >  
> >  In the Rx direction, the flower firmware application will prepend the input
> >  port information into metadata for each packet which can't offloaded. The PF
> > @@ -207,12 +219,12 @@ vNIC service will keep polling packets from the firmware, and multiplex them
> >  to the corresponding representor port.
> >  
> >  In the Tx direction, the representor port will prepend the output port
> > -information into metadata for each packet, and then send it to firmware through
> > -PF vNIC.
> > +information into metadata for each packet, and then send it to the firmware
> > +through the PF vNIC.
> >  
> > -The ctrl vNIC service handling various control message, like the creation and
> > -configuration of representor port, the pattern and action of flow rules, the
> > -statistics of flow rules, and so on.
> > +The ctrl vNIC service handles various control messages, for example, the
> > +creation and configuration of a representor port, the pattern and action of
> > +flow rules, the statistics of flow rules, etc.
> >  
> >  Metadata Format
> >  ---------------
> 

-- 
Kind Regards,
Niklas Söderlund

^ permalink raw reply	[relevance 0%]

* [PATCH v7 21/22] hash: move rte_hash_set_alg out header
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-15 17:23  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting algorithm for hash is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. But also, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code; therefore both old and new code
will work the same.
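
For readers following the diff, the selection logic being moved can be
sketched as a pure function. This is a simplified illustration, not the code
in the patch: sketch_select_crc32_alg(), enum sketch_arch and the two CPU
capability parameters are hypothetical stand-ins for the build-time
architecture checks and rte_cpu_get_flag_enabled() calls, the numeric flag
values are assumptions, and the RTE_LOG() warnings are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the patch; the numeric values are assumed for this sketch */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)
#define CRC32_ARM64     (1U << 3)

/* Hypothetical stand-in for the RTE_ARCH_* build-time checks */
enum sketch_arch { SKETCH_ARCH_X86, SKETCH_ARCH_ARM64, SKETCH_ARCH_OTHER };

/* Return the algorithm that would be selected for a given request */
static uint8_t
sketch_select_crc32_alg(uint8_t requested, enum sketch_arch arch,
			bool cpu_is_64bit, bool cpu_has_crc32)
{
	/* an explicit software request always wins */
	if (requested == CRC32_SW)
		return CRC32_SW;

	switch (arch) {
	case SKETCH_ARCH_X86:
		/* fall back to the 32-bit intrinsic when 64-bit is unavailable
		 * or when the caller asked for plain SSE4.2 */
		if (!cpu_is_64bit || requested == CRC32_SSE42)
			return CRC32_SSE42;
		return CRC32_SSE42_x64;
	case SKETCH_ARCH_ARM64:
		/* use the CRC extension only when the CPU reports it */
		return cpu_has_crc32 ? CRC32_ARM64 : CRC32_SW;
	default:
		return CRC32_SW;
	}
}
```

Because the choice depends only on the request and on CPU capabilities, it
gives the same answer whether it runs from an inlined header copy or from a
single function in the library.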

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v7 00/22] Replace use of static logtypes in libraries
                     ` (2 preceding siblings ...)
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-15 17:23  3% ` Stephen Hemminger
  2023-02-15 17:23  3%   ` [PATCH v7 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPE's in DPDK
libraries. It starts with the easy one and goes on to the more complex ones.
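
As background, the difference between a static and a dynamic logtype can be
sketched with a toy registry. This is not DPDK's implementation or API (DPDK
provides rte_log_register() and related macros for the real thing); every
name below is hypothetical and the table size is arbitrary. A static type is
a fixed numeric id compiled into the log header, whereas a dynamic type is
registered by name at run time and receives its id then.

```c
#include <string.h>

#define SKETCH_MAX_LOGTYPES 8

/* one registry slot: the type's name and its current maximum level */
struct sketch_logtype {
	const char *name;
	int level;
};

static struct sketch_logtype sketch_types[SKETCH_MAX_LOGTYPES];
static int sketch_num_types;

/*
 * Register a named log type at run time.
 * Returns the type id, the existing id if the name is already registered,
 * or -1 when the table is full.
 */
static int
sketch_log_register(const char *name, int level)
{
	for (int i = 0; i < sketch_num_types; i++)
		if (strcmp(sketch_types[i].name, name) == 0)
			return i;

	if (sketch_num_types == SKETCH_MAX_LOGTYPES)
		return -1;

	sketch_types[sketch_num_types].name = name;
	sketch_types[sketch_num_types].level = level;
	return sketch_num_types++;
}

/* A message is emitted only if its level does not exceed the type's level */
static int
sketch_log_enabled(int type, int level)
{
	return type >= 0 && type < sketch_num_types &&
		level <= sketch_types[type].level;
}
```

With this shape each library owns its own type and no central header has to
reserve an id for it.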

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v7 - fix commit message typo
     add error to gso_segment function doc
     fix missing cpuflags.h on arm

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 74 files changed, 378 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
  2023-02-15  7:26  3%     ` Hu, Jiayu
@ 2023-02-15 17:12  0%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:12 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Konstantin Ananyev, Mark Kavanagh

On Wed, 15 Feb 2023 07:26:22 +0000
"Hu, Jiayu" <jiayu.hu@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Wednesday, February 15, 2023 6:47 AM
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger <stephen@networkplumber.org>; Hu, Jiayu
> > <jiayu.hu@intel.com>; Konstantin Ananyev
> > <konstantin.v.ananyev@yandex.ru>; Mark Kavanagh
> > <mark.b.kavanagh@intel.com>
> > Subject: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
> > 
> > If a large packet is passed into GSO routines of unknown protocol then library
> > would log a message.
> > Better to tell the application instead of logging.
> > 
> > Fixes: 119583797b6a ("gso: support TCP/IPv4 GSO")
> > Cc: jiayu.hu@intel.com
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/gso/rte_gso.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/gso/rte_gso.c b/lib/gso/rte_gso.c
> > index 4b59217c16ee..c8e67c2d4b48 100644
> > --- a/lib/gso/rte_gso.c
> > +++ b/lib/gso/rte_gso.c
> > @@ -80,9 +80,8 @@ rte_gso_segment(struct rte_mbuf *pkt,
> >  		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
> >  				indirect_pool, pkts_out, nb_pkts_out);
> >  	} else {
> > -		/* unsupported packet, skip */
> > -		RTE_LOG(DEBUG, GSO, "Unsupported packet type\n");
> > -		ret = 0;
> > +		ret = -ENOTSUP;	/* only UDP or TCP allowed */
> > +  
> 
> The function signature annotation in rte_gso.h also needs update for ENOTSUP.
> In addition, will it break ABI? 

Not really, if anybody hits this error case, nothing good would have
been happening.
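
To make the caller-side effect of the change concrete, here is a minimal
sketch of how an application might react to the new return value. The
helpers fake_gso_segment() and segments_to_send() are hypothetical stand-ins
written for this illustration and are not part of the GSO library; the
fallback policy (send the packet unsegmented) is also only an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for a segmentation call: returns the number of
 * output segments, or -ENOTSUP when the packet is neither TCP nor UDP. */
static int
fake_gso_segment(int is_tcp_or_udp, int nb_segs)
{
	if (!is_tcp_or_udp)
		return -ENOTSUP;	/* only UDP or TCP allowed */
	return nb_segs;
}

/* Decide how many packets the caller should transmit:
 * - on success, the segments that were produced;
 * - on -ENOTSUP, the single original packet (assumed policy);
 * - on any other error, nothing (the packet is dropped). */
static int
segments_to_send(int ret)
{
	if (ret == -ENOTSUP)
		return 1;
	if (ret < 0)
		return 0;
	return ret;
}
```

With an error code the application decides what to do with an unsupported
packet, instead of having to infer it from a debug log line.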

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: update NFP documentation with Corigine information
  2023-02-03  8:08  6% [PATCH] doc: update NFP documentation with Corigine information Chaoyong He
@ 2023-02-15 13:37  0% ` Ferruh Yigit
  2023-02-15 17:58  0%   ` Niklas Söderlund
    1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-15 13:37 UTC (permalink / raw)
  To: Chaoyong He, dev; +Cc: oss-drivers, niklas.soderlund, Walter Heymans

On 2/3/2023 8:08 AM, Chaoyong He wrote:
> From: Walter Heymans <walter.heymans@corigine.com>
> 
> The NFP PMD documentation is updated to include information about
> Corigine and their new vendor device ID.
> 
> Outdated information regarding the use of the PMD is also updated.
> 
> While making major changes to the document, the maximum number of
> characters per line is updated to 80 characters to improve the
> readability in raw format.
> 

There are three groups of changes done to documentation as explained in
three paragraphs above.

To help review, is it possible to separate this patch into three
patches? Later they can be squashed and merged as a single patch.
As it is, it is easy to miss content changes among formatting changes.

(You can include simple grammar updates (those that change neither
content nor Corigine-related information) in the formatting update patch.)


> Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> ---
>  doc/guides/nics/nfp.rst | 168 +++++++++++++++++++++-------------------
>  1 file changed, 90 insertions(+), 78 deletions(-)
> 
> diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
> index a085d7d9ae..6fea280411 100644
> --- a/doc/guides/nics/nfp.rst
> +++ b/doc/guides/nics/nfp.rst
> @@ -1,35 +1,34 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>      Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
> -    All rights reserved.
> +    Copyright(c) 2021 Corigine, Inc. All rights reserved.
>  
>  NFP poll mode driver library
>  ============================
>  
> -Netronome's sixth generation of flow processors pack 216 programmable
> -cores and over 100 hardware accelerators that uniquely combine packet,
> -flow, security and content processing in a single device that scales
> +Netronome and Corigine's sixth generation of flow processors pack 216
> +programmable cores and over 100 hardware accelerators that uniquely combine
> +packet, flow, security and content processing in a single device that scales
>  up to 400-Gb/s.
>  
> -This document explains how to use DPDK with the Netronome Poll Mode
> -Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
> -(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
> -Netronome's Network Flow Processor 38xx (NFP-38xx).
> +This document explains how to use DPDK with the Network Flow Processor (NFP)
> +Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
> +and NFP-38xx product lines.
>  
> -NFP is a SRIOV capable device and the PMD supports the physical
> -function (PF) and the virtual functions (VFs).
> +NFP is a SR-IOV capable device and the PMD supports the physical function (PF)
> +and the virtual functions (VFs).
>  
>  Dependencies
>  ------------
>  
> -Before using the Netronome's DPDK PMD some NFP configuration,
> -which is not related to DPDK, is required. The system requires
> -installation of **Netronome's BSP (Board Support Package)** along
> -with a specific NFP firmware application. Netronome's NSP ABI
> -version should be 0.20 or higher.
> +Before using the NFP DPDK PMD some NFP configuration, which is not related to
> +DPDK, is required. The system requires installation of
> +**NFP-BSP (Board Support Package)** along with a specific NFP firmware
> +application. The NSP ABI version should be 0.20 or higher.
>  
> -If you have a NFP device you should already have the code and
> -documentation for this configuration. Contact
> -**support@netronome.com** to obtain the latest available firmware.
> +If you have a NFP device you should already have the documentation to perform
> +this configuration. Contact **support@netronome.com** (for Netronome products)
> +or **smartnic-support@corigine.com** (for Corigine products) to obtain the
> +latest available firmware.
>  
>  The NFP Linux netdev kernel driver for VFs has been a part of the
>  vanilla kernel since kernel version 4.5, and support for the PF
> @@ -44,11 +43,11 @@ Linux kernel driver.
>  Building the software
>  ---------------------
>  
> -Netronome's PMD code is provided in the **drivers/net/nfp** directory.
> -Although NFP PMD has Netronome´s BSP dependencies, it is possible to
> -compile it along with other DPDK PMDs even if no BSP was installed previously.
> -Of course, a DPDK app will require such a BSP installed for using the
> -NFP PMD, along with a specific NFP firmware application.
> +The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
> +NFP PMD has BSP dependencies, it is possible to compile it along with other
> +DPDK PMDs even if no BSP was installed previously. Of course, a DPDK app will
> +require such a BSP installed for using the NFP PMD, along with a specific NFP
> +firmware application.
>  
>  Once the DPDK is built all the DPDK apps and examples include support for
>  the NFP PMD.
> @@ -57,27 +56,20 @@ the NFP PMD.
>  Driver compilation and testing
>  ------------------------------
>  
> -Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
> -for details.
> +Refer to the document
> +:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` for details.
>  
>  Using the PF
>  ------------
>  
> -NFP PMD supports using the NFP PF as another DPDK port, but it does not
> -have any functionality for controlling VFs. In fact, it is not possible to use
> -the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
> -bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
> -have a PMD able to work with the PF and VFs at the same time and with the PF
> -implementing VF management along with other PF-only functionalities/offloads.
> -

Why is this paragraph removed? Is it because it is no longer correct,
or just because of the change in document organization?

>  The PMD PF has extra work to do which will delay the DPDK app initialization
> -like uploading the firmware and configure the Link state properly when starting or
> -stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> +like uploading the firmware and configure the Link state properly when starting
> +or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
>  a PF is initialized, which was not always true with older DPDK versions.
>  
> -Depending on the Netronome product installed in the system, firmware files
> -should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
> -PF looks for a firmware file in this order:
> +Depending on the product installed in the system, firmware files should be
> +available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
> +for a firmware file in this order:
>  
>  	1) First try to find a firmware image specific for this device using the
>  	   NFP serial number:
> @@ -92,18 +84,21 @@ PF looks for a firmware file in this order:
>  
>  		nic_AMDA0099-0001_2x25.nffw
>  
> -Netronome's software packages install firmware files under ``/lib/firmware/netronome``
> -to support all the Netronome's SmartNICs and different firmware applications.
> -This is usually done using file names based on SmartNIC type and media and with a
> -directory per firmware application. Options 1 and 2 for firmware filenames allow
> -more than one SmartNIC, same type of SmartNIC or different ones, and to upload a
> -different firmware to each SmartNIC.
> +Netronome and Corigine's software packages install firmware files under
> +``/lib/firmware/netronome`` to support all the SmartNICs and different firmware
> +applications. This is usually done using file names based on SmartNIC type and
> +media and with a directory per firmware application. Options 1 and 2 for
> +firmware filenames allow more than one SmartNIC, same type of SmartNIC or
> +different ones, and to upload a different firmware to each SmartNIC.
>  
>     .. Note::
> -      Currently the NFP PMD supports using the PF with Agilio Firmware with NFD3
> -      and Agilio Firmware with NFDk. See https://help.netronome.com/support/solutions
> +      Currently the NFP PMD supports using the PF with Agilio Firmware with
> +      NFD3 and Agilio Firmware with NFDk. See
> +      `Netronome Support <https://help.netronome.com/support/solutions>`_.
>        for more information on the various firmwares supported by the Netronome
> -      Agilio CX smartNIC.
> +      Agilio SmartNICs range, or
> +      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
> +      for more information about Corigine's range.
>  
>  PF multiport support
>  --------------------
> @@ -118,7 +113,7 @@ this particular configuration requires the PMD to create ports in a special way,
>  although once they are created, DPDK apps should be able to use them as normal
>  PCI ports.
>  
> -NFP ports belonging to same PF can be seen inside PMD initialization with a
> +NFP ports belonging to the same PF can be seen inside PMD initialization with a
>  suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
>  0000:03:00.0 and four ports is seen by the PMD code as:
>  
> @@ -137,50 +132,67 @@ suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
>  PF multiprocess support
>  -----------------------
>  
> -Due to how the driver needs to access the NFP through a CPP interface, which implies
> -to use specific registers inside the chip, the number of secondary processes with PF
> -ports is limited to only one.
> +Due to how the driver needs to access the NFP through a CPP interface, which
> +implies to use specific registers inside the chip, the number of secondary
> +processes with PF ports is limited to only one.
>  
> -This limitation will be solved in future versions but having basic multiprocess support
> -is important for allowing development and debugging through the PF using a secondary
> -process which will create a CPP bridge for user space tools accessing the NFP.
> +This limitation will be solved in future versions, but having basic
> +multiprocess support is important for allowing development and debugging
> +through the PF using a secondary process, which will create a CPP bridge
> +for user space tools accessing the NFP.
>  
>  
>  System configuration
>  --------------------
>  
>  #. **Enable SR-IOV on the NFP device:** The current NFP PMD supports the PF and
> -   the VFs on a NFP device. However, it is not possible to work with both at the
> -   same time because the VFs require the PF being bound to the NFP PF Linux
> -   netdev driver.  Make sure you are working with a kernel with NFP PF support or
> -   get the drivers from the above Github repository and follow the instructions
> -   for building and installing it.
> +   the VFs on a NFP device. However, it is not possible to work with both at
> +   the same time when using the netdev NFP Linux netdev driver.

The old and new text don't say the same thing.
The old one says: "For DPDK to support VFs, the PF needs to be bound to the kernel driver."

Has this changed, or is it just a wording mistake?


>     It is possible
> +   to bind the PF to the ``vfio-pci`` kernel module, and create VFs afterwards.
> +   This requires loading the ``vfio-pci`` module with the following parameters:
> +
> +   .. code-block:: console
> +
> +      modprobe vfio-pci enable_sriov=1 disable_idle_d3=1
> +
> +   VFs need to be enabled before they can be used with the PMD. Before enabling
> +   the VFs it is useful to obtain information about the current NFP PCI device
> +   detected by the system. This can be done on Netronome SmartNICs using:
> +
> +   .. code-block:: console
> +
> +      lspci -d 19ee:
>  

What I understand is that for DPDK to support VFs, two things are required:
1) The ability to create VFs; this can be done either by using the device's
kernel driver or 'vfio-pci'.
2) The PF driver should support managing VFs.

The lines above document item (1) and how 'vfio-pci' is used for it.

But the old documentation's mention of item (2) is missing. Why was that part
removed, isn't it valid anymore? I mean, is the "PF -> kernel, VF -> DPDK"
combination supported now?


> -   VFs need to be enabled before they can be used with the PMD.
> -   Before enabling the VFs it is useful to obtain information about the
> -   current NFP PCI device detected by the system:
> +   and on Corigine SmartNICs using:
>  
>     .. code-block:: console
>  
> -      lspci -d19ee:
> +      lspci -d 1da8:
>  
> -   Now, for example, configure two virtual functions on a NFP-6xxx device
> +   Now, for example, to configure two virtual functions on a NFP device
>     whose PCI system identity is "0000:03:00.0":
>  
>     .. code-block:: console
>  
>        echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
>  
> -   The result of this command may be shown using lspci again:
> +   The result of this command may be shown using lspci again on Netronome
> +   SmartNICs:
> +
> +   .. code-block:: console
> +
> +      lspci -d 19ee: -k
> +
> +   and on Corigine SmartNICs:
>  
>     .. code-block:: console
>  
> -      lspci -d19ee: -k
> +      lspci -d 1da8: -k
>  
>     Two new PCI devices should appear in the output of the above command. The
> -   -k option shows the device driver, if any, that devices are bound to.
> -   Depending on the modules loaded at this point the new PCI devices may be
> -   bound to nfp_netvf driver.
> +   -k option shows the device driver, if any, that the devices are bound to.
> +   Depending on the modules loaded, at this point the new PCI devices may be
> +   bound to the ``nfp`` kernel driver or ``vfio-pci``.
>  
>  
>  Flow offload
> @@ -193,13 +205,13 @@ The flower firmware application requires the PMD running two services:
>  
>  	* PF vNIC service: handling the feedback traffic.
>  	* ctrl vNIC service: communicate between PMD and firmware through
> -	  control message.
> +	  control messages.
>  
>  To achieve the offload of flow, the representor ports are exposed to OVS.
> -The flower firmware application support representor port for VF and physical
> -port. There will always exist a representor port for each physical port,
> -and the number of the representor port for VF is specified by the user through
> -parameter.
> +The flower firmware application supports VF, PF, and physical port representor
> +ports. 

Again, the old document and the new one are not saying the same thing; is it
intentional?

The old one says: "Having representor ports for both VF and PF is supported."

The new one says: "FW supports representor port, VF and PF."

> There will always exist a representor port for a PF and each physical
> +port. The number of the representor ports for VFs are specified by the user
> +through a parameter.
>  
>  In the Rx direction, the flower firmware application will prepend the input
>  port information into metadata for each packet which can't offloaded. The PF
> @@ -207,12 +219,12 @@ vNIC service will keep polling packets from the firmware, and multiplex them
>  to the corresponding representor port.
>  
>  In the Tx direction, the representor port will prepend the output port
> -information into metadata for each packet, and then send it to firmware through
> -PF vNIC.
> +information into metadata for each packet, and then send it to the firmware
> +through the PF vNIC.
>  
> -The ctrl vNIC service handling various control message, like the creation and
> -configuration of representor port, the pattern and action of flow rules, the
> -statistics of flow rules, and so on.
> +The ctrl vNIC service handles various control messages, for example, the
> +creation and configuration of a representor port, the pattern and action of
> +flow rules, the statistics of flow rules, etc.
>  
>  Metadata Format
>  ---------------


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-15  1:59  3%     ` fengchengwen
@ 2023-02-15 11:57  3%       ` Bruce Richardson
  2023-02-16  1:24  0%         ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-15 11:57 UTC (permalink / raw)
  To: fengchengwen; +Cc: dev, Kevin Laatz

On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> On 2023/1/17 1:37, Bruce Richardson wrote:
> > Validate device operation when a device is stopped or restarted.
> > 
> > The only complication - and gap in the dmadev ABI specification - is
> > what happens to the job ids on restart. Some drivers reset them to 0,
> > while others continue where things left off. Take account of both
> > possibilities in the test case.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> > 
> > diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> > index de787c14e2..8fb73a41e2 100644
> > --- a/app/test/test_dmadev.c
> > +++ b/app/test/test_dmadev.c
> > @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
> >  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
> >  }
> >  
> > +static int
> > +test_stop_start(int16_t dev_id, uint16_t vchan)
> > +{
> > +	/* device is already started on input, should be (re)started on output */
> > +
> > +	uint16_t id = 0;
> > +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> > +
> > +	/* - test stopping a device works ok,
> > +	 * - then do a start-stop without doing a copy
> > +	 * - finally restart the device
> > +	 * checking for errors at each stage, and validating we can still copy at the end.
> > +	 */
> > +	if (rte_dma_stop(dev_id) < 0)
> > +		ERR_RETURN("Error stopping device\n");
> > +
> > +	if (rte_dma_start(dev_id) < 0)
> > +		ERR_RETURN("Error restarting device\n");
> > +	if (rte_dma_stop(dev_id) < 0)
> > +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> > +
> > +	if (rte_dma_start(dev_id) < 0)
> > +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> > +
> > +	/* before doing a copy, we need to know what the next id will be it should
> > +	 * either be:
> > +	 * - the last completed job before start if driver does not reset id on stop
> > +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> > +	 */
> > +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> > +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> > +	id += 1; /* id_count is next job id */
> > +	if (id != id_count && id != 0)
> > +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> > +				id, id_count);
> 
> Hi Bruce,
> 
> I suggest adding a warning log to flag when the id was not reset to zero, so
> that a new driver can detect that it breaks the ABI specification.
> 
Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
specify what happens to the values on doing stop and then start. My
thinking was that it should continue numbering as it would be equivalent to
suspend and resume, but other drivers appear to treat it as a "reset". I
suspect there are advantages and disadvantages to both schemes. Until we
decide on what the correct behaviour should be - or decide to allow both -
I don't think warning is the right thing to do here.

/Bruce
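
The two accepted outcomes discussed above can be sketched in isolation. This
is a minimal illustration, assuming 16-bit job ids as in the quoted test; the
helper name next_id_is_acceptable() is hypothetical and does not exist in the
test suite.

```c
#include <stdint.h>

/* last_completed: id reported by the completion query after the restart.
 * expected_next:  id the test expects if the driver keeps counting.
 * Returns 1 when the next id matches either accepted behaviour. */
static int
next_id_is_acceptable(uint16_t last_completed, uint16_t expected_next)
{
	/* A driver that resets its ids reports "-1" (0xFFFF) as the last
	 * completed id, so adding one wraps around to 0. */
	uint16_t next = (uint16_t)(last_completed + 1);

	return next == expected_next || next == 0;
}
```

Both a "continue numbering" driver and a "reset to 0" driver pass this check;
only an unrelated id is treated as an error.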

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
  @ 2023-02-15  7:26  3%     ` Hu, Jiayu
  2023-02-15 17:12  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Hu, Jiayu @ 2023-02-15  7:26 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Konstantin Ananyev, Mark Kavanagh



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, February 15, 2023 6:47 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Hu, Jiayu
> <jiayu.hu@intel.com>; Konstantin Ananyev
> <konstantin.v.ananyev@yandex.ru>; Mark Kavanagh
> <mark.b.kavanagh@intel.com>
> Subject: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
> 
> If a large packet is passed into GSO routines of unknown protocol then library
> would log a message.
> Better to tell the application instead of logging.
> 
> Fixes: 119583797b6a ("gso: support TCP/IPv4 GSO")
> Cc: jiayu.hu@intel.com
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/gso/rte_gso.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/gso/rte_gso.c b/lib/gso/rte_gso.c
> index 4b59217c16ee..c8e67c2d4b48 100644
> --- a/lib/gso/rte_gso.c
> +++ b/lib/gso/rte_gso.c
> @@ -80,9 +80,8 @@ rte_gso_segment(struct rte_mbuf *pkt,
>  		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
>  				indirect_pool, pkts_out, nb_pkts_out);
>  	} else {
> -		/* unsupported packet, skip */
> -		RTE_LOG(DEBUG, GSO, "Unsupported packet type\n");
> -		ret = 0;
> +		ret = -ENOTSUP;	/* only UDP or TCP allowed */
> +

The function signature annotation in rte_gso.h also needs update for ENOTSUP.
In addition, will it break ABI? 

Thanks,
Jiayu
>  	}
> 
>  	if (ret < 0) {
> --
> 2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
    2023-02-14 16:04  0%     ` Kevin Laatz
@ 2023-02-15  1:59  3%     ` fengchengwen
  2023-02-15 11:57  3%       ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: fengchengwen @ 2023-02-15  1:59 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Kevin Laatz

On 2023/1/17 1:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
> 
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> index de787c14e2..8fb73a41e2 100644
> --- a/app/test/test_dmadev.c
> +++ b/app/test/test_dmadev.c
> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
>  }
>  
> +static int
> +test_stop_start(int16_t dev_id, uint16_t vchan)
> +{
> +	/* device is already started on input, should be (re)started on output */
> +
> +	uint16_t id = 0;
> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> +
> +	/* - test stopping a device works ok,
> +	 * - then do a start-stop without doing a copy
> +	 * - finally restart the device
> +	 * checking for errors at each stage, and validating we can still copy at the end.
> +	 */
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device\n");
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> +
> +	/* before doing a copy, we need to know what the next id will be it should
> +	 * either be:
> +	 * - the last completed job before start if driver does not reset id on stop
> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> +	 */
> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> +	id += 1; /* id_count is next job id */
> +	if (id != id_count && id != 0)
> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> +				id, id_count);

Hi Bruce,

I suggest adding a warning log to flag when the id was not reset to zero, so
that a new driver can detect that it breaks the ABI specification.

Thanks.


^ permalink raw reply	[relevance 3%]

* [PATCH v6 21/22] hash: move rte_hash_set_alg out header
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
  @ 2023-02-14 22:47  3%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14 22:47 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting algorithm for hash is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. But also, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code; therefore both old and new code
will work the same.
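
As a rough illustration of the repeated initialization mentioned above: a
constructor defined in a header is emitted once per file that includes it.
The sketch below simulates two such files inside one source file. All names
(init_calls, chosen_alg, set_alg, init_from_file_a/b) are hypothetical, and
the GCC/Clang constructor attribute is used directly here as a stand-in for
RTE_INIT.

```c
#include <stdint.h>

static unsigned int init_calls;	/* how many times the setup ran */
static uint8_t chosen_alg;	/* hypothetical "current algorithm" */

/* Stand-in for the algorithm setter that used to live in the header. */
static void
set_alg(uint8_t alg)
{
	chosen_alg = alg;
	init_calls++;
}

/* Copy that would be emitted for the first file including the header. */
__attribute__((constructor, used)) static void
init_from_file_a(void)
{
	set_alg(1);
}

/* Copy that would be emitted for the second file including the header. */
__attribute__((constructor, used)) static void
init_from_file_b(void)
{
	set_alg(1);
}
```

By the time main() starts, init_calls is 2 although one call would be enough;
with a single definition in a .c file the setup runs exactly once.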

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]
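
The x86 branch of rte_hash_crc_set_alg() in the patch above can be modelled
without DPDK. The following is only a sketch: the SK_ names and their flag
values are assumptions made for illustration, the cpu_has_em64t parameter
stands in for rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T), and the warning
logs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values (assumed, not copied from rte_hash_crc.h). */
#define SK_CRC32_SW        (1U << 0)
#define SK_CRC32_SSE42     (1U << 1)
#define SK_CRC32_x64       (1U << 2)
#define SK_CRC32_SSE42_x64 (SK_CRC32_x64 | SK_CRC32_SSE42)

/* The combined flag must contain the plain SSE4.2 bit. */
static_assert((SK_CRC32_SSE42_x64 & SK_CRC32_SSE42) != 0,
	      "SSE42_x64 must include SSE42");

/*
 * Model of the x86 selection: software is used only when explicitly
 * requested; otherwise the 64-bit variant is chosen unless the CPU lacks
 * 64-bit support or the caller asked for plain SSE4.2.
 */
static uint8_t
sk_select_crc_alg_x86(uint8_t requested, bool cpu_has_em64t)
{
	if (requested == SK_CRC32_SW)
		return SK_CRC32_SW;
	if (!cpu_has_em64t || requested == SK_CRC32_SSE42)
		return SK_CRC32_SSE42;
	return SK_CRC32_SSE42_x64;
}
```

For instance, sk_select_crc_alg_x86(SK_CRC32_SSE42_x64, false) falls back to
SK_CRC32_SSE42, which matches the first condition of the if/else in the patch.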

* [PATCH v6 00/22] Replace use of static logtypes in libraries
    2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
@ 2023-02-14 22:47  3% ` Stephen Hemminger
    2023-02-14 22:47  3%   ` [PATCH v6 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14 22:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.
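
As a rough illustration of the difference: a static logtype is a fixed
number known at compile time, while a dynamic one is registered by name at
startup and receives its number at run time. The toy registry below is a
self-contained model, not DPDK code; every sk_ name is hypothetical, and the
assumption that dynamic IDs start at 32 is only for the example. In DPDK
itself the registration is done through the helpers in rte_log.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_LOGTYPES  64
#define SK_FIRST_DYNAMIC 32 /* assumed: lower IDs are reserved for static types */

static const char *sk_names[SK_MAX_LOGTYPES];
static int sk_next_id = SK_FIRST_DYNAMIC;

/*
 * Register a logtype by name. Returns the existing ID when the name is
 * already known, a fresh ID otherwise, or -1 when the table is full.
 * The name pointer is stored as-is, so it must outlive the registry.
 */
static int
sk_log_register(const char *name)
{
	assert(name != NULL);

	for (int i = SK_FIRST_DYNAMIC; i < sk_next_id; i++)
		if (strcmp(sk_names[i], name) == 0)
			return i;

	if (sk_next_id >= SK_MAX_LOGTYPES)
		return -1;

	sk_names[sk_next_id] = name;
	return sk_next_id++;
}
```

A library would call sk_log_register("lib.hash") once at init time and keep
the returned ID in a file-local variable, instead of relying on a constant
shared through a public header.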

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v6 - fix typo in kni port 

v5 - fix use of LOGTYPE PORT and POWER in examples

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 375 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  @ 2023-02-14 16:04  0%     ` Kevin Laatz
  2023-02-15  1:59  3%     ` fengchengwen
  1 sibling, 0 replies; 200+ results
From: Kevin Laatz @ 2023-02-14 16:04 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Chengwen Feng

On 16/01/2023 17:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
>
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 46 insertions(+)
>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-14  9:38  0%         ` Jiawei(Jonny) Wang
@ 2023-02-14 10:01  0%           ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-14 10:01 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

On 2/14/2023 9:38 AM, Jiawei(Jonny) Wang wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, February 10, 2023 3:45 AM
>> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
>> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
>> Yuying Zhang <yuying.zhang@intel.com>
>> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
>> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
>> API
>>
>> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
>>> When multiple physical ports are connected to a single DPDK port,
>>> (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
>>> know which physical port is used for Rx and Tx.
>>>
>>
>> I assume "kernel bonding" is out of context, but this patch concerns DPDK
>> bonding, failsafe or softnic. (I will refer to them as virtual bonding
>> devices.)
>>
>> Using specific queues of the virtual bonding device may interfere with the
>> logic of these devices, like bonding modes or RSS of the underlying devices. I
>> can see the feature focuses on a very specific use case, but I am not sure all
>> possible side effects have been taken into consideration.
>>
>>
>> And although the feature is only relevant to virtual bonding devices, core
>> ethdev structures are updated for this. Most use cases won't need these, so is
>> there a way to reduce the scope of the changes to virtual bonding devices?
>>
>>
>> There are a few very core ethdev APIs, like:
>> rte_eth_dev_configure()
>> rte_eth_tx_queue_setup()
>> rte_eth_rx_queue_setup()
>> rte_eth_dev_start()
>> rte_eth_dev_info_get()
>>
>> Almost every user of ethdev uses these APIs; since they are so fundamental, I
>> am for being a little more conservative with them.
>>
>> Every eccentric feature targets these APIs first because they are
>> common and extending them gives an easy solution, but in the long run this makes
>> these APIs more complex, harder to maintain and harder for PMDs to support
>> correctly. So I am for not updating them unless it is a generic use case.
>>
>>
>> Also, as we talked about PMDs supporting them, I assume your coming PMD
>> patch will implement the 'tx_phy_affinity' config option only for mlx drivers.
>> What will happen for other NICs? Will they silently ignore the config option
>> from the user? If so, this is a problem for DPDK application portability.
>>
>>
>>
>> As far as I understand, the target is for the application to control which
>> sub-device is used under the virtual bonding device. Can you please give more
>> information on why this is required? Perhaps it can help to provide a
>> better/different solution, like adding the ability to use both the bonding
>> device and the sub-device for the data path, so the application can use
>> whichever it wants. (This is just the first solution I came up with; I am not
>> suggesting it as a replacement, but if you can describe the problem more, I am
>> sure other people can come up with better solutions.)
>>
>> And isn't this against the application being transparent to whether the
>> underlying device is a bonding device or an actual device?
>>
>>
> 
> OK, I will send a new version with separate functions in the ethdev layer
> to map a Tx queue to a port and to get the number of ports.
> These functions work through device ops callbacks; for other NICs they will
> report "not supported" because the ops callback is NULL.
> 

OK, thanks Jonny, at least this separates the feature into its own APIs,
which reduces the impact for applications and drivers that are not using
this feature.


>>> This patch maps a DPDK Tx queue with a physical port, by adding
>>> tx_phy_affinity setting in Tx queue.
>>> The affinity number is the physical port ID where packets will be
>>> sent.
>>> Value 0 means no affinity and traffic could be routed to any connected
>>> physical ports, this is the default current behavior.
>>>
>>> The number of physical ports is reported with rte_eth_dev_info_get().
>>>
>>> The new tx_phy_affinity field is added into the padding hole of the
>>> rte_eth_txconf structure, so the size of rte_eth_txconf stays the same.
>>> An ABI check rule needs to be added to avoid a false warning.
>>>
>>> Add the testpmd command line:
>>> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
>>>
>>> For example, suppose two physical ports are connected to a single DPDK
>>> port (port id 0), where phy_affinity 1 stands for the first physical port
>>> and phy_affinity 2 stands for the second physical port.
>>> Use the commands below to configure the Tx phy affinity per Tx queue:
>>>         port config 0 txq 0 phy_affinity 1
>>>         port config 0 txq 1 phy_affinity 1
>>>         port config 0 txq 2 phy_affinity 2
>>>         port config 0 txq 3 phy_affinity 2
>>>
>>> These commands configure Tx queue 0 and Tx queue 1 with phy affinity 1,
>>> so packets sent on Tx queue 0 or Tx queue 1 leave from the first
>>> physical port; likewise, packets sent on Tx queue 2 or Tx queue 3
>>> leave from the second physical port.
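
The value rules described in this commit message (0 means no affinity, 1..N
selects a physical port) and the range check done by the testpmd handler in
the diff can be sketched in plain C as follows. This is a standalone model:
the sk_ helpers and the example table are hypothetical and only mirror the
four "port config" commands quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the testpmd checks: a device reporting zero physical ports does
 * not support the setting, and the value must not exceed the port count.
 * Value 0 stays valid and means "no affinity".
 */
static bool
sk_phy_affinity_is_valid(uint8_t nb_phy_ports, uint8_t affinity)
{
	if (nb_phy_ports == 0)
		return false;
	return affinity <= nb_phy_ports;
}

/* Per-queue setting matching the four example commands (txq 0..3). */
static const uint8_t sk_example_affinity[4] = { 1, 1, 2, 2 };

/* Physical port (1-based) a queue sends on, or 0 when any port may be used. */
static uint8_t
sk_port_for_queue(const uint8_t *affinity, uint16_t nb_queues, uint16_t queue_id)
{
	assert(affinity != NULL && queue_id < nb_queues);
	return affinity[queue_id];
}
```

With two physical ports, sk_phy_affinity_is_valid(2, 3) is false, which is
the case the handler reports as exceeding the number of physical ports.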
>>>
>>> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
>>> ---
>>>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>>>  app/test-pmd/config.c                       |   1 +
>>>  devtools/libabigail.abignore                |   5 +
>>>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>>>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>>>  lib/ethdev/rte_ethdev.h                     |  10 ++
>>>  6 files changed, 133 insertions(+)
>>>
>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
>>> cb8c174020..f771fcf8ac 100644
>>> --- a/app/test-pmd/cmdline.c
>>> +++ b/app/test-pmd/cmdline.c
>>> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
>>> *parsed_result,
>>>
>>>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>>>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
>>> +
>>> +			"port config (port_id) txq (queue_id) phy_affinity
>> (value)\n"
>>> +			"    Set the physical affinity value "
>>> +			"on a specific Tx queue\n\n"
>>>  		);
>>>  	}
>>>
>>> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
>> cmd_show_port_flow_transfer_proxy = {
>>>  	}
>>>  };
>>>
>>> +/* *** configure port txq phy_affinity value *** */ struct
>>> +cmd_config_tx_phy_affinity {
>>> +	cmdline_fixed_string_t port;
>>> +	cmdline_fixed_string_t config;
>>> +	portid_t portid;
>>> +	cmdline_fixed_string_t txq;
>>> +	uint16_t qid;
>>> +	cmdline_fixed_string_t phy_affinity;
>>> +	uint8_t value;
>>> +};
>>> +
>>> +static void
>>> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
>>> +				  __rte_unused struct cmdline *cl,
>>> +				  __rte_unused void *data)
>>> +{
>>> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	struct rte_port *port;
>>> +	int ret;
>>> +
>>> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
>>> +		return;
>>> +
>>> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
>>> +		printf("Invalid port id\n");
>>> +		return;
>>> +	}
>>> +
>>> +	port = &ports[res->portid];
>>> +
>>> +	if (strcmp(res->txq, "txq")) {
>>> +		printf("Unknown parameter\n");
>>> +		return;
>>> +	}
>>> +	if (tx_queue_id_is_invalid(res->qid))
>>> +		return;
>>> +
>>> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
>>> +	if (ret != 0)
>>> +		return;
>>> +
>>> +	if (dev_info.nb_phy_ports == 0) {
>>> +		printf("Number of physical ports is 0 which is invalid for PHY
>> Affinity\n");
>>> +		return;
>>> +	}
>>> +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
>>> +	if (dev_info.nb_phy_ports < res->value) {
>>> +		printf("The PHY affinity value %u is Invalid, exceeds the "
>>> +		       "number of physical ports\n", res->value);
>>> +		return;
>>> +	}
>>> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
>>> +
>>> +	cmd_reconfig_device_queue(res->portid, 0, 1); }
>>> +
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 port, "port");
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 config, "config");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 portid, RTE_UINT16);
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 txq, "txq");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +			      qid, RTE_UINT16);
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 phy_affinity, "phy_affinity");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +			      value, RTE_UINT8);
>>> +
>>> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
>>> +	.f = cmd_config_tx_phy_affinity_parsed,
>>> +	.data = (void *)0,
>>> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
>>> +	.tokens = {
>>> +		(void *)&cmd_config_tx_phy_affinity_port,
>>> +		(void *)&cmd_config_tx_phy_affinity_config,
>>> +		(void *)&cmd_config_tx_phy_affinity_portid,
>>> +		(void *)&cmd_config_tx_phy_affinity_txq,
>>> +		(void *)&cmd_config_tx_phy_affinity_qid,
>>> +		(void *)&cmd_config_tx_phy_affinity_hwport,
>>> +		(void *)&cmd_config_tx_phy_affinity_value,
>>> +		NULL,
>>> +	},
>>> +};
>>> +
>>>  /*
>>>
>> ****************************************************************
>> ******
>>> ********** */
>>>
>>>  /* list of instructions */
>>> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>>>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>>>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>>>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
>>> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>>>  	NULL,
>>>  };
>>>
>>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
>>> acccb6b035..b83fb17cfa 100644
>>> --- a/app/test-pmd/config.c
>>> +++ b/app/test-pmd/config.c
>>> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>>>  		printf("unknown\n");
>>>  		break;
>>>  	}
>>> +	printf("Current number of physical ports: %u\n",
>>> +dev_info.nb_phy_ports);
>>>  }
>>>
>>>  void
>>> diff --git a/devtools/libabigail.abignore
>>> b/devtools/libabigail.abignore index 7a93de3ba1..ac7d3fb2da 100644
>>> --- a/devtools/libabigail.abignore
>>> +++ b/devtools/libabigail.abignore
>>> @@ -34,3 +34,8 @@
>>>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>>  ; Temporary exceptions till next major ABI version ;
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> +
>>> +; Ignore fields inserted in padding hole of rte_eth_txconf
>>> +[suppress_type]
>>> +        name = rte_eth_txconf
>>> +        has_data_member_inserted_between =
>>> +{offset_of(tx_deferred_start), offset_of(offloads)}
>>> diff --git a/doc/guides/rel_notes/release_23_03.rst
>>> b/doc/guides/rel_notes/release_23_03.rst
>>> index 73f5d94e14..e99bd2dcb6 100644
>>> --- a/doc/guides/rel_notes/release_23_03.rst
>>> +++ b/doc/guides/rel_notes/release_23_03.rst
>>> @@ -55,6 +55,10 @@ New Features
>>>       Also, make sure to start the actual text at the margin.
>>>       =======================================================
>>>
>>> +* **Added affinity for multiple physical ports connected to a single
>>> +DPDK port.**
>>> +
>>> +  * Added Tx affinity in queue setup to map a physical port.
>>> +
>>>  * **Updated AMD axgbe driver.**
>>>
>>>    * Added multi-process support.
>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> index 79a1fa9cb7..5c716f7679 100644
>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only
>> on a specific Tx queue::
>>>
>>>  This command should be run when the port is stopped, or else it will fail.
>>>
>>> +config per queue Tx physical affinity
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Configure a per queue physical affinity value only on a specific Tx queue::
>>> +
>>> +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
>>> +
>>> +* ``phy_affinity``: physical port to use for sending,
>>> +                    when multiple physical ports are connected to
>>> +                    a single DPDK port.
>>> +
>>> +This command should be run when the port is stopped, otherwise it fails.
>>> +
>>>  Config VXLAN Encap outer layers
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> c129ca1eaf..2fd971b7b5 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>>>  				      less free descriptors than this value. */
>>>
>>>  	uint8_t tx_deferred_start; /**< Do not start queue with
>>> rte_eth_dev_start(). */
>>> +	/**
>>> +	 * Affinity with one of the multiple physical ports connected to the
>> DPDK port.
>>> +	 * Value 0 means no affinity and traffic could be routed to any
>> connected
>>> +	 * physical port.
>>> +	 * The first physical port is number 1 and so on.
>>> +	 * Number of physical ports is reported by nb_phy_ports in
>> rte_eth_dev_info.
>>> +	 */
>>> +	uint8_t tx_phy_affinity;
>>>  	/**
>>>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_*
>> flags.
>>>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa @@
>>> -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>>>  	/** Device redirection table size, the total number of entries. */
>>>  	uint16_t reta_size;
>>>  	uint8_t hash_key_size; /**< Hash key size in bytes */
>>> +	/** Number of physical ports connected with DPDK port. */
>>> +	uint8_t nb_phy_ports;
>>>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>>>  	uint64_t flow_type_rss_offloads;
>>>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration
>>> */
> 


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-09 19:44  0%       ` Ferruh Yigit
  2023-02-10 14:06  0%         ` Jiawei(Jonny) Wang
@ 2023-02-14  9:38  0%         ` Jiawei(Jonny) Wang
  2023-02-14 10:01  0%           ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-14  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

Hi,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 10, 2023 3:45 AM
> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> > When multiple physical ports are connected to a single DPDK port,
> > (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
> > know which physical port is used for Rx and Tx.
> >
> 
> I assume "kernel bonding" is out of context, but this patch concerns DPDK
> bonding, failsafe or softnic. (I will refer to them as virtual bonding
> devices.)
> 
> Using specific queues of the virtual bonding device may interfere with the
> logic of these devices, like bonding modes or RSS of the underlying devices. I
> can see the feature focuses on a very specific use case, but I am not sure all
> possible side effects have been taken into consideration.
> 
> 
> And although the feature is only relevant to virtual bonding devices, core
> ethdev structures are updated for this. Most use cases won't need these, so is
> there a way to reduce the scope of the changes to virtual bonding devices?
> 
> 
> There are a few very core ethdev APIs, like:
> rte_eth_dev_configure()
> rte_eth_tx_queue_setup()
> rte_eth_rx_queue_setup()
> rte_eth_dev_start()
> rte_eth_dev_info_get()
> 
> Almost every user of ethdev uses these APIs; since they are so fundamental, I
> am for being a little more conservative with them.
> 
> Every eccentric feature targets these APIs first because they are
> common and extending them gives an easy solution, but in the long run this makes
> these APIs more complex, harder to maintain and harder for PMDs to support
> correctly. So I am for not updating them unless it is a generic use case.
> 
> 
> Also, as we talked about PMDs supporting them, I assume your coming PMD
> patch will implement the 'tx_phy_affinity' config option only for mlx drivers.
> What will happen for other NICs? Will they silently ignore the config option
> from the user? If so, this is a problem for DPDK application portability.
> 
> 
> 
> As far as I understand, the target is for the application to control which
> sub-device is used under the virtual bonding device. Can you please give more
> information on why this is required? Perhaps it can help to provide a
> better/different solution, like adding the ability to use both the bonding
> device and the sub-device for the data path, so the application can use
> whichever it wants. (This is just the first solution I came up with; I am not
> suggesting it as a replacement, but if you can describe the problem more, I am
> sure other people can come up with better solutions.)
> 
> And isn't this against the application being transparent to whether the
> underlying device is a bonding device or an actual device?
> 
> 

OK, I will send a new version with separate functions in the ethdev layer
to map a Tx queue to a port and to get the number of ports.
These functions work through device ops callbacks; for other NICs they will
report "not supported" because the ops callback is NULL.

> > This patch maps a DPDK Tx queue with a physical port, by adding
> > tx_phy_affinity setting in Tx queue.
> > The affinity number is the physical port ID where packets will be
> > sent.
> > Value 0 means no affinity and traffic could be routed to any connected
> > physical ports, this is the default current behavior.
> >
> > The number of physical ports is reported with rte_eth_dev_info_get().
> >
> > The new tx_phy_affinity field is added into the padding hole of the
> > rte_eth_txconf structure, so the size of rte_eth_txconf stays the same.
> > An ABI check rule needs to be added to avoid a false warning.
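
The padding-hole argument can be checked with a small standalone example.
The two structs below are simplified, hypothetical stand-ins (not the real
rte_eth_txconf layout): a one-byte member is followed by a 64-bit member, so
the compiler leaves unused bytes between them, and a new one-byte field fits
there without moving anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout. */
struct sk_txconf_old {
	uint16_t tx_free_thresh;
	uint8_t  tx_deferred_start;
	/* compiler padding here; its size depends on the ABI */
	uint64_t offloads;
};

/* Hypothetical "after" layout: the new field lands in the padding. */
struct sk_txconf_new {
	uint16_t tx_free_thresh;
	uint8_t  tx_deferred_start;
	uint8_t  tx_phy_affinity;
	uint64_t offloads;
};

/* Neither the total size nor the offset of later members changes. */
static_assert(sizeof(struct sk_txconf_old) == sizeof(struct sk_txconf_new),
	      "struct size must be unchanged");
static_assert(offsetof(struct sk_txconf_old, offloads) ==
	      offsetof(struct sk_txconf_new, offloads),
	      "offset of offloads must be unchanged");

/* Number of padding bytes after tx_deferred_start in the old layout. */
static size_t
sk_hole_size(void)
{
	return offsetof(struct sk_txconf_old, offloads) -
	       (offsetof(struct sk_txconf_old, tx_deferred_start) +
		sizeof(uint8_t));
}
```

A layout-comparing ABI tool still sees a new data member, which is why the
patch adds a suppress_type rule with has_data_member_inserted_between
covering the range from tx_deferred_start to offloads.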
> >
> > Add the testpmd command line:
> > testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >
> > For example, suppose two physical ports are connected to a single DPDK
> > port (port id 0), where phy_affinity 1 stands for the first physical port
> > and phy_affinity 2 stands for the second physical port.
> > Use the commands below to configure the Tx phy affinity per Tx queue:
> >         port config 0 txq 0 phy_affinity 1
> >         port config 0 txq 1 phy_affinity 1
> >         port config 0 txq 2 phy_affinity 2
> >         port config 0 txq 3 phy_affinity 2
> >
> > These commands configure Tx queue 0 and Tx queue 1 with phy affinity 1,
> > so packets sent on Tx queue 0 or Tx queue 1 leave from the first
> > physical port; likewise, packets sent on Tx queue 2 or Tx queue 3
> > leave from the second physical port.
> >
> > Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> > ---
> >  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
> >  app/test-pmd/config.c                       |   1 +
> >  devtools/libabigail.abignore                |   5 +
> >  doc/guides/rel_notes/release_23_03.rst      |   4 +
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
> >  lib/ethdev/rte_ethdev.h                     |  10 ++
> >  6 files changed, 133 insertions(+)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > cb8c174020..f771fcf8ac 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
> > *parsed_result,
> >
> >  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
> >  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> > +
> > +			"port config (port_id) txq (queue_id) phy_affinity
> (value)\n"
> > +			"    Set the physical affinity value "
> > +			"on a specific Tx queue\n\n"
> >  		);
> >  	}
> >
> > @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
> cmd_show_port_flow_transfer_proxy = {
> >  	}
> >  };
> >
> > +/* *** configure port txq phy_affinity value *** */ struct
> > +cmd_config_tx_phy_affinity {
> > +	cmdline_fixed_string_t port;
> > +	cmdline_fixed_string_t config;
> > +	portid_t portid;
> > +	cmdline_fixed_string_t txq;
> > +	uint16_t qid;
> > +	cmdline_fixed_string_t phy_affinity;
> > +	uint8_t value;
> > +};
> > +
> > +static void
> > +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> > +				  __rte_unused struct cmdline *cl,
> > +				  __rte_unused void *data)
> > +{
> > +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> > +	struct rte_eth_dev_info dev_info;
> > +	struct rte_port *port;
> > +	int ret;
> > +
> > +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> > +		return;
> > +
> > +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> > +		printf("Invalid port id\n");
> > +		return;
> > +	}
> > +
> > +	port = &ports[res->portid];
> > +
> > +	if (strcmp(res->txq, "txq")) {
> > +		printf("Unknown parameter\n");
> > +		return;
> > +	}
> > +	if (tx_queue_id_is_invalid(res->qid))
> > +		return;
> > +
> > +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> > +	if (ret != 0)
> > +		return;
> > +
> > +	if (dev_info.nb_phy_ports == 0) {
> > +		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
> > +		return;
> > +	}
> > +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
> > +	if (dev_info.nb_phy_ports < res->value) {
> > +		printf("The PHY affinity value %u is Invalid, exceeds the "
> > +		       "number of physical ports\n", res->value);
> > +		return;
> > +	}
> > +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> > +
> > +	cmd_reconfig_device_queue(res->portid, 0, 1);
> > +}
> > +
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 port, "port");
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 config, "config");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 portid, RTE_UINT16);
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 txq, "txq");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +			      qid, RTE_UINT16);
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 phy_affinity, "phy_affinity");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +			      value, RTE_UINT8);
> > +
> > +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> > +	.f = cmd_config_tx_phy_affinity_parsed,
> > +	.data = (void *)0,
> > +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
> > +	.tokens = {
> > +		(void *)&cmd_config_tx_phy_affinity_port,
> > +		(void *)&cmd_config_tx_phy_affinity_config,
> > +		(void *)&cmd_config_tx_phy_affinity_portid,
> > +		(void *)&cmd_config_tx_phy_affinity_txq,
> > +		(void *)&cmd_config_tx_phy_affinity_qid,
> > +		(void *)&cmd_config_tx_phy_affinity_hwport,
> > +		(void *)&cmd_config_tx_phy_affinity_value,
> > +		NULL,
> > +	},
> > +};
> > +
> >  /* ******************************************************************************** */
> >
> >  /* list of instructions */
> > @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
> >  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
> >  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
> >  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> > +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
> >  	NULL,
> >  };
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index acccb6b035..b83fb17cfa 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
> >  		printf("unknown\n");
> >  		break;
> >  	}
> > +	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
> >  }
> >
> >  void
> > diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> > index 7a93de3ba1..ac7d3fb2da 100644
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -34,3 +34,8 @@
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >  ; Temporary exceptions till next major ABI version ;
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > +
> > +; Ignore fields inserted in padding hole of rte_eth_txconf
> > +[suppress_type]
> > +        name = rte_eth_txconf
> > +        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
> > diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> > index 73f5d94e14..e99bd2dcb6 100644
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > @@ -55,6 +55,10 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =======================================================
> >
> > +* **Added affinity for multiple physical ports connected to a single DPDK port.**
> > +
> > +  * Added Tx affinity in queue setup to map a physical port.
> > +
> >  * **Updated AMD axgbe driver.**
> >
> >    * Added multi-process support.
> > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > index 79a1fa9cb7..5c716f7679 100644
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
> >
> >  This command should be run when the port is stopped, or else it will fail.
> >
> > +config per queue Tx physical affinity
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Configure a per queue physical affinity value only on a specific Tx queue::
> > +
> > +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> > +
> > +* ``phy_affinity``: physical port to use for sending,
> > +                    when multiple physical ports are connected to
> > +                    a single DPDK port.
> > +
> > +This command should be run when the port is stopped, otherwise it fails.
> > +
> >  Config VXLAN Encap outer layers
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index c129ca1eaf..2fd971b7b5 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
> >  				      less free descriptors than this value. */
> >
> >  	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > +	/**
> > +	 * Affinity with one of the multiple physical ports connected to the DPDK port.
> > +	 * Value 0 means no affinity and traffic could be routed to any connected
> > +	 * physical port.
> > +	 * The first physical port is number 1 and so on.
> > +	 * Number of physical ports is reported by nb_phy_ports in rte_eth_dev_info.
> > +	 */
> > +	uint8_t tx_phy_affinity;
> >  	/**
> >  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
> >  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> > @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
> >  	/** Device redirection table size, the total number of entries. */
> >  	uint16_t reta_size;
> >  	uint8_t hash_key_size; /**< Hash key size in bytes */
> > +	/** Number of physical ports connected with DPDK port. */
> > +	uint8_t nb_phy_ports;
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> >  	uint64_t flow_type_rss_offloads;
> >  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration */
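
The libabigail suppression quoted above relies on tx_phy_affinity landing
in the padding hole between tx_deferred_start and offloads, so that no
existing member moves. A minimal sketch of that reasoning follows. It
assumes a simplified stand-in layout: struct txconf_before, struct
txconf_after and layout_preserved() are hypothetical names for
illustration, not the real rte_eth_txconf definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the layout before the patch (hypothetical). */
struct txconf_before {
	uint8_t thresh[3];
	uint16_t tx_rs_thresh;
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	/* compiler padding up to the alignment of the next member */
	uint64_t offloads;
};

/* Same stand-in with one byte added right after tx_deferred_start. */
struct txconf_after {
	uint8_t thresh[3];
	uint16_t tx_rs_thresh;
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	uint8_t tx_phy_affinity; /* occupies one byte of the former padding */
	uint64_t offloads;
};

/* The build fails if the new member moved offloads or grew the struct. */
static_assert(offsetof(struct txconf_before, offloads) ==
	      offsetof(struct txconf_after, offloads),
	      "offloads must keep its offset");
static_assert(sizeof(struct txconf_before) == sizeof(struct txconf_after),
	      "struct size must not change");

/* Returns 1 when both the offset of offloads and the total size match. */
int
layout_preserved(void)
{
	return offsetof(struct txconf_before, offloads) ==
	       offsetof(struct txconf_after, offloads) &&
	       sizeof(struct txconf_before) == sizeof(struct txconf_after);
}
```

The static assertions only cover this stand-in; whether the real struct
has the same hole depends on its actual members and the target ABI.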


^ permalink raw reply	[relevance 0%]

* [PATCH v5 21/22] hash: move rte_hash_set_alg out header
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace use of static logtypes Stephen Hemminger
@ 2023-02-14  2:19  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14  2:19 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting algorithm for hash is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. But also, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1
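
The commit message above describes moving the setter and its init hook
out of the header so that the initialization lives in one source file.
A rough single-file sketch of that shape follows. Everything named
demo_* is hypothetical, the capability probe is a stub that always
reports no hardware support, and __attribute__((constructor)) (a
GCC/Clang extension) is used here only as a stand-in for RTE_INIT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical algorithm flags, not the real CRC32_* values. */
#define DEMO_CRC32_SW 0x01
#define DEMO_CRC32_HW 0x02

static_assert((DEMO_CRC32_SW & DEMO_CRC32_HW) == 0,
	      "algorithm flags must not overlap");

/* One file-scope selection, owned by this source file. */
static uint8_t demo_crc32_alg = DEMO_CRC32_SW;
static int demo_init_calls;

/* Stub probe: this sketch always reports no hardware CRC support. */
static int
demo_cpu_has_crc(void)
{
	return 0;
}

/* Non-inline setter: falls back to software unless hardware is usable. */
void
demo_hash_crc_set_alg(uint8_t alg)
{
	demo_crc32_alg = DEMO_CRC32_SW;

	if (alg == DEMO_CRC32_SW)
		return;

	if ((alg & DEMO_CRC32_HW) && demo_cpu_has_crc())
		demo_crc32_alg = DEMO_CRC32_HW;
}

uint8_t
demo_hash_crc_get_alg(void)
{
	return demo_crc32_alg;
}

/* Runs once before main() because it is defined in exactly one file. */
__attribute__((constructor))
static void
demo_hash_crc_init_alg(void)
{
	demo_init_calls++;
	demo_hash_crc_set_alg(DEMO_CRC32_HW);
}

int
demo_hash_crc_init_count(void)
{
	return demo_init_calls;
}
```

With the constructor in a single translation unit, the init count stays
at one no matter how many files include the corresponding header.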


^ permalink raw reply	[relevance 3%]

* [PATCH v5 00/22] Replace use of static logtypes
    2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
@ 2023-02-14  2:18  3% ` Stephen Hemminger
  2023-02-14  2:19  3%   ` [PATCH v5 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-14  2:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPE's in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v5 - fix use of LOGTYPE PORT and POWER in examples

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 375 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  2023-02-13 15:28  0%                           ` Ben Magistro
@ 2023-02-13 23:18  0%                           ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-13 23:18 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Mon, Feb 13, 2023 at 05:04:49AM +0000, Honnappa Nagarahalli wrote:
> Hi Tyler,
> 	Few more comments inline. Let us continue to make progress, I will add this topic for Techboard discussion for 22nd Feb.
> 
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Friday, February 10, 2023 2:30 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> > dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> > jerinj@marvell.com; konstantin.ananyev@huawei.com;
> > ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> > Subject: Re: [PATCH] eal: introduce atomics abstraction
> > 
> > On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > > <snip>
> > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For environments where stdatomics are not supported,
> > > > > > > > > > > we could
> > > > > > > > have a
> > > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have
> > > > > > > > > > to support
> > > > > > > > only
> > > > > > > > > > _explicit APIs). This allows the code to use stdatomics
> > > > > > > > > > APIs and
> > > > > > > > when we move
> > > > > > > > > > to minimum supported standard C11, we just need to get
> > > > > > > > > > rid of the
> > > > > > > > file in DPDK
> > > > > > > > > > repo.
> > > > > > > > > >
> > > > > > > > > > my concern with this is that if we provide a stdatomic.h
> > > > > > > > > > or
> > > > > > > > introduce names
> > > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > > >
> > > > > > > > > > references:
> > > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > > >  * GNU libc manual
> > > > > > > > > >
> > > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reser
> > > > > > > > > > ved-
> > > > > > > > > > Names.html
> > > > > > > > > >
> > > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > > namespaces
> > > > > > > > introduced
> > > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > > reasons in
> > > > > > > > the GNU libc
> > > > > > > > > Wouldn't this apply only after the particular APIs were
> > introduced?
> > > > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > > > >
> > > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > > this as permissive guidance against.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > manual that explain the justification for these
> > > > > > > > > > > reservations and
> > > > > > > > > if we think
> > > > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > > > >
> > > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > > the POSIX
> > > > > > > > standard
> > > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > > been
> > > > > > > > problematic for
> > > > > > > > > > portability.
> > > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > > with a
> > > > > > > > compiler that does not support stdatomics.
> > > > > > > >
> > > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > > at the same time.
> > > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > > linux distribution.
> > > > > > > >
> > > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > > application has to interoperate with a different kit or
> > > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > > with their own version of a shim to hide our faked up stdatomics.
> > > > > > > >
> > > > > > >
> > > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > > compatible with a
> > > > > > separate application build environment, they both have to
> > > > > > implement atomics the same way, i.e. agree on the ABI for atomics.
> > > > > > >
> > > > > > > Summing up, this leaves us with only two realistic options:
> > > > > > >
> > > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > > build
> > > > > > environment to support C11 stdatomics.
> > > > > > > 2. Provide our own DPDK atomics library.
> > > > > > >
> > > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > > realistic!)
> > > > > > >
> > > > > > > I strongly want atomics to be available for use across inline
> > > > > > > and compiled
> > > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > > and inline functions to perform atomic transactions on the same
> > atomic variable.
> > > > > >
> > > > > > i consider it a mandatory requirement. i don't see practically
> > > > > > how we could withdraw existing use and even if we had clean way
> > > > > > i don't see why we would want to. so this item is defintely
> > > > > > settled if you were
> > > > concerned.
> > > > > I think I agree here.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > > C11 (including
> > > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > > >
> > > > > > i think the issue of requiring a toolchain conformant to a
> > > > > > specific standard is a separate matter because any adoption of
> > > > > > C11 standard atomics is a potential abi break from the current use of
> > intrinsics.
> > > > > I am not sure why you are calling it as ABI break. Referring to
> > > > > [1], I just see
> > > > wrappers around intrinsics (though [2] does not use the intrinsics).
> > > > >
> > > > > [1]
> > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatom
> > > > > ic.h
> > > > > [2]
> > > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdat
> > > > > omic
> > > > > .h
> > > >
> > > > it's a potential abi break because atomic types are not the same
> > > > types as their corresponding integer types etc.. (or at least are
> > > > not guaranteed to be by all implementations of c as an abstract language).
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     6.2.5 (27)
> > > >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> > > >     qualifier designates an atomic type. The size, representation, and
> > alignment
> > > >     of an atomic type need not be the same as those of the corresponding
> > > >     unqualified type.
> > > >
> > > >     7.17.6 (3)
> > > >     NOTE The representation of atomic integer types need not have the
> > same size
> > > >     as their corresponding regular types. They should have the same
> > > > size whenever
> > > >     possible, as it eases effort required to port existing code.
> > > >
> > > > i use the term `potential abi break' with intent because for me to
> > > > assert in absolute terms i would have to evaluate the implementation
> > > > of every current and potential future compilers atomic vs non-atomic
> > > > types. this as i'm sure you understand is not practical, it would
> > > > also defeat the purpose of moving to a standard. therefore i rely on
> > > > the specification prescribed by the standard not the detail of a specific
> > implementation.
> > > Can we say that the platforms 'supported' by DPDK today do not have this
> > problem? Any future platforms that will come to DPDK have to evaluate this.
> > 
> > sadly i don't think we can. i believe in an earlier post i linked a bug filed on
> > gcc that shows that clang / gcc were producing different layout than the
> > equivalent non-atomic type.
> I looked at that bug again, it is to do with structure.

just to be clear, you're saying you aren't concerned because we don't
have in our public api struct objects to which we apply atomic
operations?

if that guarantee is absolute and stays true in our public api then i am
satisfied and we can drop the issue.

hypothetically if we make this assumption are you proposing that all
platform/toolchain combinations that support std=c11 and optional
stdatomic should adopt them as default on?

there are other implications to doing this, let's dig into the details
at the next technical board meeting.
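
The layout question discussed above (C11 6.2.5 (27): an atomic type need
not match the size or alignment of its unqualified type) can be probed
per toolchain. A minimal sketch, assuming a C11 compiler that ships
<stdatomic.h>; the DEMO_SAME_LAYOUT macro, struct demo_triple and the
demo_* functions are hypothetical helpers, not part of DPDK:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 3-byte struct; odd sizes are where toolchains may differ. */
struct demo_triple {
	uint8_t bytes[3];
};

/* 1 when the atomic and plain forms share both size and alignment. */
#define DEMO_SAME_LAYOUT(type) \
	(sizeof(_Atomic(type)) == sizeof(type) && \
	 alignof(_Atomic(type)) == alignof(type))

/*
 * A project relying on identical layout could turn the check into a
 * build failure for the types it actually exposes.
 */
static_assert(DEMO_SAME_LAYOUT(uint32_t),
	      "atomic uint32_t differs from plain uint32_t on this toolchain");

int
demo_same_layout_u32(void)
{
	return DEMO_SAME_LAYOUT(uint32_t);
}

int
demo_same_layout_u64(void)
{
	return DEMO_SAME_LAYOUT(uint64_t);
}

/* Result is toolchain-dependent; the standard allows either answer. */
int
demo_same_layout_triple(void)
{
	return DEMO_SAME_LAYOUT(struct demo_triple);
}
```

The struct case is deliberately left without a static assertion, since
the standard permits either outcome and compilers may disagree.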

> 
> > 
> > >
> > > >
> > > >
> > > > > > the abstraction (whatever namespace it resides) allows the
> > > > > > existing toolchain/platform combinations to maintain
> > > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > > How about using the intrinsics (__atomic_xxx) name space for
> > abstraction?
> > > > This covers the GCC and Clang compilers.
> > 
> > i haven't investigated fully but there are usages of these intrinsics that
> > indicate there may be undesirable difference between clang and gcc versions.
> > the hint is there seems to be conditionally compiled code under __clang__
> > when using some __atomic's.
> I sent an RFC to address this [1]. I think the size specific intrinsics are not necessary.
> 
> [1] http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/

yep, looks good to me. i acked the change.

thank you.

> 
> > 
> > for the purpose of this discussion clang just tries to look like gcc so i don't
> > regard them as being different compilers for the purpose of this discussion.
> > 
> > > >
> > > > the namespace starting with `__` is also reserved for the implementation.
> > > > this is why compilers gcc/clang/msvc name their intrinsic and
> > > > builtin functions starting with __ to explicitly avoid collision
> > > > with the application namespace.
> > 
> > > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > > not just '__')
> > 
> > i don't understand the confusion __atomic is within the __ namespace that is
> > reserved.
> What I mean is, we are not formulating a policy/rule to allow for any name space that starts with '__'.

understood, but we appear to be trying to formulate a policy allowing a name
within that space which is reserved by the standard for and claimed by gcc.

anyway, let's discuss further at the meeting.

> 
> > 
> > let me ask this another way, what benefit do you see to trying to overlap with
> > the standard namespace? the only benefit i can see is that at some point in
> > the future it avoids having to perform a mechanical change to eventually
> > retire the abstraction once all platform/toolchains support standard atomics.
> > i.e. basically s/rte_atomic/atomic/g
> > 
> > is there another benefit i'm missing?
> The abstraction you have proposed solves the problem for the long term. The proposed abstraction stops us from thinking about moving to stdatomics.

i think this is where you've got me a bit confused. i'd like to
understand how it stops us thinking about moving to stdatomics.

> IMO, the problem is short term. Using the __atomic_ name space does not have any practical issues with the platforms DPDK supports (unless msvc has a problem with this, more questions below).

oh, sorry for not answering this previously. msvc (and as it happens
clang) both use __c11_atomic_xxx as a namespace. i'm only aware of gcc
and potentially compilers that try to look like gcc using __atomic_xxxx.

so if you're asking if that selection would interfere with msvc, it
wouldn't. i'm only concerned with mingling in a namespace that gcc has
claimed.
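
To make the namespace point concrete, here is a rough sketch of an
abstraction kept in a project-owned namespace that maps either to C11
stdatomic or to the GCC/Clang __atomic builtins. The demo_ prefix, the
DEMO_USE_STDATOMIC switch and the counter functions are all hypothetical
and are not the names proposed in the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

#if defined(DEMO_USE_STDATOMIC)
/* Standard C11 atomics: the wrapper type is a real atomic type. */
#include <stdatomic.h>
#define demo_atomic(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_atomic_fetch_add_explicit(ptr, val, order) \
	atomic_fetch_add_explicit(ptr, val, order)
#define demo_atomic_load_explicit(ptr, order) \
	atomic_load_explicit(ptr, order)
#else
/* GCC/Clang builtins operating on the plain integer type. */
#define demo_atomic(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_atomic_fetch_add_explicit(ptr, val, order) \
	__atomic_fetch_add(ptr, val, order)
#define demo_atomic_load_explicit(ptr, order) \
	__atomic_load_n(ptr, order)
#endif

static_assert(sizeof(demo_atomic(uint32_t)) >= sizeof(uint32_t),
	      "wrapper type must be able to hold a uint32_t");

static demo_atomic(uint32_t) demo_counter;

/* Atomically increments the counter and returns the new value. */
uint32_t
demo_counter_bump(void)
{
	return demo_atomic_fetch_add_explicit(&demo_counter, 1,
					      demo_memory_order_relaxed) + 1;
}

/* Atomically reads the current counter value. */
uint32_t
demo_counter_read(void)
{
	return demo_atomic_load_explicit(&demo_counter,
					 demo_memory_order_relaxed);
}
```

Callers only ever see demo_* names, so switching the backend is a build
option and retiring the wrapper later is a mechanical rename.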

> 
> > 
> > >
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     7.1.3 (1)
> > > >     All identifiers that begin with an underscore and either an uppercase
> > > >     letter or another underscore are always reserved for any use.
> > > >
> > > >     ...
> > > >
> > > > > If there is another platform that uses the same name space for
> > > > > something
> > > > else, I think DPDK should not be supporting that platform.
> > > >
> > > > that's effectively a statement excluding windows platform and all
> > > > non-gcc compilers from ever supporting dpdk.
> > > Apologies, I did not understand your comment on windows platform. Do
> > you mean to say a compiler for windows platform uses '__atomic_xxx' name
> > space to provide some other functionality (and hence it would get excluded)?
> > 
> > i mean dpdk can never fully be supported without msvc except for statically
> > linked builds which are niche and limit it too severely for many consumers to
> > practically use dpdk. there are also many application developers who would
> > like to integrate dpdk but can't and telling them their only choice is to re-port
> > their entire application to clang isn't feasible.
> > 
> > i can see no technical reason why we should be excluding a major compiler in
> > broad use if it is capable of building dpdk. msvc arguably has some of the
> > most sophisticated security features in the industry and the use of those
> > features is mandated by many of the customers who might deploy dpdk
> > applications on windows.
> I did not mean DPDK should not support msvc (may be my sentence below was misunderstood).
> Does msvc provide '__atomic_xxx' intrinsics?

msvc provides stdatomic (behind stdatomic there are intrinsics)

> 
> > 
> > > Clang supports these intrinsics. I am not sure about the merit of supporting
> > other non-gcc compilers. May be a topic Techboard discussion.
> > >
> > > >
> > > > > What problems do you see?
> > > >
> > > > i'm fairly certain at least one other compiler uses the __atomic
> > > > namespace but
> > > Do you mean __atomic namespace is used for some other purpose?
> > >
> > > > it would take me time to check, the most notable potential issue
> > > > that comes to mind is if such an intrinsic with the same name is
> > > > provided in a different implementation and has either regressive
> > > > code generation or different semantics it would be bad because it is
> > > > intrinsic you can't just hack around it with #undef __atomic to shim in a
> > semantically correct version.
> > > I do not think we should worry about regressive code generation problem. It
> > should be fixed by that compiler.
> > > Different semantics is something we need to worry about. It would be good
> > to find out more about a compiler that does this.
> > 
> > again, this is about portability it's about potential not that we can find an
> > example.
> > 
> > >
> > > >
> > > > how about this, is there another possible namespace you might
> > > > suggest that conforms or doesn't conflict with the rules defined
> > > > in ISO/IEC 9899:2011
> > > > 7.1.3 i think if there were that would satisfy all of my concerns
> > > > related to namespaces.
> > > >
> > > > keep in mind the point of moving to a standard is to achieve
> > > > portability so if we do things that will regress us back to being
> > > > dependent on an implementation we haven't succeeded. that's all i'm
> > trying to guarantee here.
> > > Agree. We are trying to solve a problem that is temporary. I am trying to
> > keep the problem scope narrow which might help us push to adopt the
> > standard sooner.
> > 
> > i do wish we could just target the standard but unless we are willing to draw a
> > line and say no more non std=c11 and also we potentially break the abi we
> > are talking years. i don't think it is reasonable to block progress for years, so
> > i'm offering a transitional path. it's an evolution over time that we have to
> > manage.
> Apologies if I am sounding like I am blocking progress. Rest assured, we will find a way. It is just about which solution we are going to pick.

no problems, i really appreciate any help.

> Also, is there any information on how long before we move to C11?

we need to clear all long term compatibility promises for gcc/linux
platforms that don't support -std=c11 and implement stdatomic option. the
last discussion was that it was years i believe.
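
on the abi side of that, the thing that has to hold before plain integers
can be swapped for atomic types is that size and alignment stay the same.
a minimal sketch of such a probe, assuming a toolchain that already has
C11 atomics. the helper names are made up for this mail, and the result
for the struct case is deliberately not asserted because the standard
leaves it to the implementation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* true when the atomic type has the same size and alignment as the plain one */
#define SAME_LAYOUT(type) \
	(sizeof(_Atomic(type)) == sizeof(type) && \
	 alignof(_Atomic(type)) == alignof(type))

/* a 3-byte struct: implementations are free to pad its atomic version */
struct odd3 {
	uint8_t b[3];
};

/* small scalars; 64-bit types are left out as 32-bit targets may differ */
static bool
scalar_layout_unchanged(void)
{
	return SAME_LAYOUT(uint16_t) && SAME_LAYOUT(uint32_t);
}

/* reports whether this implementation keeps the struct layout as well */
static bool
struct_layout_unchanged(void)
{
	return SAME_LAYOUT(struct odd3);
}

/* the abstraction may only alias plain integers if this holds */
static void
check_abi_assumption(void)
{
	assert(scalar_layout_unchanged());
}
```

a probe like this only tells you about the toolchain it was compiled
with, which is why i keep saying `potential' abi break rather than
asserting it either way.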

Bruce has a patch series and another thread going talking about moving
to -std=c99 which is good but of course doesn't get us to std=c11.

> 
> > 
> > >
> > > >
> > > > i feel like we are really close on this discussion, if we can just
> > > > iron this issue out we can probably get going on the actual changes.
> > > >
> > > > thanks for the consideration.
> > > >
> > > > >
> > > > > >
> > > > > > once in place it provides an opportunity to introduce new
> > > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > > subject to community discussion on how/if/when.
> > > > > >
> > > > > > it would be good to get more participants into the discussion so
> > > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > > >
> > > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > > > due to disadvantages of trying to overlap with the standard
> > > > > > namespace while not providing a compatible api/abi and because
> > > > > > it provides clear disambiguation of that difference in semantics
> > > > > > and compatibility with
> > > > the standard api.
> > > > > >
> > > > > > so far i've noted the following
> > > > > >
> > > > > > * we will not provide the non-explicit apis.
> > > > > +1
> > > > >
> > > > > > > * we will make no attempt to support operating on struct/union atomics
> > > > > >   with our apis.
> > > > > +1
> > > > >
> > > > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > > > >   - reference the standard api documentation.
> > > > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > > > >
> > > > > > my vote is to remove 'potentially' from the last point above for
> > > > > > reasons previously discussed in postings to the mail thread.
> > > > > >
> > > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > > non-explicit apis for viewing.
> > > > > >
> > > > > > ty

^ permalink raw reply	[relevance 0%]

* [PATCH v4 18/19] hash: move rte_hash_set_alg out header
  2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
@ 2023-02-13 19:55  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-13 19:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting the hash algorithm is not at all performance
sensitive, and doing it inline has a couple of problems. First, if
multiple files include the header, the initialization is done multiple
times. Second, it makes it harder to fix the usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v4 00/19] Replace use of static logtypes
  @ 2023-02-13 19:55  3% ` Stephen Hemminger
  2023-02-13 19:55  3%   ` [PATCH v4 18/19] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-13 19:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.
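
For readers less familiar with the distinction: a static logtype is a
fixed constant in a shared header, while a dynamic one is registered by
name at startup and receives its id at run time. The following is a toy
model of that idea only, not the DPDK implementation; every toy_ name and
the table size are assumptions made up for illustration. As in DPDK, a
lower level number means a more severe message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_LOG_MAX_TYPES 16
#define TOY_LOG_ERR 4
#define TOY_LOG_INFO 7
#define TOY_LOG_DEBUG 8

struct toy_logtype {
	const char *name;
	uint32_t level;
};

static struct toy_logtype toy_types[TOY_LOG_MAX_TYPES];
static int toy_ntypes;

/* register a type by name; returns its run-time id, or -1 when full */
static int
toy_log_register(const char *name, uint32_t level)
{
	int i;

	/* registering the same name twice hands back the same id */
	for (i = 0; i < toy_ntypes; i++)
		if (strcmp(toy_types[i].name, name) == 0)
			return i;

	if (toy_ntypes == TOY_LOG_MAX_TYPES)
		return -1;

	toy_types[toy_ntypes].name = name;
	toy_types[toy_ntypes].level = level;
	return toy_ntypes++;
}

/* a message is emitted when its level is at or below the type's level */
static int
toy_log_enabled(int type, uint32_t level)
{
	assert(type >= 0 && type < toy_ntypes);
	return level <= toy_types[type].level;
}
```

Because the id is only known after registration, each library keeps it in
a private variable instead of relying on a constant exported from a
common header, which is the pattern the patches below move towards.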

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes

Stephen Hemminger (19):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 70 files changed, 363 insertions(+), 158 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
@ 2023-02-13 15:28  0%                           ` Ben Magistro
  2023-02-13 23:18  0%                           ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Ben Magistro @ 2023-02-13 15:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Tyler Retzlaff, Morten Brørup, thomas, dev,
	bruce.richardson, david.marchand, jerinj, konstantin.ananyev,
	ferruh.yigit, nd, techboard

[-- Attachment #1: Type: text/plain, Size: 17850 bytes --]

There is a thread discussing a change to the standard [1] but I have not
seen anything explicit yet about moving to C11.  I am personally in favor
of making the jump to C11 now as part of the 23.x branch and provided my
thoughts in the linked thread (what other projects using DPDK have as
minimum compiler requirements, CentOS 7 EOL dates).

Is the long term plan to backport this change set to the existing LTS
release or is this meant to be something introduced for use in 23.x and
going forward?  I think I was (probably naively) assuming this would be a
new feature in the 23.x going forward only.

[1] http://mails.dpdk.org/archives/dev/2023-February/262188.html

On Mon, Feb 13, 2023 at 12:05 AM Honnappa Nagarahalli <
Honnappa.Nagarahalli@arm.com> wrote:

> Hi Tyler,
>         Few more comments inline. Let us continue to make progress, I will
> add this topic for Techboard discussion for 22nd Feb.
>
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Friday, February 10, 2023 2:30 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> > dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> > jerinj@marvell.com; konstantin.ananyev@huawei.com;
> > ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> > Subject: Re: [PATCH] eal: introduce atomics abstraction
> >
> > On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > > <snip>
> > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For environments where stdatomics are not supported,
> > > > > > > > > > > we could
> > > > > > > > have a
> > > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have
> > > > > > > > > > to support
> > > > > > > > only
> > > > > > > > > > _explicit APIs). This allows the code to use stdatomics
> > > > > > > > > > APIs and
> > > > > > > > when we move
> > > > > > > > > > to minimum supported standard C11, we just need to get
> > > > > > > > > > rid of the
> > > > > > > > file in DPDK
> > > > > > > > > > repo.
> > > > > > > > > >
> > > > > > > > > > my concern with this is that if we provide a stdatomic.h
> > > > > > > > > > or
> > > > > > > > introduce names
> > > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > > >
> > > > > > > > > > references:
> > > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > > >  * GNU libc manual
> > > > > > > > > >
> > > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reser
> > > > > > > > > > ved-
> > > > > > > > > > Names.html
> > > > > > > > > >
> > > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > > namespaces
> > > > > > > > introduced
> > > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > > reasons in
> > > > > > > > the GNU libc
> > > > > > > > > Wouldn't this apply only after the particular APIs were
> > introduced?
> > > > > > > > i.e. it should not apply if the compiler does not support
> stdatomics.
> > > > > > > >
> > > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > > this as permissive guidance against.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > manual that explain the justification for these
> > > > > > > > > > reservations and if
> > > > > > > > > we think
> > > > > > > > > > about ODR and ABI compatibility we can conceive of
> others.
> > > > > > > > > >
> > > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > > the POSIX
> > > > > > > > standard
> > > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > > been
> > > > > > > > problematic for
> > > > > > > > > > portability.
> > > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > > with a
> > > > > > > > compiler that does not support stdatomics.
> > > > > > > >
> > > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > > at the same time.
> > > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > > linux distribution.
> > > > > > > >
> > > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > > application has to interoperate with a different kit or
> > > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > > with their own version of a shim to hide our faked up
> stdatomics.
> > > > > > > >
> > > > > > >
> > > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > > compatible with a
> > > > > > separate application build environment, they both have to
> > > > > > implement atomics the same way, i.e. agree on the ABI for
> atomics.
> > > > > > >
> > > > > > > Summing up, this leaves us with only two realistic options:
> > > > > > >
> > > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > > build
> > > > > > environment to support C11 stdatomics.
> > > > > > > 2. Provide our own DPDK atomics library.
> > > > > > >
> > > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > > realistic!)
> > > > > > >
> > > > > > > I strongly want atomics to be available for use across inline
> > > > > > > and compiled
> > > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > > and inline functions to perform atomic transactions on the same
> > atomic variable.
> > > > > >
> > > > > > i consider it a mandatory requirement. i don't see practically
> > > > > > how we could withdraw existing use and even if we had clean way
> > > > > > i don't see why we would want to. so this item is definitely
> > > > > > settled if you were
> > > > concerned.
> > > > > I think I agree here.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > > C11 (including
> > > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > > >
> > > > > > i think the issue of requiring a toolchain conformant to a
> > > > > > specific standard is a separate matter because any adoption of
> > > > > > C11 standard atomics is a potential abi break from the current
> use of
> > intrinsics.
> > > > > I am not sure why you are calling it as ABI break. Referring to
> > > > > [1], I just see
> > > > wrappers around intrinsics (though [2] does not use the intrinsics).
> > > > >
> > > > > [1]
> > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatom
> > > > > ic.h
> > > > > [2]
> > > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdat
> > > > > omic
> > > > > .h
> > > >
> > > > it's a potential abi break because atomic types are not the same
> > > > types as their corresponding integer types etc.. (or at least are
> > > > not guaranteed to be by all implementations of c as an abstract
> language).
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     6.2.5 (27)
> > > >     Further, there is the _Atomic qualifier. The presence of the
> _Atomic
> > > >     qualifier designates an atomic type. The size, representation,
> and
> > alignment
> > > >     of an atomic type need not be the same as those of the
> corresponding
> > > >     unqualified type.
> > > >
> > > >     7.17.6 (3)
> > > >     NOTE The representation of atomic integer types need not have the
> > same size
> > > >     as their corresponding regular types. They should have the same
> > > > size whenever
> > > >     possible, as it eases effort required to port existing code.
> > > >
> > > > i use the term `potential abi break' with intent because for me to
> > > > assert in absolute terms i would have to evaluate the implementation
> > > > of every current and potential future compilers atomic vs non-atomic
> > > > types. this as i'm sure you understand is not practical, it would
> > > > also defeat the purpose of moving to a standard. therefore i rely on
> > > > the specification prescribed by the standard not the detail of a
> specific
> > implementation.
> > > Can we say that the platforms 'supported' by DPDK today do not have
> this
> > problem? Any future platforms that will come to DPDK have to evaluate
> this.
> >
> > sadly i don't think we can. i believe in an earlier post i linked a bug
> filed on
> > gcc that shows that clang / gcc were producing different layout than the
> > equivalent non-atomic type.
> I looked at that bug again, it is to do with structure.
>
> >
> > >
> > > >
> > > >
> > > > > > the abstraction (whatever namespace it resides) allows the
> > > > > > existing toolchain/platform combinations to maintain
> > > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > > How about using the intrinsics (__atomic_xxx) name space for
> > abstraction?
> > > > This covers the GCC and Clang compilers.
> >
> > i haven't investigated fully but there are usages of these intrinsics
> that
> > indicate there may be undesirable difference between clang and gcc
> versions.
> > the hint is there seems to be conditionally compiled code under __clang__
> > when using some __atomic's.
> I sent an RFC to address this [1]. I think the size specific intrinsics
> are not necessary.
>
> [1]
> http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/
>
> >
> > for the purpose of this discussion clang just tries to look like gcc so
> i don't
> > regard them as being different compilers for the purpose of this
> discussion.
> >
> > > >
> > > > the namespace starting with `__` is also reserved for the
> implementation.
> > > > this is why compilers gcc/clang/msvc place name their intrinsic and
> > > > builtin functions starting with __ to explicitly avoid collision
> > > > with the application namespace.
> >
> > > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > > not just '__')
> >
> > i don't understand the confusion __atomic is within the __ namespace
> that is
> > reserved.
> What I mean is, we are not formulating a policy/rule to allow for any name
> space that starts with '__'.
>
> >
> > let me ask this another way, what benefit do you see to trying to
> overlap with
> > the standard namespace? the only benefit i can see is that at some point
> in
> > the future it avoids having to perform a mechanical change to eventually
> > retire the abstraction once all platform/toolchains support standard
> atomics.
> > i.e. basically s/rte_atomic/atomic/g
> >
> > is there another benefit i'm missing?
> The abstraction you have proposed solves the problem for the long term.
> The proposed abstraction stops us from thinking about moving to stdatomics.
> IMO, the problem is short term. Using the __atomic_ name space does not
> have any practical issues with the platforms DPDK supports (unless msvc has
> a problem with this, more questions below).
>
> >
> > >
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     7.1.3 (1)
> > > >     All identifiers that begin with an underscore and either an
> uppercase
> > > >     letter or another underscore are always reserved for any use.
> > > >
> > > >     ...
> > > >
> > > > > If there is another platform that uses the same name space for
> > > > > something
> > > > else, I think DPDK should not be supporting that platform.
> > > >
> > > > that's effectively a statement excluding windows platform and all
> > > > non-gcc compilers from ever supporting dpdk.
> > > Apologies, I did not understand your comment on windows platform. Do
> > you mean to say a compiler for windows platform uses '__atomic_xxx' name
> > space to provide some other functionality (and hence it would get
> excluded)?
> >
> > i mean dpdk can never fully be supported without msvc except for
> statically
> > linked builds which are niche and limit it too severely for many
> consumers to
> > practically use dpdk. there are also many application developers who
> would
> > like to integrate dpdk but can't and telling them their only choice is
> to re-port
> > their entire application to clang isn't feasible.
> >
> > i can see no technical reason why we should be excluding a major
> compiler in
> > broad use if it is capable of building dpdk. msvc arguably has some of
> the
> > most sophisticated security features in the industry and the use of those
> > features is mandated by many of the customers who might deploy dpdk
> > applications on windows.
> I did not mean DPDK should not support msvc (may be my sentence below was
> misunderstood).
> Does msvc provide '__atomic_xxx' intrinsics?
>
> >
> > > Clang supports these intrinsics. I am not sure about the merit of
> supporting
> > other non-gcc compilers. May be a topic Techboard discussion.
> > >
> > > >
> > > > > What problems do you see?
> > > >
> > > > i'm fairly certain at least one other compiler uses the __atomic
> > > > namespace but
> > > Do you mean __atomic namespace is used for some other purpose?
> > >
> > > > it would take me time to check, the most notable potential issue
> > > > that comes to mind is if such an intrinsic with the same name is
> > > > provided in a different implementation and has either regressive
> > > > code generation or different semantics it would be bad because it is
> > > > intrinsic you can't just hack around it with #undef __atomic to shim
> in a
> > semantically correct version.
> > > I do not think we should worry about regressive code generation
> problem. It
> > should be fixed by that compiler.
> > > Different semantics is something we need to worry about. It would be
> good
> > to find out more about a compiler that does this.
> >
> > again, this is about portability it's about potential not that we can
> find an
> > example.
> >
> > >
> > > >
> > > > how about this, is there another possible namespace you might
> > > > suggest that conforms or doesn't conflict with the rules defined
> > > > in ISO/IEC 9899:2011
> > > > 7.1.3 i think if there were that would satisfy all of my concerns
> > > > related to namespaces.
> > > >
> > > > keep in mind the point of moving to a standard is to achieve
> > > > portability so if we do things that will regress us back to being
> > > > dependent on an implementation we haven't succeeded. that's all i'm
> > trying to guarantee here.
> > > Agree. We are trying to solve a problem that is temporary. I am trying
> to
> > keep the problem scope narrow which might help us push to adopt the
> > standard sooner.
> >
> > i do wish we could just target the standard but unless we are willing to
> draw a
> > line and say no more non std=c11 and also we potentially break the abi we
> > are talking years. i don't think it is reasonable to block progress for
> years, so
> > i'm offering a transitional path. it's an evolution over time that we
> have to
> > manage.
> Apologies if I am sounding like I am blocking progress. Rest assured, we
> will find a way. It is just about which solution we are going to pick.
> Also, is there any information on how long before we move to C11?
>
> >
> > >
> > > >
> > > > i feel like we are really close on this discussion, if we can just
> > > > iron this issue out we can probably get going on the actual changes.
> > > >
> > > > thanks for the consideration.
> > > >
> > > > >
> > > > > >
> > > > > > once in place it provides an opportunity to introduce new
> > > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > > subject to community discussion on how/if/when.
> > > > > >
> > > > > > it would be good to get more participants into the discussion so
> > > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > > >
> > > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > > > due to disadvantages of trying to overlap with the standard
> > > > > > namespace while not providing a compatible api/abi and because
> > > > > > it provides clear disambiguation of that difference in semantics
> > > > > > and compatibility with
> > > > the standard api.
> > > > > >
> > > > > > so far i've noted the following
> > > > > >
> > > > > > * we will not provide the non-explicit apis.
> > > > > +1
> > > > >
> > > > > > * we will make no attempt to support operate on struct/union
> atomics
> > > > > >   with our apis.
> > > > > +1
> > > > >
> > > > > > * we will mirror the standard api potentially in the rte_
> namespace to
> > > > > >   - reference the standard api documentation.
> > > > > >   - assume compatible semantics (sans exceptions from first 2
> points).
> > > > > >
> > > > > > my vote is to remove 'potentially' from the last point above for
> > > > > > reasons previously discussed in postings to the mail thread.
> > > > > >
> > > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > > non-explicit apis for viewing.
> > > > > >
> > > > > > ty
>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-10 20:30  3%                       ` Tyler Retzlaff
@ 2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  2023-02-13 15:28  0%                           ` Ben Magistro
  2023-02-13 23:18  0%                           ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2023-02-13  5:04 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard, nd

Hi Tyler,
	A few more comments inline. Let us continue to make progress; I will add this topic to the Techboard discussion for 22nd Feb.

> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Friday, February 10, 2023 2:30 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> jerinj@marvell.com; konstantin.ananyev@huawei.com;
> ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> Subject: Re: [PATCH] eal: introduce atomics abstraction
> 
> On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > <snip>
> > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > For environments where stdatomics are not supported,
> > > > > > > > > > we could
> > > > > > > have a
> > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have
> > > > > > > > > to support
> > > > > > > only
> > > > > > > > > _explicit APIs). This allows the code to use stdatomics
> > > > > > > > > APIs and
> > > > > > > when we move
> > > > > > > > > to minimum supported standard C11, we just need to get
> > > > > > > > > rid of the
> > > > > > > file in DPDK
> > > > > > > > > repo.
> > > > > > > > >
> > > > > > > > > my concern with this is that if we provide a stdatomic.h
> > > > > > > > > or
> > > > > > > introduce names
> > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > >
> > > > > > > > > references:
> > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > >  * GNU libc manual
> > > > > > > > >
> > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reser
> > > > > > > > > ved-
> > > > > > > > > Names.html
> > > > > > > > >
> > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > namespaces
> > > > > > > introduced
> > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > reasons in
> > > > > > > the GNU libc
> > > > > > > > Wouldn't this apply only after the particular APIs were
> introduced?
> > > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > > >
> > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > this as permissive guidance against.
> > > > > > >
> > > > > > > >
> > > > > > > > > manual that explain the justification for these
> > > > > > > > > reservations and if
> > > > > > > if we think
> > > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > > >
> > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > the POSIX
> > > > > > > standard
> > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > been
> > > > > > > problematic for
> > > > > > > > > portability.
> > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > with a
> > > > > > > compiler that does not support stdatomics.
> > > > > > >
> > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > at the same time.
> > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > linux distribution.
> > > > > > >
> > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > application has to interoperate with a different kit or
> > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > with their own version of a shim to hide our faked up stdatomics.
> > > > > > >
> > > > > >
> > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > compatible with a
> > > > > separate application build environment, they both have to
> > > > > implement atomics the same way, i.e. agree on the ABI for atomics.
> > > > > >
> > > > > > Summing up, this leaves us with only two realistic options:
> > > > > >
> > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > build
> > > > > environment to support C11 stdatomics.
> > > > > > 2. Provide our own DPDK atomics library.
> > > > > >
> > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > realistic!)
> > > > > >
> > > > > > I strongly want atomics to be available for use across inline
> > > > > > and compiled
> > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > and inline functions to perform atomic transactions on the same
> atomic variable.
> > > > >
> > > > > i consider it a mandatory requirement. i don't see practically
> > > > > how we could withdraw existing use and even if we had clean way
> > > > > > i don't see why we would want to. so this item is definitely
> > > > > settled if you were
> > > concerned.
> > > > I think I agree here.
> > > >
> > > > >
> > > > > >
> > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > C11 (including
> > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > >
> > > > > i think the issue of requiring a toolchain conformant to a
> > > > > specific standard is a separate matter because any adoption of
> > > > > C11 standard atomics is a potential abi break from the current use of
> intrinsics.
> > > > I am not sure why you are calling it as ABI break. Referring to
> > > > [1], I just see
> > > wrappers around intrinsics (though [2] does not use the intrinsics).
> > > >
> > > > [1]
> > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatom
> > > > ic.h
> > > > [2]
> > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdat
> > > > omic
> > > > .h
> > >
> > > it's a potential abi break because atomic types are not the same
> > > types as their corresponding integer types etc.. (or at least are
> > > not guaranteed to be by all implementations of c as an abstract language).
> > >
> > >     ISO/IEC 9899:2011
> > >
> > >     6.2.5 (27)
> > >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> > >     qualifier designates an atomic type. The size, representation, and
> alignment
> > >     of an atomic type need not be the same as those of the corresponding
> > >     unqualified type.
> > >
> > >     7.17.6 (3)
> > >     NOTE The representation of atomic integer types need not have the
> same size
> > >     as their corresponding regular types. They should have the same
> > > size whenever
> > >     possible, as it eases effort required to port existing code.
> > >
> > > i use the term `potential abi break' with intent because for me to
> > > assert in absolute terms i would have to evaluate the implementation
> > > of every current and potential future compilers atomic vs non-atomic
> > > types. this as i'm sure you understand is not practical, it would
> > > also defeat the purpose of moving to a standard. therefore i rely on
> > > the specification prescribed by the standard not the detail of a specific
> implementation.
> > Can we say that the platforms 'supported' by DPDK today do not have this
> problem? Any future platforms that will come to DPDK have to evaluate this.
> 
> sadly i don't think we can. i believe in an earlier post i linked a bug filed on
> gcc that shows that clang / gcc were producing different layout than the
> equivalent non-atomic type.
I looked at that bug again; it has to do with structures.

> 
> >
> > >
> > >
> > > > > the abstraction (whatever namespace it resides) allows the
> > > > > existing toolchain/platform combinations to maintain
> > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > How about using the intrinsics (__atomic_xxx) name space for
> abstraction?
> > > This covers the GCC and Clang compilers.
> 
> i haven't investigated fully but there are usages of these intrinsics that
> indicate there may be undesirable difference between clang and gcc versions.
> the hint is there seems to be conditionally compiled code under __clang__
> when using some __atomic's.
I sent an RFC to address this [1]. I think the size specific intrinsics are not necessary.

[1] http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/

> 
> for the purpose of this discussion clang just tries to look like gcc so i don't
> regard them as being different compilers for the purpose of this discussion.
> 
> > >
> > > the namespace starting with `__` is also reserved for the implementation.
> > > this is why compilers gcc/clang/msvc name their intrinsic and
> > > builtin functions starting with __ to explicitly avoid collision
> > > with the application namespace.
> 
> > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > not just '__')
> 
> i don't understand the confusion __atomic is within the __ namespace that is
> reserved.
What I mean is, we are not formulating a policy/rule to allow for any name space that starts with '__'.

> 
> let me ask this another way, what benefit do you see to trying to overlap with
> the standard namespace? the only benefit i can see is that at some point in
> the future it avoids having to perform a mechanical change to eventually
> retire the abstraction once all platform/toolchains support standard atomics.
> i.e. basically s/rte_atomic/atomic/g
> 
> is there another benefit i'm missing?
The abstraction you have proposed solves the problem for the long term. The proposed abstraction stops us from thinking about moving to stdatomics.
IMO, the problem is short term. Using the __atomic_ name space does not have any practical issues with the platforms DPDK supports (unless msvc has a problem with this, more questions below).
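
For readers following the namespace question, the kind of abstraction being debated can be sketched roughly as below. This is only an illustration under stated assumptions: the `ex_` prefix, the `EX_USE_STDATOMIC` switch and the `counter`/`bump` names are hypothetical and stand in for whatever names are eventually agreed; they are not taken from any posted patch. Only the _explicit forms are wrapped, in line with the points noted in the thread, and the fallback branch assumes a gcc or clang style compiler that provides the `__atomic_*` builtins.

```c
#include <stdint.h>

/* EX_USE_STDATOMIC is a hypothetical opt-in switch for C11 atomics. */
#if defined(EX_USE_STDATOMIC)
#include <stdatomic.h>
#define ex_atomic(type) _Atomic(type)
#define ex_memory_order_relaxed memory_order_relaxed
#define ex_memory_order_acquire memory_order_acquire
#define ex_memory_order_release memory_order_release
#define ex_atomic_load_explicit(ptr, mo) atomic_load_explicit(ptr, mo)
#define ex_atomic_store_explicit(ptr, val, mo) \
	atomic_store_explicit(ptr, val, mo)
#define ex_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#else
/* Default: keep the plain type and the gcc/clang builtins. */
#define ex_atomic(type) type
#define ex_memory_order_relaxed __ATOMIC_RELAXED
#define ex_memory_order_acquire __ATOMIC_ACQUIRE
#define ex_memory_order_release __ATOMIC_RELEASE
#define ex_atomic_load_explicit(ptr, mo) __atomic_load_n(ptr, mo)
#define ex_atomic_store_explicit(ptr, val, mo) \
	__atomic_store_n(ptr, val, mo)
#define ex_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#endif

/* Hypothetical shared counter; zero-initialised as a static object. */
static ex_atomic(uint32_t) counter;

/* Atomically increments the counter and returns the previous value. */
static uint32_t bump(void)
{
	return ex_atomic_fetch_add_explicit(&counter, 1,
					    ex_memory_order_relaxed);
}
```

In this sketch, callers only ever spell the `ex_` names, so switching a toolchain/platform combination to standard atomics would be a build-time decision rather than a source change.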

> 
> >
> > >
> > >     ISO/IEC 9899:2011
> > >
> > >     7.1.3 (1)
> > >     All identifiers that begin with an underscore and either an uppercase
> > >     letter or another underscore are always reserved for any use.
> > >
> > >     ...
> > >
> > > > If there is another platform that uses the same name space for
> > > > something
> > > else, I think DPDK should not be supporting that platform.
> > >
> > > that's effectively a statement excluding windows platform and all
> > > non-gcc compilers from ever supporting dpdk.
> > Apologies, I did not understand your comment on windows platform. Do
> you mean to say a compiler for windows platform uses '__atomic_xxx' name
> space to provide some other functionality (and hence it would get excluded)?
> 
> i mean dpdk can never fully be supported without msvc except for statically
> linked builds which are niche and limit it too severely for many consumers to
> practically use dpdk. there are also many application developers who would
> like to integrate dpdk but can't and telling them their only choice is to re-port
> their entire application to clang isn't feasible.
> 
> i can see no technical reason why we should be excluding a major compiler in
> broad use if it is capable of building dpdk. msvc arguably has some of the
> most sophisticated security features in the industry and the use of those
> features is mandated by many of the customers who might deploy dpdk
> applications on windows.
I did not mean DPDK should not support msvc (maybe my sentence below was misunderstood).
Does msvc provide '__atomic_xxx' intrinsics?

> 
> > Clang supports these intrinsics. I am not sure about the merit of supporting
> other non-gcc compilers. Maybe a topic for Techboard discussion.
> >
> > >
> > > > What problems do you see?
> > >
> > > i'm fairly certain at least one other compiler uses the __atomic
> > > namespace but
> > Do you mean __atomic namespace is used for some other purpose?
> >
> > > it would take me time to check, the most notable potential issue
> > > that comes to mind is if such an intrinsic with the same name is
> > > provided in a different implementation and has either regressive
> > > code generation or different semantics it would be bad because it is
> > > intrinsic you can't just hack around it with #undef __atomic to shim in a
> semantically correct version.
> > I do not think we should worry about regressive code generation problem. It
> should be fixed by that compiler.
> > Different semantics is something we need to worry about. It would be good
> to find out more about a compiler that does this.
> 
> again, this is about portability it's about potential not that we can find an
> example.
> 
> >
> > >
> > > how about this, is there another possible namespace you might
> > > suggest that conforms or doesn't conflict with the rules defined
> > > in ISO/IEC 9899:2011
> > > 7.1.3 i think if there were that would satisfy all of my concerns
> > > related to namespaces.
> > >
> > > keep in mind the point of moving to a standard is to achieve
> > > portability so if we do things that will regress us back to being
> > > dependent on an implementation we haven't succeeded. that's all i'm
> trying to guarantee here.
> > Agree. We are trying to solve a problem that is temporary. I am trying to
> keep the problem scope narrow which might help us push to adopt the
> standard sooner.
> 
> i do wish we could just target the standard but unless we are willing to draw a
> line and say no more non std=c11 and also we potentially break the abi we
> are talking years. i don't think it is reasonable to block progress for years, so
> i'm offering a transitional path. it's an evolution over time that we have to
> manage.
Apologies if I am sounding like I am blocking progress. Rest assured, we will find a way. It is just about which solution we are going to pick.
Also, is there any information on how long before we move to C11?

> 
> >
> > >
> > > i feel like we are really close on this discussion, if we can just
> > > iron this issue out we can probably get going on the actual changes.
> > >
> > > thanks for the consideration.
> > >
> > > >
> > > > >
> > > > > once in place it provides an opportunity to introduce new
> > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > subject to community discussion on how/if/when.
> > > > >
> > > > > it would be good to get more participants into the discussion so
> > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > >
> > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > > due to disadvantages of trying to overlap with the standard
> > > > > namespace while not providing a compatible api/abi and because
> > > > > it provides clear disambiguation of that difference in semantics
> > > > > and compatibility with
> > > the standard api.
> > > > >
> > > > > so far i've noted the following
> > > > >
> > > > > * we will not provide the non-explicit apis.
> > > > +1
> > > >
> > > > > * we will make no attempt to support operate on struct/union atomics
> > > > >   with our apis.
> > > > +1
> > > >
> > > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > > >   - reference the standard api documentation.
> > > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > > >
> > > > > my vote is to remove 'potentially' from the last point above for
> > > > > reasons previously discussed in postings to the mail thread.
> > > > >
> > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > non-explicit apis for viewing.
> > > > >
> > > > > ty

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/3] ethdev: skip congestion management configuration
  2023-02-11  0:35  3%   ` Ferruh Yigit
@ 2023-02-11  5:16  0%     ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2023-02-11  5:16 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Rakesh Kudurumalla, Ori Kam, Thomas Monjalon, Andrew Rybchenko,
	jerinj, ndabilpuram, dev, David Marchand

On Sat, Feb 11, 2023 at 6:05 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 2/10/2023 8:26 AM, Rakesh Kudurumalla wrote:
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> > index b60987db4b..f4eb4232d4 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -2203,6 +2203,17 @@ enum rte_flow_action_type {
> >        */
> >       RTE_FLOW_ACTION_TYPE_DROP,
> >
> > +     /**
> > +      * Skip congestion management configuration
> > +      *
> > +      * Using rte_eth_cman_config_set() API the application
> > +      * can configure ethdev Rx queue's congestion mechanism.
> > +      * Introducing RTE_FLOW_ACTION_TYPE_SKIP_CMAN flow action to skip the
> > +      * congestion configuration applied to the given ethdev Rx queue.
> > +      *
> > +      */
> > +     RTE_FLOW_ACTION_TYPE_SKIP_CMAN,
> > +
>
> Inserting a new enum item into the middle of the enum upsets the ABI
> checks [1]; can it go to the end?

Yes.

>
>
>
>
> [1]
> 1 function with some indirect sub-type change:
>
>   [C] 'function size_t rte_flow_copy(rte_flow_desc*, size_t, const
> rte_flow_attr*, const rte_flow_item*, const rte_flow_action*)' at
> rte_flow.c:1092:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_flow_desc*' has sub-type changes:
>       in pointed to type 'struct rte_flow_desc' at rte_flow.h:4326:1:
>         type size hasn't changed
>         1 data member changes (1 filtered):
>           type of 'rte_flow_action* actions' changed:
>             in pointed to type 'struct rte_flow_action' at
> rte_flow.h:3775:1:
>               type size hasn't changed
>               1 data member change:
>                 type of 'rte_flow_action_type type' changed:
>                   type size hasn't changed
>                   1 enumerator insertion:
>
> 'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_SKIP_CMAN' value '8'
>                   50 enumerator changes:
>                     'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_COUNT'
> from value '8' to '9' at rte_flow.h:2216:1
>                     ...

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/3] ethdev: skip congestion management configuration
  @ 2023-02-11  0:35  3%   ` Ferruh Yigit
  2023-02-11  5:16  0%     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-11  0:35 UTC (permalink / raw)
  To: Rakesh Kudurumalla, Ori Kam, Thomas Monjalon, Andrew Rybchenko
  Cc: jerinj, ndabilpuram, dev, David Marchand

On 2/10/2023 8:26 AM, Rakesh Kudurumalla wrote:
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..f4eb4232d4 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -2203,6 +2203,17 @@ enum rte_flow_action_type {
>  	 */
>  	RTE_FLOW_ACTION_TYPE_DROP,
>  
> +	/**
> +	 * Skip congestion management configuration
> +	 *
> +	 * Using rte_eth_cman_config_set() API the application
> +	 * can configure ethdev Rx queue's congestion mechanism.
> +	 * Introducing RTE_FLOW_ACTION_TYPE_SKIP_CMAN flow action to skip the
> +	 * congestion configuration applied to the given ethdev Rx queue.
> +	 *
> +	 */
> +	RTE_FLOW_ACTION_TYPE_SKIP_CMAN,
> +

Inserting a new enum item into the middle of the enum upsets the ABI
checks [1]; can it go to the end?




[1]
1 function with some indirect sub-type change:

  [C] 'function size_t rte_flow_copy(rte_flow_desc*, size_t, const
rte_flow_attr*, const rte_flow_item*, const rte_flow_action*)' at
rte_flow.c:1092:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_flow_desc*' has sub-type changes:
      in pointed to type 'struct rte_flow_desc' at rte_flow.h:4326:1:
        type size hasn't changed
        1 data member changes (1 filtered):
          type of 'rte_flow_action* actions' changed:
            in pointed to type 'struct rte_flow_action' at
rte_flow.h:3775:1:
              type size hasn't changed
              1 data member change:
                type of 'rte_flow_action_type type' changed:
                  type size hasn't changed
                  1 enumerator insertion:

'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_SKIP_CMAN' value '8'
                  50 enumerator changes:
                    'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_COUNT'
from value '8' to '9' at rte_flow.h:2216:1
                    ...
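
The renumbering reported above can be reproduced with a small sketch. The three enums below are hypothetical and much shorter than rte_flow_action_type; they only illustrate why an enumerator inserted in the middle shifts every later value, while one appended at the end leaves existing values untouched.

```c
#include <assert.h>

/* Hypothetical enum as shipped in an earlier release. */
enum action_old { OLD_VOID, OLD_DROP, OLD_COUNT, OLD_RSS };

/* New enumerator inserted in the middle: later values shift by one. */
enum action_mid { MID_VOID, MID_DROP, MID_SKIP_CMAN, MID_COUNT, MID_RSS };

/* New enumerator appended at the end: existing values are preserved. */
enum action_end { END_VOID, END_DROP, END_COUNT, END_RSS, END_SKIP_CMAN };

/* Compile-time checks of the claims made in the comments above. */
static_assert((int)MID_COUNT == (int)OLD_COUNT + 1,
	      "middle insertion renumbers COUNT");
static_assert((int)MID_RSS == (int)OLD_RSS + 1,
	      "middle insertion renumbers RSS");
static_assert((int)END_COUNT == (int)OLD_COUNT,
	      "appending keeps COUNT");
static_assert((int)END_RSS == (int)OLD_RSS,
	      "appending keeps RSS");
```

An application built against the old header would keep passing the old numeric value for COUNT, which a library built from the middle-insertion header would interpret as SKIP_CMAN.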

^ permalink raw reply	[relevance 3%]

* Re: [RFC PATCH 0/1] Specify C-standard requirement for DPDK builds
  @ 2023-02-10 23:39  4%           ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-10 23:39 UTC (permalink / raw)
  To: Ben Magistro; +Cc: Bruce Richardson, dev, thomas, david.marchand, mb

On Fri, Feb 10, 2023 at 09:52:06AM -0500, Ben Magistro wrote:
> Adding Tyler
> 
> Sort of following along on the RFC: introduce atomics [1] it seems like the
> decision to use 99 vs 11 here could make an impact on the approach taken in
> that thread.

hey Ben thanks for keeping an eye across threads on the topic. the
atomics thread is fairly long but somewhere in it i did provide a
rationale for why we can't just go straight to using C11 even if we
declared that dpdk only supports compilers >= C11.

i wish we could; it would certainly make my life way easier if i could
just -std=c11 and cut & paste my way to completion. the reason why we
can't (aside from not requiring C11 compiler as a minimum) is that there
is potential issue with abi compatibility for existing applications
using non-atomic types currently passed to ABI suddenly requiring
standard atomic types. this is because _Atomic type and type are not
guaranteed to have the same size, alignment, representation etc..
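
A minimal way to see what a given toolchain does is to compare the two layouts directly. The sketch below assumes a C11 compiler that accepts the _Atomic type specifier; the `SAME_LAYOUT` macro, the helper functions and `struct odd` are hypothetical names used only for illustration. The results are implementation-defined, which is exactly the concern: the standard permits them to differ, so no particular outcome is claimed here.

```c
#include <stdalign.h>
#include <stdint.h>

/* Evaluates to 1 if _Atomic(T) and T agree in size and alignment
 * on this implementation, 0 otherwise. */
#define SAME_LAYOUT(T) \
	(sizeof(_Atomic(T)) == sizeof(T) && \
	 alignof(_Atomic(T)) == alignof(T))

/* Hypothetical 3-byte struct; atomic versions of odd-sized objects
 * are where implementations are most likely to add padding. */
struct odd { char bytes[3]; };

static int same_layout_u32(void) { return SAME_LAYOUT(uint32_t); }
static int same_layout_u64(void) { return SAME_LAYOUT(uint64_t); }
static int same_layout_odd(void) { return SAME_LAYOUT(struct odd); }
```

Any helper returning 0 on a supported toolchain/platform combination would mean that a public structure or function signature changed from T to _Atomic(T) no longer matches what existing applications were compiled against.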

anyway, i welcome us establishing c99 as a minimum for all
toolchain/platform combinations.

> 
> 1) http://mails.dpdk.org/archives/dev/2023-February/262042.html
> 
> On Fri, Feb 3, 2023 at 1:00 PM Bruce Richardson <bruce.richardson@intel.com>
> wrote:
> 
> > On Fri, Feb 03, 2023 at 11:45:04AM -0500, Ben Magistro wrote:
> > >    In our case we have other libraries that we are using that have
> > >    required us to specify a minimum c++ version (14/17 most recently for
> > >    one) so it doesn't feel like a big ask/issue to us (provided things
> > >    don't start conflicting...hah; not anticipating any issue).  Our
> > >    software is also used internally so we have a fair bit of control over
> > >    how fast we can adopt changes.
> > >    This got me wondering what some other projects in the DPDK ecosystem
> > >    are saying/doing around language standards/gcc versions.  So some
> > quick
> > >    checking of the projects I am aware of/looked at/using...
> > >    * trex: cannot find an obvious minimum gcc requirement
> > >    * tldk: we are running our own public fork with several fixes, need to
> > >    find time to solve the build sys change aspect to continue providing
> > >    patches upstream; I know I have hit some places where it was easier to
> > >    say the new minimum DPDK version is x at which point you just adopt
> > the
> > >    minimum requirements of DPDK
> > >    * ovs: looks to be comfortable with an older gcc still
> > >    * seastar: seems to be the most aggressive with adopting language
> > >    standards/compilers I've seen [1] and are asking for gcc 9+ and cpp17+
> > >    * ans: based on release 19.02 (2019), they are on gcc >= 5.4 [2] and
> > is
> > >    the same on the main README file
> > >    I do understand the concern, but if no one is voicing an
> > >    opinion/objection does that mean they agree with/will not be affected
> > >    by the change....
> > >    1) [1]https://docs.seastar.io/master/md_compatibility.html
> > >    2) [2]https://github.com/ansyun/dpdk-ans/releases
> > >    Cheers
> > >
> > Thanks for the info.
> > I also notice that since gcc 5, the default language version used - if none
> > is explicitly specified - is gnu11 (or higher for later versions). Clang
> > seems to do something similar, but not sure at what point it started
> > defaulting to a standard >=c11.
> >
> > /Bruce
> >

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
@ 2023-02-10 20:30  3%                       ` Tyler Retzlaff
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-10 20:30 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > For environments where stdatomics are not supported, we
> > > > > > > > > could
> > > > > > have a
> > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > > > support
> > > > > > only
> > > > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > > > > > > > and
> > > > > > when we move
> > > > > > > > to minimum supported standard C11, we just need to get rid
> > > > > > > > of the
> > > > > > file in DPDK
> > > > > > > > repo.
> > > > > > > >
> > > > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > > > introduce names
> > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > >
> > > > > > > > references:
> > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > >  * GNU libc manual
> > > > > > > >
> > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > > > Names.html
> > > > > > > >
> > > > > > > > in effect the header, the names and in some instances
> > > > > > > > namespaces
> > > > > > introduced
> > > > > > > > are reserved by the implementation. there are several
> > > > > > > > reasons in
> > > > > > the GNU libc
> > > > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > >
> > > > > > yeah, i agree they're being a bit wishy washy in the wording,
> > > > > > but i'm not convinced glibc folks are documenting this as
> > > > > > permissive guidance against.
> > > > > >
> > > > > > >
> > > > > > > > manual that explain the justification for these reservations
> > > > > > > > and if
> > > > > > if we think
> > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > >
> > > > > > > > i'll also remark that the inter-mingling of names from the
> > > > > > > > POSIX
> > > > > > standard
> > > > > > > > implicitly exposed as a part of the EAL public API has been
> > > > > > problematic for
> > > > > > > > portability.
> > > > > > > These should be exposed as EAL APIs only when compiled with a
> > > > > > compiler that does not support stdatomics.
> > > > > >
> > > > > > you don't necessarily compile dpdk, the application or its other
> > > > > > dynamically linked dependencies with the same compiler at the
> > > > > > same time.
> > > > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > > > distribution.
> > > > > >
> > > > > > if dpdk is built without real stdatomic types but the
> > > > > > application has to interoperate with a different kit or library
> > > > > > that does they would be forced to dance around dpdk with their
> > > > > > own version of a shim to hide our faked up stdatomics.
> > > > > >
> > > > >
> > > > > So basically, if we want a binary DPDK distribution to be
> > > > > compatible with a
> > > > separate application build environment, they both have to implement
> > > > atomics the same way, i.e. agree on the ABI for atomics.
> > > > >
> > > > > Summing up, this leaves us with only two realistic options:
> > > > >
> > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > build
> > > > environment to support C11 stdatomics.
> > > > > 2. Provide our own DPDK atomics library.
> > > > >
> > > > > (As mentioned by Tyler, the third option - using C11 stdatomics
> > > > > inside DPDK, and requiring a build environment without C11
> > > > > stdatomics to implement a shim - is not realistic!)
> > > > >
> > > > > I strongly want atomics to be available for use across inline and
> > > > > compiled
> > > > code; i.e. it must be possible for both compiled DPDK functions and
> > > > inline functions to perform atomic transactions on the same atomic variable.
> > > >
> > > > i consider it a mandatory requirement. i don't see practically how
> > > > we could withdraw existing use and even if we had clean way i don't
> > > > see why we would want to. so this item is definitely settled if you were
> > concerned.
> > > I think I agree here.
> > >
> > > >
> > > > >
> > > > > So either we upgrade the DPDK build requirements to support C11
> > > > > (including
> > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > >
> > > > i think the issue of requiring a toolchain conformant to a specific
> > > > standard is a separate matter because any adoption of C11 standard
> > > > atomics is a potential abi break from the current use of intrinsics.
> > > I am not sure why you are calling it as ABI break. Referring to [1], I just see
> > wrappers around intrinsics (though [2] does not use the intrinsics).
> > >
> > > [1]
> > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> > > [2]
> > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic
> > > .h
> > 
> > it's a potential abi break because atomic types are not the same types as their
> > corresponding integer types etc.. (or at least are not guaranteed to be by all
> > implementations of c as an abstract language).
> > 
> >     ISO/IEC 9899:2011
> > 
> >     6.2.5 (27)
> >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> >     qualifier designates an atomic type. The size, representation, and alignment
> >     of an atomic type need not be the same as those of the corresponding
> >     unqualified type.
> > 
> >     7.17.6 (3)
> >     NOTE The representation of atomic integer types need not have the same size
> >     as their corresponding regular types. They should have the same size
> > whenever
> >     possible, as it eases effort required to port existing code.
> > 
> > i use the term `potential abi break' with intent because for me to assert in
> > absolute terms i would have to evaluate the implementation of every current
> > and potential future compilers atomic vs non-atomic types. this as i'm sure you
> > understand is not practical, it would also defeat the purpose of moving to a
> > standard. therefore i rely on the specification prescribed by the standard not
> > the detail of a specific implementation.
> Can we say that the platforms 'supported' by DPDK today do not have this problem? Any future platforms that will come to DPDK have to evaluate this.

sadly i don't think we can. i believe in an earlier post i linked a bug
filed on gcc that shows that clang / gcc were producing different
layout than the equivalent non-atomic type.
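
To make the layout concern concrete, here is a minimal sketch (not taken from the thread) that compares the size and alignment of an atomic-qualified and a plain 64-bit integer on whatever toolchain compiles it. The helper name atomic_u64_layout_matches is hypothetical, and the result is implementation-specific, which is the point of the C11 wording quoted above.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when the atomic and plain forms of
 * uint64_t have the same size and alignment on this toolchain, else 0.
 * C11 6.2.5 (27) allows either answer.
 */
static int
atomic_u64_layout_matches(void)
{
	return sizeof(_Atomic uint64_t) == sizeof(uint64_t) &&
	       alignof(_Atomic uint64_t) == alignof(uint64_t);
}
```

A result of 0 on any supported target would mean that switching a public struct field from uint64_t to _Atomic uint64_t changes the struct layout there.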

> 
> > 
> > 
> > > > the abstraction (whatever namespace it resides) allows the existing
> > > > toolchain/platform combinations to maintain compatibility by
> > > > defaulting to current non-standard intrinsics.
> > > How about using the intrinsics (__atomic_xxx) name space for abstraction?
> > This covers the GCC and Clang compilers.

i haven't investigated fully but there are usages of these intrinsics
that indicate there may be undesirable difference between clang and gcc
versions. the hint is there seems to be conditionally compiled code
under __clang__ when using some __atomic's.

for the purpose of this discussion clang just tries to look like gcc, so
i don't regard them as being different compilers.

> > 
> > the namespace starting with `__` is also reserved for the implementation.
> > this is why compilers gcc/clang/msvc name their intrinsic and builtin
> > functions starting with __ to explicitly avoid collision with the application
> > namespace.

> Agreed. But, here we are considering '__atomic_' specifically (i.e. not just '__')

i don't understand the confusion: __atomic is within the __ namespace
that is reserved.

let me ask this another way, what benefit do you see to trying to
overlap with the standard namespace? the only benefit i can see is that
at some point in the future it avoids having to perform a mechanical
change to eventually retire the abstraction once all platform/toolchains
support standard atomics. i.e. basically s/rte_atomic/atomic/g

is there another benefit i'm missing?

> 
> > 
> >     ISO/IEC 9899:2011
> > 
> >     7.1.3 (1)
> >     All identifiers that begin with an underscore and either an uppercase
> >     letter or another underscore are always reserved for any use.
> > 
> >     ...
> > 
> > > If there is another platform that uses the same name space for something
> > else, I think DPDK should not be supporting that platform.
> > 
> > that's effectively a statement excluding windows platform and all non-gcc
> > compilers from ever supporting dpdk.
> Apologies, I did not understand your comment on windows platform. Do you mean to say a compiler for windows platform uses '__atomic_xxx' name space to provide some other functionality (and hence it would get excluded)? 

i mean dpdk can never fully be supported without msvc except for
statically linked builds which are niche and limit it too severely for
many consumers to practically use dpdk. there are also many application
developers who would like to integrate dpdk but can't and telling them
their only choice is to re-port their entire application to clang isn't
feasible.

i can see no technical reason why we should be excluding a major
compiler in broad use if it is capable of building dpdk. msvc arguably
has some of the most sophisticated security features in the industry
and the use of those features is mandated by many of the customers who
might deploy dpdk applications on windows.

> Clang supports these intrinsics. I am not sure about the merit of supporting other non-gcc compilers. Maybe a topic for Techboard discussion.
> 
> > 
> > > What problems do you see?
> > 
> > i'm fairly certain at least one other compiler uses the __atomic namespace but
> Do you mean __atomic namespace is used for some other purpose?
> 
> > it would take me time to check, the most notable potential issue that comes to
> > mind is if such an intrinsic with the same name is provided in a different
> > implementation and has either regressive code generation or different
> > semantics it would be bad because it is intrinsic you can't just hack around it
> > with #undef __atomic to shim in a semantically correct version.
> I do not think we should worry about regressive code generation problem. It should be fixed by that compiler.
> Different semantics is something we need to worry about. It would be good to find out more about a compiler that does this.

again, this is about portability. it's about the potential for conflict,
not whether we can find an example.

> 
> > 
> > how about this, is there another possible namespace you might suggest that
> > conforms or doesn't conflict with the rules defined in ISO/IEC 9899:2011
> > 7.1.3 i think if there were that would satisfy all of my concerns related to
> > namespaces.
> > 
> > keep in mind the point of moving to a standard is to achieve portability so if we
> > do things that will regress us back to being dependent on an implementation
> > we haven't succeeded. that's all i'm trying to guarantee here.
> Agree. We are trying to solve a problem that is temporary. I am trying to keep the problem scope narrow which might help us push to adopt the standard sooner.

i do wish we could just target the standard but unless we are willing to
draw a line and say no more non std=c11 and also we potentially break
the abi we are talking years. i don't think it is reasonable to block
progress for years, so i'm offering a transitional path. it's an
evolution over time that we have to manage.
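
A minimal sketch of what such a transitional abstraction could look like, assuming GCC or Clang. RTE_STDC_ATOMICS, __rte_atomic and rte_memory_order appear in the v2 patch quoted elsewhere in this thread; the two wrapper macros, the two order constants, and the counter helper below are illustrative assumptions, not the patch itself.

```c
#include <stdint.h>

#ifdef RTE_STDC_ATOMICS
/* Opt-in path: map the rte_ names onto C11 standard atomics. */
#include <stdatomic.h>
#define __rte_atomic _Atomic
typedef int rte_memory_order;
#define rte_memory_order_relaxed memory_order_relaxed
#define rte_memory_order_seq_cst memory_order_seq_cst
#define rte_atomic_fetch_add_explicit(obj, arg, order) \
	atomic_fetch_add_explicit(obj, arg, order)
#define rte_atomic_load_explicit(obj, order) \
	atomic_load_explicit(obj, order)
#else
/* Default path: keep the current GCC/Clang intrinsics, so types stay plain. */
#define __rte_atomic
typedef int rte_memory_order;
#define rte_memory_order_relaxed __ATOMIC_RELAXED
#define rte_memory_order_seq_cst __ATOMIC_SEQ_CST
#define rte_atomic_fetch_add_explicit(obj, arg, order) \
	__atomic_fetch_add(obj, arg, order)
#define rte_atomic_load_explicit(obj, order) \
	__atomic_load_n(obj, order)
#endif

/* Hypothetical user of the abstraction: a shared event counter. */
static uint32_t __rte_atomic counter;

/* Atomically increment the counter and return its previous value. */
static uint32_t
bump_counter(void)
{
	return rte_atomic_fetch_add_explicit(&counter, 1,
					     rte_memory_order_relaxed);
}
```

Callers only ever see rte_ names, so which branch is active becomes a build-time decision rather than a source change.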

> 
> > 
> > i feel like we are really close on this discussion, if we can just iron this issue out
> > we can probably get going on the actual changes.
> > 
> > thanks for the consideration.
> > 
> > >
> > > >
> > > > once in place it provides an opportunity to introduce new
> > > > toolchain/platform combinations and enables an opt-in capability to
> > > > use stdatomics on existing toolchain/platform combinations subject
> > > > to community discussion on how/if/when.
> > > >
> > > > it would be good to get more participants into the discussion so
> > > > i'll cc techboard for some attention. i feel like the only area that
> > > > isn't decided is to do or not do this in rte_ namespace.
> > > >
> > > > i'm strongly in favor of rte_ namespace after discussion, mainly due
> > > > > to the disadvantages of trying to overlap with the standard namespace
> > > > while not providing a compatible api/abi and because it provides
> > > > clear disambiguation of that difference in semantics and compatibility with
> > the standard api.
> > > >
> > > > so far i've noted the following
> > > >
> > > > * we will not provide the non-explicit apis.
> > > +1
> > >
> > > > > * we will make no attempt to support operating on struct/union atomics
> > > >   with our apis.
> > > +1
> > >
> > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > >   - reference the standard api documentation.
> > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > >
> > > > my vote is to remove 'potentially' from the last point above for
> > > > reasons previously discussed in postings to the mail thread.
> > > >
> > > > thanks all for the discussion, i'll send up a patch removing
> > > > non-explicit apis for viewing.
> > > >
> > > > ty

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: improve link speed to string
  @ 2023-02-10 14:41  3%               ` Ferruh Yigit
  2023-03-23 14:40  3%                 ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-10 14:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Min Hu (Connor), Andrew Rybchenko, thomas, dev

On 1/19/2023 4:45 PM, Stephen Hemminger wrote:
> On Thu, 19 Jan 2023 11:41:12 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>>>>> Nothing good will happen if you try to use the function to
>>>>>> print two different link speeds in one log message.  
>>>>> You are right.
>>>>> And use malloc for "name" will result in memory leakage, which is also
>>>>> not a good option.
>>>>>
>>>>> BTW, do you think if we need to modify the function
>>>>> "rte_eth_link_speed_to_str"?  
>>>>
>>>> IMHO it would be more pain than gain in this case.
>>>>
>>>> .
>>>>  
>>> Agree with you. Thanks Andrew
>>>  
>>
>> It can be an option to update the API as follows in the next ABI break release:
>>
>> const char *
>> rte_eth_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size);
>>
>> For this a deprecation notice needs to be sent and approved, not sure
>> though if it is worth it.
>>
>>
>> Meanwhile, what do you think to update string 'Invalid' to something
>> like 'Irregular' or 'Erratic', does this help to convey the right message?
> 
> 
> API versioning is possible here.


Agree, ABI versioning can be used here.

@Connor, what do you think?
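
A minimal sketch of the caller-supplied-buffer shape proposed above, to show why it avoids the "two speeds in one log message" problem. The function name link_speed_to_str_buf and its formatting rules are hypothetical stand-ins, not the DPDK implementation; only the parameter list follows the signature quoted in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical buffer-based variant: formats a link speed given in Mbps
 * into the caller's buffer and returns that buffer, so each call site
 * owns its own storage.
 */
static const char *
link_speed_to_str_buf(uint32_t link_speed, char *buf, size_t buf_size)
{
	if (buf == NULL || buf_size == 0)
		return "";
	if (link_speed >= 1000 && link_speed % 1000 == 0)
		snprintf(buf, buf_size, "%u Gbps",
			 (unsigned int)(link_speed / 1000));
	else
		snprintf(buf, buf_size, "%u Mbps", (unsigned int)link_speed);
	return buf;
}
```

With two separate buffers, both results stay valid at once, so a single log call can print an old and a new speed without one overwriting the other.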

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-09 19:44  0%       ` Ferruh Yigit
@ 2023-02-10 14:06  0%         ` Jiawei(Jonny) Wang
  2023-02-14  9:38  0%         ` Jiawei(Jonny) Wang
  1 sibling, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-10 14:06 UTC (permalink / raw)
  To: Ferruh Yigit, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

Hi,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 10, 2023 3:45 AM
> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> > When multiple physical ports are connected to a single DPDK port,
> > (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
> > know which physical port is used for Rx and Tx.
> >
> 
> I assume "kernel bonding" is out of context, but this patch concerns DPDK
> bonding, failsafe or softnic. (I will refer them as virtual bonding
> device.)
> 

"kernel bonding" can be thought of as Linux bonding.

> To use specific queues of the virtual bonding device may interfere with the
> logic of these devices, like bonding modes or RSS of the underlying devices. I
> can see feature focuses on a very specific use case, but not sure if all possible
> side effects taken into consideration.
> 
> 
> And although the feature is only relevant to virtual bonding devices, core
> ethdev structures are updated for this. Most use cases won't need these, so is
> there a way to reduce the scope of the changes to virtual bonding devices?
> 
> 
> There are a few very core ethdev APIs, like:
> rte_eth_dev_configure()
> rte_eth_tx_queue_setup()
> rte_eth_rx_queue_setup()
> rte_eth_dev_start()
> rte_eth_dev_info_get()
> 
> Almost every user of ethdev uses these APIs, since these are so fundamental I
> am for being a little more conservative on these APIs.
> 
> Every eccentric feature targets these APIs first because they are
> common and extending them gives an easy solution, but in the long run this makes
> these APIs more complex, harder to maintain and harder for PMDs to support
> them correctly. So I am for not updating them unless it is a generic use case.
> 
> 
> Also as we talked about PMDs supporting them, I assume your coming PMD
> patch will be implementing 'tx_phy_affinity' config option only for mlx drivers.
> What will happen for other NICs? Will they silently ignore the config option
> from user? So this is a problem for the DPDK application portability.
> 

Yes, the PMD patch is for net/mlx5 only; 'tx_phy_affinity' lets the HW
map a queue to a physical port.

Other NICs ignore this new configuration for now. Or should we add a check in queue setup?

> 
> 
> As far as I understand target is application controlling which sub-device is used
> under the virtual bonding device, can you please give more information why
> this is required, perhaps it can help to provide a better/different solution.
> Like adding the ability to use both bonding device and sub-device for data path,
> this way application can use whichever it wants. (this is just first solution I
> come with, I am not suggesting as replacement solution, but if you can describe
> the problem more I am sure other people can come with better solutions.)
> 

For example:
There are two physical ports (assume device interfaces eth2 and eth3), and these two
devices are bonded into one interface (assume bond0).
The DPDK application probes/attaches bond0 only (DPDK port id 0). When sending traffic from the DPDK port,
we want to know which physical port (eth2 or eth3) the packet is sent to.

With the new configuration, a queue can be bound to an underlying device,
so the DPDK application can send traffic on the queue that reaches the desired port.

Adding all devices into DPDK would mean creating multiple Rx/Tx queue resources on each of them.


> And isn't this against the application being transparent to whether the
> underlying device is a bonding device or an actual device?
> 
> 
> > This patch maps a DPDK Tx queue with a physical port, by adding
> > tx_phy_affinity setting in Tx queue.
> > The affinity number is the physical port ID where packets will be
> > sent.
> > Value 0 means no affinity and traffic could be routed to any connected
> > physical ports, this is the default current behavior.
> >
> > The number of physical ports is reported with rte_eth_dev_info_get().
> >
> > The new tx_phy_affinity field is added into the padding hole of
> > rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> > An ABI check rule needs to be added to avoid false warning.
> >
> > Add the testpmd command line:
> > testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >
> > For example, there're two physical ports connected to a single DPDK
> > port (port id 0), and phy_affinity 1 stood for the first physical port
> > and phy_affinity 2 stood for the second physical port.
> > Use the below commands to config tx phy affinity for per Tx Queue:
> >         port config 0 txq 0 phy_affinity 1
> >         port config 0 txq 1 phy_affinity 1
> >         port config 0 txq 2 phy_affinity 2
> >         port config 0 txq 3 phy_affinity 2
> >
> > These commands configure Tx Queue 0 and Tx Queue 1 with
> > phy affinity 1, so packets sent on Tx Queue 0 or Tx Queue 1
> > leave through the first physical port; likewise, packets sent on
> > Tx Queue 2 or Tx Queue 3 leave through the second physical port.
> >
> > Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> > ---
snip
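
As an illustration of how an application might use the per-queue affinity described in the quoted commit message, here is a minimal sketch. The table and the helper pick_txq_for_phy are hypothetical application-side code that mirrors the four testpmd commands above; they are not part of ethdev.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-queue affinity table matching the testpmd example:
 * index = Tx queue id, value = phy_affinity
 * (0 = no affinity, 1 = first physical port, 2 = second physical port).
 */
static const uint8_t txq_phy_affinity[] = { 1, 1, 2, 2 };

/* Return the first Tx queue bound to phy_port, or -1 if none is. */
static int
pick_txq_for_phy(uint8_t phy_port)
{
	size_t n = sizeof(txq_phy_affinity) / sizeof(txq_phy_affinity[0]);
	size_t q;

	for (q = 0; q < n; q++)
		if (txq_phy_affinity[q] == phy_port)
			return (int)q;
	return -1;
}
```

The application keeps talking to the single bonded DPDK port and only chooses a queue id, which is the trade-off against attaching eth2 and eth3 separately.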

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
@ 2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
  2023-02-10 20:30  3%                       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2023-02-10  5:30 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard, nd

<snip>

> On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > > > >
> > > > > > > >
> > > > > > > > For environments where stdatomics are not supported, we
> > > > > > > > could
> > > > > have a
> > > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > > support
> > > > > only
> > > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > > > > > > and
> > > > > when we move
> > > > > > > to minimum supported standard C11, we just need to get rid
> > > > > > > of the
> > > > > file in DPDK
> > > > > > > repo.
> > > > > > >
> > > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > > introduce names
> > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > >
> > > > > > > references:
> > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > >  * GNU libc manual
> > > > > > >
> > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > > Names.html
> > > > > > >
> > > > > > > in effect the header, the names and in some instances
> > > > > > > namespaces
> > > > > introduced
> > > > > > > are reserved by the implementation. there are several
> > > > > > > reasons in
> > > > > the GNU libc
> > > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > >
> > > > > yeah, i agree they're being a bit wishy washy in the wording,
> > > > > but i'm not convinced glibc folks are documenting this as
> > > > > permissive guidance against.
> > > > >
> > > > > >
> > > > > > > manual that explain the justification for these reservations
> > > > > > > and if
> > > > > if we think
> > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > >
> > > > > > > i'll also remark that the inter-mingling of names from the
> > > > > > > POSIX
> > > > > standard
> > > > > > > implicitly exposed as a part of the EAL public API has been
> > > > > problematic for
> > > > > > > portability.
> > > > > > These should be exposed as EAL APIs only when compiled with a
> > > > > compiler that does not support stdatomics.
> > > > >
> > > > > you don't necessarily compile dpdk, the application or its other
> > > > > dynamically linked dependencies with the same compiler at the
> > > > > same time.
> > > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > > distribution.
> > > > >
> > > > > if dpdk is built without real stdatomic types but the
> > > > > application has to interoperate with a different kit or library
> > > > > that does they would be forced to dance around dpdk with their
> > > > > own version of a shim to hide our faked up stdatomics.
> > > > >
> > > >
> > > > So basically, if we want a binary DPDK distribution to be
> > > > compatible with a
> > > separate application build environment, they both have to implement
> > > atomics the same way, i.e. agree on the ABI for atomics.
> > > >
> > > > Summing up, this leaves us with only two realistic options:
> > > >
> > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > build
> > > environment to support C11 stdatomics.
> > > > 2. Provide our own DPDK atomics library.
> > > >
> > > > (As mentioned by Tyler, the third option - using C11 stdatomics
> > > > inside DPDK, and requiring a build environment without C11
> > > > stdatomics to implement a shim - is not realistic!)
> > > >
> > > > I strongly want atomics to be available for use across inline and
> > > > compiled
> > > code; i.e. it must be possible for both compiled DPDK functions and
> > > inline functions to perform atomic transactions on the same atomic variable.
> > >
> > > i consider it a mandatory requirement. i don't see practically how
> > > we could withdraw existing use and even if we had clean way i don't
> > > > see why we would want to. so this item is definitely settled if you were
> concerned.
> > I think I agree here.
> >
> > >
> > > >
> > > > So either we upgrade the DPDK build requirements to support C11
> > > > (including
> > > the optional stdatomics), or we provide our own DPDK atomics.
> > >
> > > i think the issue of requiring a toolchain conformant to a specific
> > > standard is a separate matter because any adoption of C11 standard
> > > atomics is a potential abi break from the current use of intrinsics.
> > I am not sure why you are calling it as ABI break. Referring to [1], I just see
> wrappers around intrinsics (though [2] does not use the intrinsics).
> >
> > [1]
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> > [2]
> > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic
> > .h
> 
> it's a potential abi break because atomic types are not the same types as their
> corresponding integer types etc.. (or at least are not guaranteed to be by all
> implementations of c as an abstract language).
> 
>     ISO/IEC 9899:2011
> 
>     6.2.5 (27)
>     Further, there is the _Atomic qualifier. The presence of the _Atomic
>     qualifier designates an atomic type. The size, representation, and alignment
>     of an atomic type need not be the same as those of the corresponding
>     unqualified type.
> 
>     7.17.6 (3)
>     NOTE The representation of atomic integer types need not have the same size
>     as their corresponding regular types. They should have the same size
> whenever
>     possible, as it eases effort required to port existing code.
> 
> i use the term `potential abi break' with intent because for me to assert in
> absolute terms i would have to evaluate the implementation of every current
> and potential future compilers atomic vs non-atomic types. this as i'm sure you
> understand is not practical, it would also defeat the purpose of moving to a
> standard. therefore i rely on the specification prescribed by the standard not
> the detail of a specific implementation.
Can we say that the platforms 'supported' by DPDK today do not have this problem? Any future platforms that will come to DPDK have to evaluate this.

> 
> 
> > > the abstraction (whatever namespace it resides) allows the existing
> > > toolchain/platform combinations to maintain compatibility by
> > > defaulting to current non-standard intrinsics.
> > How about using the intrinsics (__atomic_xxx) name space for abstraction?
> This covers the GCC and Clang compilers.
> 
> the namespace starting with `__` is also reserved for the implementation.
> this is why compilers gcc/clang/msvc name their intrinsic and builtin
> functions starting with __ to explicitly avoid collision with the application
> namespace.
Agreed. But, here we are considering '__atomic_' specifically (i.e. not just '__')

> 
>     ISO/IEC 9899:2011
> 
>     7.1.3 (1)
>     All identifiers that begin with an underscore and either an uppercase
>     letter or another underscore are always reserved for any use.
> 
>     ...
> 
> > If there is another platform that uses the same name space for something
> else, I think DPDK should not be supporting that platform.
> 
> that's effectively a statement excluding windows platform and all non-gcc
> compilers from ever supporting dpdk.
Apologies, I did not understand your comment on windows platform. Do you mean to say a compiler for windows platform uses '__atomic_xxx' name space to provide some other functionality (and hence it would get excluded)? 
Clang supports these intrinsics. I am not sure about the merit of supporting other non-gcc compilers. Maybe a topic for Techboard discussion.

> 
> > What problems do you see?
> 
> i'm fairly certain at least one other compiler uses the __atomic namespace but
Do you mean __atomic namespace is used for some other purpose?

> it would take me time to check, the most notable potential issue that comes to
> mind is if such an intrinsic with the same name is provided in a different
> implementation and has either regressive code generation or different
> semantics it would be bad because it is intrinsic you can't just hack around it
> with #undef __atomic to shim in a semantically correct version.
I do not think we should worry about regressive code generation problem. It should be fixed by that compiler.
Different semantics is something we need to worry about. It would be good to find out more about a compiler that does this.

> 
> how about this, is there another possible namespace you might suggest that
> conforms or doesn't conflict with the rules defined in ISO/IEC 9899:2011
> 7.1.3 i think if there were that would satisfy all of my concerns related to
> namespaces.
> 
> keep in mind the point of moving to a standard is to achieve portability so if we
> do things that will regress us back to being dependent on an implementation
> we haven't succeeded. that's all i'm trying to guarantee here.
Agree. We are trying to solve a problem that is temporary. I am trying to keep the problem scope narrow which might help us push to adopt the standard sooner.

> 
> i feel like we are really close on this discussion, if we can just iron this issue out
> we can probably get going on the actual changes.
> 
> thanks for the consideration.
> 
> >
> > >
> > > once in place it provides an opportunity to introduce new
> > > toolchain/platform combinations and enables an opt-in capability to
> > > use stdatomics on existing toolchain/platform combinations subject
> > > to community discussion on how/if/when.
> > >
> > > it would be good to get more participants into the discussion so
> > > i'll cc techboard for some attention. i feel like the only area that
> > > isn't decided is to do or not do this in rte_ namespace.
> > >
> > > i'm strongly in favor of rte_ namespace after discussion, mainly due
> > > > to the disadvantages of trying to overlap with the standard namespace
> > > while not providing a compatible api/abi and because it provides
> > > clear disambiguation of that difference in semantics and compatibility with
> the standard api.
> > >
> > > so far i've noted the following
> > >
> > > * we will not provide the non-explicit apis.
> > +1
> >
> > > > * we will make no attempt to support operating on struct/union atomics
> > >   with our apis.
> > +1
> >
> > > * we will mirror the standard api potentially in the rte_ namespace to
> > >   - reference the standard api documentation.
> > >   - assume compatible semantics (sans exceptions from first 2 points).
> > >
> > > my vote is to remove 'potentially' from the last point above for
> > > reasons previously discussed in postings to the mail thread.
> > >
> > > thanks all for the discussion, i'll send up a patch removing
> > > non-explicit apis for viewing.
> > >
> > > ty

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] eal: introduce atomics abstraction
  2023-02-09 19:19  0%         ` Morten Brørup
@ 2023-02-09 22:04  0%           ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-09 22:04 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

On Thu, Feb 09, 2023 at 08:19:14PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Thursday, 9 February 2023 19.15
> > 
> > On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Wednesday, 8 February 2023 22.44
> > > >
> > > > Introduce atomics abstraction that permits optional use of standard
> > C11
> > > > atomics when meson is provided the new enable_stdatomics=true
> > option.
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > ---
> > >
> > > Looks good. A few minor suggestions about implementation only.
> > >
> > > With or without suggested modifications,
> > >
> > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> [...]
> 
> > > > diff --git a/lib/eal/include/generic/rte_atomic.h
> > > > b/lib/eal/include/generic/rte_atomic.h
> > > > index f5c49a9..392d928 100644
> > > > --- a/lib/eal/include/generic/rte_atomic.h
> > > > +++ b/lib/eal/include/generic/rte_atomic.h
> > > > @@ -110,6 +110,100 @@
> > > >
> > > >  #endif /* __DOXYGEN__ */
> > > >
> > > > +#ifdef RTE_STDC_ATOMICS
> > > > +
> > > > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > > > defined(__STDC_NO_ATOMICS__)
> > > > +#error compiler does not support C11 standard atomics
> > > > +#else
> > > > +#include <stdatomic.h>
> > > > +#endif
> > > > +
> > > > +#define __rte_atomic _Atomic
> > > > +
> > > > +typedef int rte_memory_order;
> > >
> > > I would prefer enum for rte_memory_order:
> > >
> > > typedef enum {
> > >     rte_memory_order_relaxed = memory_order_relaxed,
> > >     rte_memory_order_consume = memory_order_consume,
> > >     rte_memory_order_acquire = memory_order_acquire,
> > >     rte_memory_order_release = memory_order_release,
> > >     rte_memory_order_acq_rel = memory_order_acq_rel,
> > >     rte_memory_order_seq_cst = memory_order_seq_cst
> > > } rte_memory_order;
> > 
> > the reason for not using enum type is abi related. the c standard has
> > this little gem.
> > 
> >     ISO/IEC 9899:2011
> > 
> >     6.7.2.2 (4)
> >     Each enumerated type shall be compatible with char, a signed
> > integer
> >     type, or an unsigned integer type. The choice of type is
> >     implementation-defined, 128) but shall be capable of representing
> > the
> >     values of all the members of the enumeration.
> > 
> >     128) An implementation may delay the choice of which integer type
> > until
> >     all enumeration constants have been seen.
> > 
> > so i'm just being overly protective of maintaining the forward
> > compatibility of the abi.
> > 
> > probably i'm being super unnecessarily cautious in this case since i
> > think
> > in practice even if an implementation chose sizeof(char) i doubt very
> > much
> > that enough enumerated values would get added to this enumeration
> > within
> > the lifetime of the API to suddenly cause the compiler to choose >
> > sizeof(char).
> 
> I am under the impression that compilers usually instantiate enum as int, and you can make it use a smaller size by decorating it with the "packed" attribute - I have benefited from that in the past.

generally i think most implementations choose int as a default but i
think there is potential for char to get used if the compiler is asked
to optimize for size. not a likely optimization preference when using dpdk.
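
A minimal sketch of the size question, assuming GCC or Clang (the packed attribute is a compiler extension, not standard C). All names below are hypothetical; the idea is only that the width of an enum is the implementation's choice, while a typedef to int is fixed by the API author.

```c
#include <stddef.h>

/* Plain enum: the underlying integer type is implementation-defined. */
enum order_enum { ORDER_RELAXED, ORDER_ACQUIRE, ORDER_RELEASE };

/* Packed enum: GCC/Clang pick the smallest type that fits the values. */
enum __attribute__((packed)) order_enum_packed {
	ORDER_P_RELAXED, ORDER_P_ACQUIRE, ORDER_P_RELEASE
};

/* Typedef to int: the width does not depend on the enumerator set. */
typedef int order_int;

static size_t enum_size(void) { return sizeof(enum order_enum); }
static size_t packed_enum_size(void) { return sizeof(enum order_enum_packed); }
static size_t int_typedef_size(void) { return sizeof(order_int); }
```

If a function parameter used the enum type, its width could differ between builds that disagree on enum representation; the int typedef sidesteps that.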

> 
> The only risk I am effectively trying to avoid is someone calling an rte_atomic() function with "order" being another value than one of these values. Probably not ever going to happen.
> 
> Your solution also addresses an academic risk (of the compiler using another type than int for the enum), which will have unwanted side effects - especially if the "order" parameter to the rte_atomic() functions becomes char instead of int.
> 
> I can now conclude that your proposed type (int) is stronger/safer than the type (enum) I suggested. So please keep what you have.
> 
> > 
> > incidentally this is also 
> > 
> 
> [...]
> 
> > > > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > > > desired, success, fail) \
> > > > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success,
> > > > fail)
> > >
> > > The type of the "weak" parameter to __atomic_compare_exchange_n() is
> > bool, not int, so use "false" instead of 0. There is probably no
> > practical difference, so I'll leave it up to you.
> > >
> > > You might need to include <stdbool.h> for this... I haven't checked.
> > 
> > strictly speaking you are correct the intrinsic does take bool
> > according
> > to documentation.
> > 
> >     ISO/IEC 9899:2011
> > 
> >     7.18 Boolean type and values <stdbool.h>
> >     (1) The header <stdbool.h> defines four macros.
> >     (2) The macro bool expands to _Bool.
> >     (3) The remaining three macros are suitable for use in #if
> > preprocessing
> > 	directives. They are `true' which expands to the integer constant
> > 1,
> > 	`false' which expands to the integer constant 0, and
> > 	__bool_true_false_are_defined which expands to the integer
> > constant 1.
> 
> Thank you for this reference. I wasn't aware that the two boolean values explicitly expanded to those two integer constants. I had never thought about it, but simply assumed that the constant "true" had the same meaning as "not false", like "if (123)" evaluates "123" as true.
> 
> So I learned something new today.
> 
> > 
> > so i could include the header, to expand a macro which expands to
> > integer constant 0 or 1 as appropriate for weak vs strong. do you think
> > i should? (serious question) if you answer yes, i'll make the change.
> 
> Then no. Please keep the 1's and 0's.

thanks! will leave it as is.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-03 13:33  6%     ` [PATCH v4 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
  2023-02-06 15:29  0%       ` Jiawei(Jonny) Wang
  2023-02-07  9:40  0%       ` Ori Kam
@ 2023-02-09 19:44  0%       ` Ferruh Yigit
  2023-02-10 14:06  0%         ` Jiawei(Jonny) Wang
  2023-02-14  9:38  0%         ` Jiawei(Jonny) Wang
  2 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2023-02-09 19:44 UTC (permalink / raw)
  To: Jiawei Wang, viacheslavo, orika, thomas, andrew.rybchenko,
	Aman Singh, Yuying Zhang
  Cc: dev, rasland

On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.),
> we want to know which physical port is used for Rx and Tx.
> 

I assume "kernel bonding" is out of context, but this patch concerns
DPDK bonding, failsafe or softnic. (I will refer to them as virtual
bonding devices.)

Using specific queues of the virtual bonding device may interfere with
the logic of these devices, like bonding modes or RSS of the underlying
devices. I can see the feature focuses on a very specific use case, but
I am not sure all possible side effects have been taken into
consideration.


And although the feature is only relevant to virtual bonding devices,
core ethdev structures are updated for this. Most use cases won't need
these, so is there a way to reduce the scope of the changes to virtual
bonding devices?


There are a few very core ethdev APIs, like:
rte_eth_dev_configure()
rte_eth_tx_queue_setup()
rte_eth_rx_queue_setup()
rte_eth_dev_start()
rte_eth_dev_info_get()

Almost every user of ethdev uses these APIs. Since they are so
fundamental, I am for being a little more conservative with them.

Every eccentric feature targets these APIs first because they are
common and extending them gives an easy solution, but in the long run
this makes these APIs more complex, harder to maintain and harder for
PMDs to support correctly. So I am for not updating them unless it is a
generic use case.


Also, as we talked about PMDs supporting them, I assume your coming PMD
patch will implement the 'tx_phy_affinity' config option only for mlx
drivers. What will happen for other NICs? Will they silently ignore the
config option from the user? That is a problem for DPDK application
portability.



As far as I understand, the target is the application controlling which
sub-device is used under the virtual bonding device. Can you please give
more information on why this is required? Perhaps it can help to provide
a better/different solution.
Like adding the ability to use both the bonding device and the
sub-device for the data path, so the application can use whichever it
wants. (This is just the first solution I came up with, I am not
suggesting it as a replacement, but if you can describe the problem more
I am sure other people can come up with better solutions.)

And doesn't this go against the application being transparent to whether
the underlying device is a bonding device or an actual device?


> This patch maps a DPDK Tx queue with a physical port,
> by adding tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be
> sent.
> Value 0 means no affinity and traffic could be routed to any
> connected physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of
> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, there're two physical ports connected to
> a single DPDK port (port id 0), and phy_affinity 1 stood for
> the first physical port and phy_affinity 2 stood for the second
> physical port.
> Use the below commands to config tx phy affinity for per Tx Queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands config the Tx Queue index 0 and Tx Queue index 1 with
> phy affinity 1, uses Tx Queue 0 or Tx Queue 1 send packets,
> these packets will be sent from the first physical port, and similar
> with the second physical port if sending packets with Tx Queue 2
> or Tx Queue 3.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---
>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>  app/test-pmd/config.c                       |   1 +
>  devtools/libabigail.abignore                |   5 +
>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>  lib/ethdev/rte_ethdev.h                     |  10 ++
>  6 files changed, 133 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index cb8c174020..f771fcf8ac 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void *parsed_result,
>  
>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> +
> +			"port config (port_id) txq (queue_id) phy_affinity (value)\n"
> +			"    Set the physical affinity value "
> +			"on a specific Tx queue\n\n"
>  		);
>  	}
>  
> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
>  	}
>  };
>  
> +/* *** configure port txq phy_affinity value *** */
> +struct cmd_config_tx_phy_affinity {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t config;
> +	portid_t portid;
> +	cmdline_fixed_string_t txq;
> +	uint16_t qid;
> +	cmdline_fixed_string_t phy_affinity;
> +	uint8_t value;
> +};
> +
> +static void
> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_port *port;
> +	int ret;
> +
> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> +		return;
> +
> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> +		printf("Invalid port id\n");
> +		return;
> +	}
> +
> +	port = &ports[res->portid];
> +
> +	if (strcmp(res->txq, "txq")) {
> +		printf("Unknown parameter\n");
> +		return;
> +	}
> +	if (tx_queue_id_is_invalid(res->qid))
> +		return;
> +
> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> +	if (ret != 0)
> +		return;
> +
> +	if (dev_info.nb_phy_ports == 0) {
> +		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
> +		return;
> +	}
> +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
> +	if (dev_info.nb_phy_ports < res->value) {
> +		printf("The PHY affinity value %u is Invalid, exceeds the "
> +		       "number of physical ports\n", res->value);
> +		return;
> +	}
> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> +
> +	cmd_reconfig_device_queue(res->portid, 0, 1);
> +}
> +
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 port, "port");
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 config, "config");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 portid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 txq, "txq");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      qid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 phy_affinity, "phy_affinity");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      value, RTE_UINT8);
> +
> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> +	.f = cmd_config_tx_phy_affinity_parsed,
> +	.data = (void *)0,
> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
> +	.tokens = {
> +		(void *)&cmd_config_tx_phy_affinity_port,
> +		(void *)&cmd_config_tx_phy_affinity_config,
> +		(void *)&cmd_config_tx_phy_affinity_portid,
> +		(void *)&cmd_config_tx_phy_affinity_txq,
> +		(void *)&cmd_config_tx_phy_affinity_qid,
> +		(void *)&cmd_config_tx_phy_affinity_hwport,
> +		(void *)&cmd_config_tx_phy_affinity_value,
> +		NULL,
> +	},
> +};
> +
>  /* ******************************************************************************** */
>  
>  /* list of instructions */
> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>  	NULL,
>  };
>  
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index acccb6b035..b83fb17cfa 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>  		printf("unknown\n");
>  		break;
>  	}
> +	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
>  }
>  
>  void
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 7a93de3ba1..ac7d3fb2da 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -34,3 +34,8 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +; Ignore fields inserted in padding hole of rte_eth_txconf
> +[suppress_type]
> +        name = rte_eth_txconf
> +        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index 73f5d94e14..e99bd2dcb6 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>  
> +* **Added affinity for multiple physical ports connected to a single DPDK port.**
> +
> +  * Added Tx affinity in queue setup to map a physical port.
> +
>  * **Updated AMD axgbe driver.**
>  
>    * Added multi-process support.
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 79a1fa9cb7..5c716f7679 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
>  
>  This command should be run when the port is stopped, or else it will fail.
>  
> +config per queue Tx physical affinity
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Configure a per queue physical affinity value only on a specific Tx queue::
> +
> +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> +
> +* ``phy_affinity``: physical port to use for sending,
> +                    when multiple physical ports are connected to
> +                    a single DPDK port.
> +
> +This command should be run when the port is stopped, otherwise it fails.
> +
>  Config VXLAN Encap outer layers
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..2fd971b7b5 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>  				      less free descriptors than this value. */
>  
>  	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +	/**
> +	 * Affinity with one of the multiple physical ports connected to the DPDK port.
> +	 * Value 0 means no affinity and traffic could be routed to any connected
> +	 * physical port.
> +	 * The first physical port is number 1 and so on.
> +	 * Number of physical ports is reported by nb_phy_ports in rte_eth_dev_info.
> +	 */
> +	uint8_t tx_phy_affinity;
>  	/**
>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>  	/** Device redirection table size, the total number of entries. */
>  	uint16_t reta_size;
>  	uint8_t hash_key_size; /**< Hash key size in bytes */
> +	/** Number of physical ports connected with DPDK port. */
> +	uint8_t nb_phy_ports;
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration */


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v2] eal: introduce atomics abstraction
  2023-02-09 18:15  4%       ` Tyler Retzlaff
@ 2023-02-09 19:19  0%         ` Morten Brørup
  2023-02-09 22:04  0%           ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-09 19:19 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Thursday, 9 February 2023 19.15
> 
> On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Wednesday, 8 February 2023 22.44
> > >
> > > Introduce atomics abstraction that permits optional use of standard
> C11
> > > atomics when meson is provided the new enable_stdatomics=true
> option.
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> >
> > Looks good. A few minor suggestions about implementation only.
> >
> > With or without suggested modifications,
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>

[...]

> > > diff --git a/lib/eal/include/generic/rte_atomic.h
> > > b/lib/eal/include/generic/rte_atomic.h
> > > index f5c49a9..392d928 100644
> > > --- a/lib/eal/include/generic/rte_atomic.h
> > > +++ b/lib/eal/include/generic/rte_atomic.h
> > > @@ -110,6 +110,100 @@
> > >
> > >  #endif /* __DOXYGEN__ */
> > >
> > > +#ifdef RTE_STDC_ATOMICS
> > > +
> > > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > > defined(__STDC_NO_ATOMICS__)
> > > +#error compiler does not support C11 standard atomics
> > > +#else
> > > +#include <stdatomic.h>
> > > +#endif
> > > +
> > > +#define __rte_atomic _Atomic
> > > +
> > > +typedef int rte_memory_order;
> >
> > I would prefer enum for rte_memory_order:
> >
> > typedef enum {
> >     rte_memory_order_relaxed = memory_order_relaxed,
> >     rte_memory_order_consume = memory_order_consume,
> >     rte_memory_order_acquire = memory_order_acquire,
> >     rte_memory_order_release = memory_order_release,
> >     rte_memory_order_acq_rel = memory_order_acq_rel,
> >     rte_memory_order_seq_cst = memory_order_seq_cst
> > } rte_memory_order;
> 
> the reason for not using enum type is abi related. the c standard has
> this little gem.
> 
>     ISO/IEC 9899:2011
> 
>     6.7.2.2 (4)
>     Each enumerated type shall be compatible with char, a signed
> integer
>     type, or an unsigned integer type. The choice of type is
>     implementation-defined, 128) but shall be capable of representing
> the
>     values of all the members of the enumeration.
> 
>     128) An implementation may delay the choice of which integer type
> until
>     all enumeration constants have been seen.
> 
> so i'm just being overly protective of maintaining the forward
> compatibility of the abi.
> 
> probably i'm being super unnecessarily cautious in this case since i
> think
> in practice even if an implementation chose sizeof(char) i doubt very
> much
> that enough enumerated values would get added to this enumeration
> within
> the lifetime of the API to suddenly cause the compiler to choose >
> sizeof(char).

I am under the impression that compilers usually instantiate enum as int, and you can make it use a smaller size by decorating it with the "packed" attribute - I have benefited from that in the past.

The only risk I am effectively trying to avoid is someone calling an rte_atomic() function with "order" being another value than one of these values. Probably not ever going to happen.

Your solution also addresses an academic risk (of the compiler using another type than int for the enum), which will have unwanted side effects - especially if the "order" parameter to the rte_atomic() functions becomes char instead of int.

I can now conclude that your proposed type (int) is stronger/safer than the type (enum) I suggested. So please keep what you have.

> 
> incidentally this is also why you can't forward declare enums in c.
> 

[...]

> > > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > > desired, success, fail) \
> > > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success,
> > > fail)
> >
> > The type of the "weak" parameter to __atomic_compare_exchange_n() is
> bool, not int, so use "false" instead of 0. There is probably no
> practical difference, so I'll leave it up to you.
> >
> > You might need to include <stdbool.h> for this... I haven't checked.
> 
> strictly speaking you are correct the intrinsic does take bool
> according
> to documentation.
> 
>     ISO/IEC 9899:2011
> 
>     7.18 Boolean type and values <stdbool.h>
>     (1) The header <stdbool.h> defines four macros.
>     (2) The macro bool expands to _Bool.
>     (3) The remaining three macros are suitable for use in #if
> preprocessing
> 	directives. They are `true' which expands to the integer constant
> 1,
> 	`false' which expands to the integer constant 0, and
> 	__bool_true_false_are_defined which expands to the integer
> constant 1.

Thank you for this reference. I wasn't aware that the two boolean values explicitly expanded to those two integer constants. I had never thought about it, but simply assumed that the constant "true" had the same meaning as "not false", like "if (123)" evaluates "123" as true.

So I learned something new today.

> 
> so i could include the header, to expand a macro which expands to
> integer constant 0 or 1 as appropriate for weak vs strong. do you think
> i should? (serious question) if you answer yes, i'll make the change.

Then no. Please keep the 1's and 0's.


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] eal: introduce atomics abstraction
  @ 2023-02-09 18:15  4%       ` Tyler Retzlaff
  2023-02-09 19:19  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-09 18:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 8 February 2023 22.44
> > 
> > Introduce atomics abstraction that permits optional use of standard C11
> > atomics when meson is provided the new enable_stdatomics=true option.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> 
> Looks good. A few minor suggestions about implementation only.
> 
> With or without suggested modifications,
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> 
> >  config/meson.build                     | 11 ++++
> >  lib/eal/arm/include/rte_atomic_32.h    |  6 ++-
> >  lib/eal/arm/include/rte_atomic_64.h    |  6 ++-
> >  lib/eal/include/generic/rte_atomic.h   | 96
> > +++++++++++++++++++++++++++++++++-
> >  lib/eal/loongarch/include/rte_atomic.h |  6 ++-
> >  lib/eal/ppc/include/rte_atomic.h       |  6 ++-
> >  lib/eal/riscv/include/rte_atomic.h     |  6 ++-
> >  lib/eal/x86/include/rte_atomic.h       |  8 ++-
> >  meson_options.txt                      |  2 +
> >  9 files changed, 139 insertions(+), 8 deletions(-)
> > 
> > diff --git a/config/meson.build b/config/meson.build
> > index 26f3168..25dd628 100644
> > --- a/config/meson.build
> > +++ b/config/meson.build
> > @@ -255,6 +255,17 @@ endif
> >  # add -include rte_config to cflags
> >  add_project_arguments('-include', 'rte_config.h', language: 'c')
> > 
> > +stdc_atomics_enabled = get_option('enable_stdatomics')
> > +dpdk_conf.set('RTE_STDC_ATOMICS', stdc_atomics_enabled)
> > +
> > +if stdc_atomics_enabled
> > +if cc.get_id() == 'gcc' or cc.get_id() == 'clang'
> > +    add_project_arguments('-std=gnu11', language: 'c')
> > +else
> > +    add_project_arguments('-std=c11', language: 'c')
> > +endif
> > +endif
> > +
> >  # enable extra warnings and disable any unwanted warnings
> >  # -Wall is added by default at warning level 1, and -Wextra
> >  # at warning level 2 (DPDK default)
> > diff --git a/lib/eal/arm/include/rte_atomic_32.h
> > b/lib/eal/arm/include/rte_atomic_32.h
> > index c00ab78..7088a12 100644
> > --- a/lib/eal/arm/include/rte_atomic_32.h
> > +++ b/lib/eal/arm/include/rte_atomic_32.h
> > @@ -34,9 +34,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/arm/include/rte_atomic_64.h
> > b/lib/eal/arm/include/rte_atomic_64.h
> > index 6047911..7f02c57 100644
> > --- a/lib/eal/arm/include/rte_atomic_64.h
> > +++ b/lib/eal/arm/include/rte_atomic_64.h
> > @@ -38,9 +38,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------ 128 bit atomic operations -----------------
> > --------*/
> > diff --git a/lib/eal/include/generic/rte_atomic.h
> > b/lib/eal/include/generic/rte_atomic.h
> > index f5c49a9..392d928 100644
> > --- a/lib/eal/include/generic/rte_atomic.h
> > +++ b/lib/eal/include/generic/rte_atomic.h
> > @@ -110,6 +110,100 @@
> > 
> >  #endif /* __DOXYGEN__ */
> > 
> > +#ifdef RTE_STDC_ATOMICS
> > +
> > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > defined(__STDC_NO_ATOMICS__)
> > +#error compiler does not support C11 standard atomics
> > +#else
> > +#include <stdatomic.h>
> > +#endif
> > +
> > +#define __rte_atomic _Atomic
> > +
> > +typedef int rte_memory_order;
> 
> I would prefer enum for rte_memory_order:
> 
> typedef enum {
>     rte_memory_order_relaxed = memory_order_relaxed,
>     rte_memory_order_consume = memory_order_consume,
>     rte_memory_order_acquire = memory_order_acquire,
>     rte_memory_order_release = memory_order_release,
>     rte_memory_order_acq_rel = memory_order_acq_rel,
>     rte_memory_order_seq_cst = memory_order_seq_cst
> } rte_memory_order;

the reason for not using enum type is abi related. the c standard has
this little gem.

    ISO/IEC 9899:2011

    6.7.2.2 (4)
    Each enumerated type shall be compatible with char, a signed integer
    type, or an unsigned integer type. The choice of type is
    implementation-defined, 128) but shall be capable of representing the
    values of all the members of the enumeration.

    128) An implementation may delay the choice of which integer type until
    all enumeration constants have been seen.

so i'm just being overly protective of maintaining the forward
compatibility of the abi.

probably i'm being super unnecessarily cautious in this case since i think
in practice even if an implementation chose sizeof(char) i doubt very much
that enough enumerated values would get added to this enumeration within
the lifetime of the API to suddenly cause the compiler to choose > sizeof(char).

incidentally this is also why you can't forward declare enums in c.

> 
> > +
> > +#define rte_memory_order_relaxed memory_order_relaxed
> > +#define rte_memory_order_consume memory_order_consume
> > +#define rte_memory_order_acquire memory_order_acquire
> > +#define rte_memory_order_release memory_order_release
> > +#define rte_memory_order_acq_rel memory_order_acq_rel
> > +#define rte_memory_order_seq_cst memory_order_seq_cst
> > +
> > +#define rte_atomic_store_explicit(obj, desired, order) \
> > +	atomic_store_explicit(obj, desired, order)
> > +
> > +#define rte_atomic_load_explicit(obj, order) \
> > +	atomic_load_explicit(obj, order)
> > +
> > +#define rte_atomic_exchange_explicit(obj, desired, order) \
> > +	atomic_exchange_explicit(obj, desired, order)
> > +
> > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > desired, success, fail) \
> > +	atomic_compare_exchange_strong_explicit(obj, expected, desired,
> > success, fail)
> > +
> > +#define rte_atomic_compare_exchange_weak_explicit(obj, expected,
> > desired, success, fail) \
> > +	atomic_compare_exchange_weak_explicit(obj, expected, desired,
> > success, fail)
> > +
> > +#define rte_atomic_fetch_add_explicit(obj, arg, order) \
> > +	atomic_fetch_add_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_sub_explicit(obj, arg, order) \
> > +	atomic_fetch_sub_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_or_explicit(obj, arg, order) \
> > +	atomic_fetch_or_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_xor_explicit(obj, arg, order) \
> > +	atomic_fetch_xor_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_and_explicit(obj, arg, order) \
> > +	atomic_fetch_and_explicit(obj, arg, order)
> > +
> > +#else
> > +
> > +#define __rte_atomic
> > +
> > +typedef int rte_memory_order;
> > +
> > +#define rte_memory_order_relaxed __ATOMIC_RELAXED
> > +#define rte_memory_order_consume __ATOMIC_CONSUME
> > +#define rte_memory_order_acquire __ATOMIC_ACQUIRE
> > +#define rte_memory_order_release __ATOMIC_RELEASE
> > +#define rte_memory_order_acq_rel __ATOMIC_ACQ_REL
> > +#define rte_memory_order_seq_cst __ATOMIC_SEQ_CST
> 
> Prefer enum for rte_memory_order:
> 
> typedef enum {
>     rte_memory_order_relaxed = __ATOMIC_RELAXED,
>     rte_memory_order_consume = __ATOMIC_CONSUME,
>     rte_memory_order_acquire = __ATOMIC_ACQUIRE,
>     rte_memory_order_release = __ATOMIC_RELEASE,
>     rte_memory_order_acq_rel = __ATOMIC_ACQ_REL,
>     rte_memory_order_seq_cst = __ATOMIC_SEQ_CST
> } rte_memory_order;
> 
> > +
> > +#define rte_atomic_store_explicit(obj, desired, order) \
> > +	__atomic_store_n(obj, desired, order)
> > +
> > +#define rte_atomic_load_explicit(obj, order) \
> > +	__atomic_load_n(obj, order)
> > +
> > +#define rte_atomic_exchange_explicit(obj, desired, order) \
> > +	__atomic_exchange_n(obj, desired, order)
> > +
> > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > desired, success, fail) \
> > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success,
> > fail)
> 
> The type of the "weak" parameter to __atomic_compare_exchange_n() is bool, not int, so use "false" instead of 0. There is probably no practical difference, so I'll leave it up to you.
> 
> You might need to include <stdbool.h> for this... I haven't checked.

strictly speaking you are correct the intrinsic does take bool according
to documentation.

    ISO/IEC 9899:2011

    7.18 Boolean type and values <stdbool.h>
    (1) The header <stdbool.h> defines four macros.
    (2) The macro bool expands to _Bool.
    (3) The remaining three macros are suitable for use in #if preprocessing
	directives. They are `true' which expands to the integer constant 1,
	`false' which expands to the integer constant 0, and
	__bool_true_false_are_defined which expands to the integer constant 1.

so i could include the header, to expand a macro which expands to
integer constant 0 or 1 as appropriate for weak vs strong. do you think
i should? (serious question) if you answer yes, i'll make the change.

> 
> > +
> > +#define rte_atomic_compare_exchange_weak_explicit(obj, expected,
> > desired, success, fail) \
> > +	__atomic_compare_exchange_n(obj, expected, desired, 1, success,
> > fail)
> 
> Same as above: Use "true" instead of 1.
> 
> > +
> > +#define rte_atomic_fetch_add_explicit(obj, arg, order) \
> > +	__atomic_fetch_add(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_sub_explicit(obj, arg, order) \
> > +	__atomic_fetch_sub(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_or_explicit(obj, arg, order) \
> > +	__atomic_fetch_or(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_xor_explicit(obj, arg, order) \
> > +	__atomic_fetch_xor(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_and_explicit(obj, arg, order) \
> > +	__atomic_fetch_and(obj, arg, order)
> > +
> > +#endif
> > +
> >  /**
> >   * Compiler barrier.
> >   *
> > @@ -123,7 +217,7 @@
> >  /**
> >   * Synchronization fence between threads based on the specified memory
> > order.
> >   */
> > -static inline void rte_atomic_thread_fence(int memorder);
> > +static inline void rte_atomic_thread_fence(rte_memory_order memorder);
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > 
> > diff --git a/lib/eal/loongarch/include/rte_atomic.h
> > b/lib/eal/loongarch/include/rte_atomic.h
> > index 3c82845..66aa0c8 100644
> > --- a/lib/eal/loongarch/include/rte_atomic.h
> > +++ b/lib/eal/loongarch/include/rte_atomic.h
> > @@ -35,9 +35,13 @@
> >  #define rte_io_rmb()	rte_mb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/ppc/include/rte_atomic.h
> > b/lib/eal/ppc/include/rte_atomic.h
> > index 663b4d3..a428a83 100644
> > --- a/lib/eal/ppc/include/rte_atomic.h
> > +++ b/lib/eal/ppc/include/rte_atomic.h
> > @@ -38,9 +38,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > diff --git a/lib/eal/riscv/include/rte_atomic.h
> > b/lib/eal/riscv/include/rte_atomic.h
> > index 4b4633c..3c203a9 100644
> > --- a/lib/eal/riscv/include/rte_atomic.h
> > +++ b/lib/eal/riscv/include/rte_atomic.h
> > @@ -40,9 +40,13 @@
> >  #define rte_io_rmb()	asm volatile("fence ir, ir" : : : "memory")
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/x86/include/rte_atomic.h
> > b/lib/eal/x86/include/rte_atomic.h
> > index f2ee1a9..02d8b12 100644
> > --- a/lib/eal/x86/include/rte_atomic.h
> > +++ b/lib/eal/x86/include/rte_atomic.h
> > @@ -87,12 +87,16 @@
> >   * used instead.
> >   */
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > -	if (memorder == __ATOMIC_SEQ_CST)
> > +	if (memorder == rte_memory_order_seq_cst)
> >  		rte_smp_mb();
> >  	else
> > +#ifdef RTE_STDC_ATOMICS
> > +		atomic_thread_fence(memorder);
> > +#else
> >  		__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > diff --git a/meson_options.txt b/meson_options.txt
> > index 0852849..acbcbb8 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -46,6 +46,8 @@ option('mbuf_refcnt_atomic', type: 'boolean', value:
> > true, description:
> >         'Atomically access the mbuf refcnt.')
> >  option('platform', type: 'string', value: 'native', description:
> >         'Platform to build, either "native", "generic" or a SoC. Please
> > refer to the Linux build guide for more information.')
> > +option('enable_stdatomics', type: 'boolean', value: false,
> > description:
> > +       'enable use of standard C11 atomics.')
> >  option('enable_trace_fp', type: 'boolean', value: false, description:
> >         'enable fast path trace points.')
> >  option('tests', type: 'boolean', value: true, description:
> > --
> > 1.8.3.1
> > 

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  2023-02-09  8:34  4%                   ` Morten Brørup
@ 2023-02-09 17:30  4%                   ` Tyler Retzlaff
  2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-09 17:30 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we could
> > > > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > support
> > > > only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > > > when we move
> > > > > > to minimum supported standard C11, we just need to get rid of
> > > > > > the
> > > > file in DPDK
> > > > > > repo.
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > introduce names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances namespaces
> > > > introduced
> > > > > > are reserved by the implementation. there are several reasons in
> > > > the GNU libc
> > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > >
> > > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > > i'm not convinced glibc folks are documenting this as permissive
> > > > guidance against.
> > > >
> > > > >
> > > > > > manual that explain the justification for these reservations and
> > > > > > if
> > > > if we think
> > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > >
> > > > > > i'll also remark that the inter-mingling of names from the POSIX
> > > > standard
> > > > > > implicitly exposed as a part of the EAL public API has been
> > > > problematic for
> > > > > > portability.
> > > > > These should be exposed as EAL APIs only when compiled with a
> > > > compiler that does not support stdatomics.
> > > >
> > > > you don't necessarily compile dpdk, the application or its other
> > > > dynamically linked dependencies with the same compiler at the same
> > > > time.
> > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > distribution.
> > > >
> > > > if dpdk is built without real stdatomic types but the application
> > > > has to interoperate with a different kit or library that does they
> > > > would be forced to dance around dpdk with their own version of a
> > > > shim to hide our faked up stdatomics.
> > > >
> > >
> > > So basically, if we want a binary DPDK distribution to be compatible with a
> > separate application build environment, they both have to implement atomics
> > the same way, i.e. agree on the ABI for atomics.
> > >
> > > Summing up, this leaves us with only two realistic options:
> > >
> > > 1. Go all in on C11 stdatomics, also requiring the application build
> > environment to support C11 stdatomics.
> > > 2. Provide our own DPDK atomics library.
> > >
> > > (As mentioned by Tyler, the third option - using C11 stdatomics inside
> > > DPDK, and requiring a build environment without C11 stdatomics to
> > > implement a shim - is not realistic!)
> > >
> > > I strongly want atomics to be available for use across inline and compiled
> > code; i.e. it must be possible for both compiled DPDK functions and inline
> > functions to perform atomic transactions on the same atomic variable.
> > 
> > i consider it a mandatory requirement. i don't see practically how we could
> > withdraw existing use and even if we had clean way i don't see why we would
> > want to. so this item is definitely settled if you were concerned.
> I think I agree here.
> 
> > 
> > >
> > > So either we upgrade the DPDK build requirements to support C11 (including
> > the optional stdatomics), or we provide our own DPDK atomics.
> > 
> > i think the issue of requiring a toolchain conformant to a specific standard is a
> > separate matter because any adoption of C11 standard atomics is a potential
> > abi break from the current use of intrinsics.
> I am not sure why you are calling it an ABI break. Referring to [1], I just see wrappers around intrinsics (though [2] does not use the intrinsics).
> 
> [1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> [2] https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h

it's a potential abi break because atomic types are not the same types as
their corresponding integer types etc. (or at least are not guaranteed to
be by all implementations of c as an abstract language).

    ISO/IEC 9899:2011

    6.2.5 (27)
    Further, there is the _Atomic qualifier. The presence of the _Atomic
    qualifier designates an atomic type. The size, representation, and alignment
    of an atomic type need not be the same as those of the corresponding
    unqualified type.

    7.17.6 (3)
    NOTE The representation of atomic integer types need not have the same size
    as their corresponding regular types. They should have the same size whenever
    possible, as it eases effort required to port existing code.

i use the term `potential abi break' with intent because for me to assert
in absolute terms i would have to evaluate the implementation of every
current and potential future compiler's atomic vs non-atomic types. this
as i'm sure you understand is not practical, it would also defeat the
purpose of moving to a standard. therefore i rely on the specification
prescribed by the standard not the detail of a specific implementation.
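
to make the point concrete, here is a minimal sketch (not part of the
patch; the helper and struct names are made up for illustration) that asks
the compiler whether an atomic 64-bit integer has the same layout as the
plain one, both on its own and as a member of a struct shared across a
library boundary. the answer is a property of the implementation, which is
exactly why it cannot be assumed in general.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper: 1 if the atomic and plain 64-bit integer types
 * have the same size and alignment with this compiler/target, else 0 */
static int
atomic_u64_layout_matches(void)
{
	return sizeof(_Atomic uint64_t) == sizeof(uint64_t) &&
	       alignof(_Atomic uint64_t) == alignof(uint64_t);
}

/* hypothetical struct as seen by code built without atomic types */
struct counter_plain {
	uint8_t tag;
	uint64_t value;
};

/* the same struct as seen by code built with C11 atomic types */
struct counter_atomic {
	uint8_t tag;
	_Atomic uint64_t value;
};

/* 1 if both views agree on total size and on the member offset */
static int
counter_layout_matches(void)
{
	return sizeof(struct counter_plain) == sizeof(struct counter_atomic) &&
	       offsetof(struct counter_plain, value) ==
	       offsetof(struct counter_atomic, value);
}
```

if either helper returns 0 for some toolchain, objects built against the
two views of the struct disagree about its layout, which is the kind of
abi break being described.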


> > the abstraction (whatever namespace it resides) allows the existing
> > toolchain/platform combinations to maintain compatibility by defaulting to
> > current non-standard intrinsics.
> How about using the intrinsics (__atomic_xxx) name space for abstraction? This covers the GCC and Clang compilers.

the namespace starting with `__` is also reserved for the implementation.
this is why compilers (gcc/clang/msvc) name their intrinsic and
builtin functions starting with __ to explicitly avoid collision with the
application namespace.

    ISO/IEC 9899:2011

    7.1.3 (1)
    All identifiers that begin with an underscore and either an uppercase
    letter or another underscore are always reserved for any use.

    ...

> If there is another platform that uses the same name space for something else, I think DPDK should not be supporting that platform.

that's effectively a statement excluding windows platform and all
non-gcc compilers from ever supporting dpdk.

> What problems do you see?

i'm fairly certain at least one other compiler uses the __atomic
namespace, but it would take me time to check. the most notable potential
issue that comes to mind: if an intrinsic with the same name is provided
by a different implementation and has either regressive code generation
or different semantics, that would be bad. because it is an intrinsic,
you can't just hack around it with #undef __atomic to shim in a
semantically correct version.

how about this: is there another possible namespace you might suggest
that conforms to, or at least doesn't conflict with, the rules defined in
ISO/IEC 9899:2011 7.1.3? if there were, i think that would satisfy all of
my concerns related to namespaces.

keep in mind the point of moving to a standard is to achieve portability
so if we do things that will regress us back to being dependent on an
implementation we haven't succeeded. that's all i'm trying to guarantee
here.
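
as a rough sketch of what an abstraction in the rte_ namespace could look
like: the names rte_memory_order, rte_memory_order_seq_cst,
rte_atomic_fetch_add_explicit and RTE_STDC_ATOMICS come from the quoted
patch; RTE_ATOMIC(), rte_atomic_load_explicit and the refcnt example are
assumptions made here for illustration only. nothing below introduces an
identifier from stdatomic.h or one starting with `__`; the reserved names
are only referenced.

```c
#include <assert.h>
#include <stdint.h>

#ifdef RTE_STDC_ATOMICS
/* opt-in: map the rte_ names onto C11 standard atomics */
#include <stdatomic.h>
#define RTE_ATOMIC(type) _Atomic(type)
typedef memory_order rte_memory_order;
#define rte_memory_order_relaxed memory_order_relaxed
#define rte_memory_order_seq_cst memory_order_seq_cst
#define rte_atomic_fetch_add_explicit(obj, arg, order) \
	atomic_fetch_add_explicit(obj, arg, order)
#define rte_atomic_load_explicit(obj, order) \
	atomic_load_explicit(obj, order)
#else
/* default: keep today's gcc/clang builtins and plain integer types */
#define RTE_ATOMIC(type) type
typedef int rte_memory_order;
#define rte_memory_order_relaxed __ATOMIC_RELAXED
#define rte_memory_order_seq_cst __ATOMIC_SEQ_CST
#define rte_atomic_fetch_add_explicit(obj, arg, order) \
	__atomic_fetch_add(obj, arg, order)
#define rte_atomic_load_explicit(obj, order) \
	__atomic_load_n(obj, order)
#endif

/* hypothetical user of the abstraction: a shared reference count */
static RTE_ATOMIC(uint32_t) refcnt;

/* atomically increment the count and return the new value */
static uint32_t
refcnt_bump(void)
{
	return rte_atomic_fetch_add_explicit(&refcnt, 1,
			rte_memory_order_relaxed) + 1;
}
```

callers only ever spell the rte_ names, so switching the backing
implementation is a build-time decision and does not touch call sites.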

i feel like we are really close on this discussion, if we can just iron
this issue out we can probably get going on the actual changes.

thanks for the consideration.

> 
> > 
> > once in place it provides an opportunity to introduce new toolchain/platform
> > combinations and enables an opt-in capability to use stdatomics on existing
> > toolchain/platform combinations subject to community discussion on
> > how/if/when.
> > 
> > it would be good to get more participants into the discussion so i'll cc techboard
> > for some attention. i feel like the only area that isn't decided is to do or not do
> > this in rte_ namespace.
> > 
> > i'm strongly in favor of rte_ namespace after discussion, mainly due to the
> > disadvantages of trying to overlap with the standard namespace while not
> > providing a compatible api/abi and because it provides clear disambiguation of
> > that difference in semantics and compatibility with the standard api.
> > 
> > so far i've noted the following
> > 
> > * we will not provide the non-explicit apis.
> +1
> 
> > * we will make no attempt to support operating on struct/union atomics
> >   with our apis.
> +1
> 
> > * we will mirror the standard api potentially in the rte_ namespace to
> >   - reference the standard api documentation.
> >   - assume compatible semantics (sans exceptions from first 2 points).
> > 
> > my vote is to remove 'potentially' from the last point above for reasons
> > previously discussed in postings to the mail thread.
> > 
> > thanks all for the discussion, i'll send up a patch removing non-explicit apis for
> > viewing.
> > 
> > ty

^ permalink raw reply	[relevance 4%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-06 16:38  0%                           ` Jerin Jacob
@ 2023-02-09 17:00  0%                             ` Naga Harish K, S V
  0 siblings, 0 replies; 200+ results
From: Naga Harish K, S V @ 2023-02-09 17:00 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 6, 2023 10:08 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Mon, Feb 6, 2023 at 11:52 AM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, February 3, 2023 3:15 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, January 30, 2023 8:13 PM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan,
> > > > > > > Jay <jay.jayatheerthan@intel.com>
> > > > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params
> > > > > > > set/get APIs
> > > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > +        */
> > > > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > > > >
> > > > > > > > > > > > > Introduce
> > > > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > > to
> > > > > > > > > make
> > > > > > > > > > > > > sure rsvd is zero.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The reserved fields are not used by the adapter or
> > > application.
> > > > > > > > > > > > Not sure Is it necessary to Introduce a new API to
> > > > > > > > > > > > clear reserved
> > > > > > > fields.
> > > > > > > > > > >
> > > > > > > > > > > > When adapter starts using new fields (when we add new
> > > > > > > > > > > > fields in future), the old application which is not
> > > > > > > > > > > using
> > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may
> > > > > > > > > > > have
> > > > > junk
> > > > > > > > > > > value and then adapter implementation will behave bad.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > does it mean, the application doesn't re-compile for
> > > > > > > > > > the new
> > > DPDK?
> > > > > > > > >
> > > > > > > > > Yes. No need recompile if ABI not breaking.
> > > > > > > > >
> > > > > > > > > > When some of the reserved fields are used in the
> > > > > > > > > > future, the application
> > > > > > > > > also may need to be recompiled along with DPDK right?
> > > > > > > > > > As the application also may need to use the newly
> > > > > > > > > > consumed reserved
> > > > > > > > > fields?
> > > > > > > > >
> > > > > > > > > The problematic case is:
> > > > > > > > >
> > > > > > > > > Adapter implementation of 23.07(Assuming there is change
> > > > > > > > > params) field needs to work with application of 23.03.
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve
> that.
> > > > > > > > >
> > > > > > > >
> > > > > > > > As rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > initializes only
> > > > > > > reserved fields to zero,  it may not solve the issue in this case.
> > > > > > >
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero
> > > > > > > all fields, not just reserved field.
> > > > > > > The application calling sequence  is
> > > > > > >
> > > > > > > struct my_config c;
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > > > > > c.interseted_filed_to_be_updated = val;
> > > > > > >
> > > > > > Can it be done like
> > > > > >         struct my_config c = {0};
> > > > > >         c.interseted_filed_to_be_updated = val; and update
> > > > > > Doxygen comments to recommend above usage to reset all fields?
> > > > > > This way,  rte_event_eth_rx_adapter_runtime_params_init() can
> > > > > > be
> > > > > avoided.
> > > > >
> > > > > Better to have a function for documentation clarity. Similar
> > > > > scheme already there in DPDK. See rte_eth_cman_config_init()
> > > > >
> > > > >
> > > >
> > > >
> > > > The reference function rte_eth_cman_config_init() is resetting the
> > > > params
> > > struct and initializing the required params with default values in the pmd
> cb.
> > >
> > > No need for PMD cb.
> > >
> > > > The proposed rte_event_eth_rx_adapter_runtime_params_init () API
> > > > just
> > > needs to reset the params struct. There are no pmd CBs involved.
> > > > Having an API just to reset the struct seems overkill. What do you
> think?
> > >
> > > It is slow path API. Keeping it as function is better. Also, it
> > > helps the documentations of config parm in
> > > rte_event_eth_rx_adapter_runtime_params_config()
> > > like, This structure must be initialized with
> > > rte_event_eth_rx_adapter_runtime_params_init() or so.
> > >
> > >
> >
> > Are there any other reasons to have this API (*params_init()) other than
> documentation?
> 
> Initialization code is segregated for tracking.
> 

The discussed changes are updated in the v3 patchset.
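
For readers following the thread, the pattern being agreed on can be
sketched roughly as below. This is not the v3 code: the struct, its
max_nb_rx field and the function names are hypothetical stand-ins, and
only the rsvd[15] array mirrors the quoted patch. The idea is that the
init function zeroes the whole structure, so reserved words are known to
be zero when a newer library later starts interpreting them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* hypothetical stand-in for the adapter runtime parameters */
struct adapter_runtime_params {
	uint32_t max_nb_rx;	/* hypothetical field in use today */
	uint32_t rsvd[15];	/* reserved for future use, must be zero */
};

/* zero every field, reserved ones included; returns 0 or -EINVAL */
static int
adapter_runtime_params_init(struct adapter_runtime_params *params)
{
	if (params == NULL)
		return -EINVAL;

	memset(params, 0, sizeof(*params));
	return 0;
}

/* returns 1 if every reserved word is zero, else 0 */
static int
adapter_runtime_params_rsvd_is_zero(const struct adapter_runtime_params *params)
{
	size_t i;

	for (i = 0; i < sizeof(params->rsvd) / sizeof(params->rsvd[0]); i++)
		if (params->rsvd[i] != 0)
			return 0;
	return 1;
}
```

An application built against the older layout calls the init function,
sets only the fields it knows about, and leaves the rest untouched, so a
later release can give meaning to a reserved word without reading junk.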

> >
> > >
> > > >
> > > > > >
> > > > > > > Let me share an example and you can tell where is the issue
> > > > > > >
> > > > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all
> 64B.
> > > > > > > 3)There is an application written based on 22.03 which using
> > > > > > > only 8B after calling
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > > > application is calling
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > and 9 to 15B are zero, the implementation will not go bad.
> > > > > > >
> > > > > > > > The old application only tries to set/get previous valid
> > > > > > > > fields and the newly
> > > > > > > used fields may still contain junk value.
> > > > > > > > If the application wants to make use of any the newly used
> > > > > > > > params, the
> > > > > > > application changes are required anyway.
> > > > > > >
> > > > > > > Yes. If application wants to make use of newly added features.
> > > > > > > No need to change if new features are not needed for old
> application.

^ permalink raw reply	[relevance 0%]

* [PATCH v7 2/3] graph: pcap capture for graph nodes
  2023-02-09 10:24  4%       ` [PATCH v7 1/3] " Amit Prakash Shukla
@ 2023-02-09 10:24  2%         ` Amit Prakash Shukla
  0 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:24 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, david.marchand, Amit Prakash Shukla

Implementation adds support to capture packets at each node with
packet metadata and node name.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix
 
v7:
 - Resending the patch
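
Note for reviewers: the capture hook works by interposing on the node
process callback. The sketch below shows only that pattern with
simplified, hypothetical types (struct node, capture_dispatch, node_fixup
and count_only are not the names used in the patch, and the counter stands
in for writing packets to the pcapng file).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node;
typedef uint16_t (*node_process_t)(struct node *node, void **objs,
				   uint16_t nb_objs);

/* simplified stand-in for a graph node */
struct node {
	node_process_t process;		 /* called by the graph walker */
	node_process_t original_process; /* real handler while capturing */
	uint64_t captured;		 /* stand-in for packets written */
};

/* record the burst, then hand it to the real node handler unchanged */
static uint16_t
capture_dispatch(struct node *node, void **objs, uint16_t nb_objs)
{
	node->captured += nb_objs;
	return node->original_process(node, objs, nb_objs);
}

/* install either the real handler or the capturing wrapper */
static void
node_fixup(struct node *node, node_process_t real, int capture_enable)
{
	if (capture_enable) {
		node->process = capture_dispatch;
		node->original_process = real;
	} else {
		node->process = real;
		node->original_process = NULL;
	}
}

/* trivial handler used to exercise the wrapper */
static uint16_t
count_only(struct node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}
```

With capture disabled the walker calls the real handler directly, so the
extra indirection is only paid when tracing is turned on.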

 app/test/test_graph_perf.c             |   2 +-
 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 216 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 116 +++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 ++
 10 files changed, 388 insertions(+), 4 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/app/test/test_graph_perf.c b/app/test/test_graph_perf.c
index 1d065438a6..c5b463f700 100644
--- a/app/test/test_graph_perf.c
+++ b/app/test/test_graph_perf.c
@@ -324,7 +324,7 @@ graph_init(const char *gname, uint8_t nb_srcs, uint8_t nb_sinks,
 	char nname[RTE_NODE_NAMESIZE / 2];
 	struct test_node_data *node_data;
 	char *ename[nodes_per_stage];
-	struct rte_graph_param gconf;
+	struct rte_graph_param gconf = {0};
 	const struct rte_memzone *mz;
 	uint8_t total_percent = 0;
 	rte_node_t *src_nodes;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index bb435dde32..328dfd3009 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -87,6 +87,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -119,6 +123,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..9cbd1b8fdb
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <pwd.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("Home env not preset.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..2ec772072c
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check whether graph pcap trace is enabled or disabled.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enable
+ *   - 0: Disable
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to allocate the mempool.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to open the pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked when the graph pcap trace is enabled. This function
+ * opens the pcap file and allocates the mempool. Information needed by the
+ * secondary process is populated.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace and close open fd's and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace is enabled, this function is invoked prior to each
+ * node; mbuf and node metadata is parsed and captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< File in which packets are to be captured. */
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[relevance 2%]

* [PATCH v7 1/3] pcapng: comment option support for epb
  2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  2023-02-09  9:56  2%       ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
  2023-02-09 10:03  0%       ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
@ 2023-02-09 10:24  4%       ` Amit Prakash Shukla
  2023-02-09 10:24  2%         ` [PATCH v7 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
  2 siblings, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:24 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger
  Cc: dev, jerinj, david.marchand, Amit Prakash Shukla

This change extends rte_pcapng_copy() to accept an optional comment,
which is written as a comment option in the enhanced packet block (EPB).
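
For readers unfamiliar with pcapng options, the following is a minimal
sketch of the space a comment option takes in a block, assuming the
standard pcapng option layout (16-bit code, 16-bit length, then the value
padded to a 32-bit boundary). The names opt_hdr, opt_space and
comment_opt_space are hypothetical and are not the library's internal
helpers; they only illustrate the kind of length accounting done by the
pcapng_optlen(strlen(comment)) call in the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 4-byte pcapng option header. */
struct opt_hdr {
	uint16_t code;   /* option code, e.g. 1 for opt_comment */
	uint16_t length; /* length of the value, excluding padding */
};

/* Bytes one option occupies: header plus value, rounded up to 4 bytes. */
static uint32_t
opt_space(uint32_t value_len)
{
	uint32_t raw = (uint32_t)sizeof(struct opt_hdr) + value_len;

	return (raw + 3u) & ~3u;
}

/* Space for a comment option; the terminating NUL is not stored. */
static uint32_t
comment_opt_space(const char *comment)
{
	assert(comment != NULL);
	return opt_space((uint32_t)strlen(comment));
}
```

Under these assumptions, an 11-character comment such as "graph: node"
needs 16 bytes: 4 for the header, 11 for the text and 1 byte of padding.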

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix
 
v7:
 - Resending the patch

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index edba46d1fe..b8429a02f1 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -146,7 +146,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 1fa101c420..bb435dde32 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -116,6 +116,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index ea004939e6..65c8c77fa4 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -527,6 +528,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 86b7996e29..4afdec22ef 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -125,6 +125,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -136,7 +138,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* RE: [PATCH v6 1/4] pcapng: comment option support for epb
  2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  2023-02-09  9:56  2%       ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
@ 2023-02-09 10:03  0%       ` Amit Prakash Shukla
  2023-02-09 10:24  4%       ` [PATCH v7 1/3] " Amit Prakash Shukla
  2 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:03 UTC (permalink / raw)
  To: Amit Prakash Shukla, Reshma Pattan, Stephen Hemminger
  Cc: dev, Jerin Jacob Kollanukkaran, david.marchand

Please ignore this version. I will resend the patch.

> -----Original Message-----
> From: Amit Prakash Shukla <amitprakashs@marvell.com>
> Sent: Thursday, February 9, 2023 3:26 PM
> To: Reshma Pattan <reshma.pattan@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> david.marchand@redhat.com; Amit Prakash Shukla
> <amitprakashs@marvell.com>
> Subject: [PATCH v6 1/4] pcapng: comment option support for epb
> 
> This change enhances rte_pcapng_copy to have comment in enhanced
> packet block.
> 
> Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> ---
> v2:
>  - Fixed code style issue
>  - Fixed CI compilation issue on github-robot
> 
> v3:
>  - Code review suggestion from Stephen
>  - Fixed potential memory leak
> 
> v4:
>  - Code review suggestion from Jerin
> 
> v5:
>  - Code review suggestion from Jerin
> 
> v6:
>  - Squashing test graph param initialize fix
> 
>  app/test/test_pcapng.c                 |  4 ++--
>  doc/guides/rel_notes/release_23_03.rst |  2 ++
>  lib/pcapng/rte_pcapng.c                | 10 +++++++++-
>  lib/pcapng/rte_pcapng.h                |  4 +++-
>  lib/pdump/rte_pdump.c                  |  2 +-
>  5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index
> edba46d1fe..b8429a02f1 100644
> --- a/app/test/test_pcapng.c
> +++ b/app/test/test_pcapng.c
> @@ -146,7 +146,7 @@ test_write_packets(void)
>  		struct rte_mbuf *mc;
> 
>  		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
> -				rte_get_tsc_cycles(), 0);
> +				rte_get_tsc_cycles(), 0, NULL);
>  		if (mc == NULL) {
>  			fprintf(stderr, "Cannot copy packet\n");
>  			return -1;
> @@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
>  		struct rte_mbuf *mc;
> 
>  		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
> -				rte_get_tsc_cycles(), 0);
> +				rte_get_tsc_cycles(), 0, NULL);
>  		if (mc == NULL) {
>  			fprintf(stderr, "Cannot copy packet\n");
>  			return -1;
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index 1fa101c420..bb435dde32 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -116,6 +116,8 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =======================================================
> 
> +* Experimental function ``rte_pcapng_copy`` was updated to support
> +comment
> +  section in enhanced packet block in pcapng library.
> 
>  ABI Changes
>  -----------
> diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index
> ea004939e6..65c8c77fa4 100644
> --- a/lib/pcapng/rte_pcapng.c
> +++ b/lib/pcapng/rte_pcapng.c
> @@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  		const struct rte_mbuf *md,
>  		struct rte_mempool *mp,
>  		uint32_t length, uint64_t cycles,
> -		enum rte_pcapng_direction direction)
> +		enum rte_pcapng_direction direction,
> +		const char *comment)
>  {
>  	struct pcapng_enhance_packet_block *epb;
>  	uint32_t orig_len, data_len, padding, flags; @@ -527,6 +528,9 @@
> rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  	if (rss_hash)
>  		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
> 
> +	if (comment)
> +		optlen += pcapng_optlen(strlen(comment));
> +
>  	/* reserve trailing options and block length */
>  	opt = (struct pcapng_option *)
>  		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t)); @@ -
> 564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  					&hash_opt, sizeof(hash_opt));
>  	}
> 
> +	if (comment)
> +		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
> comment,
> +					strlen(comment));
> +
>  	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
> 
>  	/* Add PCAPNG packet header */
> diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index
> 86b7996e29..4afdec22ef 100644
> --- a/lib/pcapng/rte_pcapng.h
> +++ b/lib/pcapng/rte_pcapng.h
> @@ -125,6 +125,8 @@ enum rte_pcapng_direction {
>   *   The timestamp in TSC cycles.
>   * @param direction
>   *   The direction of the packer: receive, transmit or unknown.
> + * @param comment
> + *   Packet comment.
>   *
>   * @return
>   *   - The pointer to the new mbuf formatted for pcapng_write
> @@ -136,7 +138,7 @@ struct rte_mbuf *
>  rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  		const struct rte_mbuf *m, struct rte_mempool *mp,
>  		uint32_t length, uint64_t timestamp,
> -		enum rte_pcapng_direction direction);
> +		enum rte_pcapng_direction direction, const char
> *comment);
> 
> 
>  /**
> diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index
> a81544cb57..9bc4bab4f2 100644
> --- a/lib/pdump/rte_pdump.c
> +++ b/lib/pdump/rte_pdump.c
> @@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>  		if (cbs->ver == V2)
>  			p = rte_pcapng_copy(port_id, queue,
>  					    pkts[i], mp, cbs->snaplen,
> -					    ts, direction);
> +					    ts, direction, NULL);
>  		else
>  			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
> 
> --
> 2.25.1


^ permalink raw reply	[relevance 0%]

* [PATCH v6 2/4] graph: pcap capture for graph nodes
  2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
@ 2023-02-09  9:56  2%       ` Amit Prakash Shukla
  2023-02-09 10:03  0%       ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  2023-02-09 10:24  4%       ` [PATCH v7 1/3] " Amit Prakash Shukla
  2 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09  9:56 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, david.marchand, Amit Prakash Shukla

This patch adds support for capturing packets at each graph node,
together with packet metadata and the node name.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix

 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 216 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 116 +++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 ++
 9 files changed, 387 insertions(+), 3 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index bb435dde32..328dfd3009 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -87,6 +87,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -119,6 +123,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..9cbd1b8fdb
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <pwd.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("HOME environment variable not set.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..2ec772072c
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check whether graph pcap trace is enabled or disabled.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enable
+ *   - 0: Disable
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to allocate the mempool.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to open the pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked when the graph pcap trace is enabled. This function
+ * opens the pcap file and allocates the mempool. Information needed by the
+ * secondary process is populated.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace and close open fd's and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace is enabled, this function is invoked prior to each
+ * node; mbuf and node metadata is parsed and captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< Filename in which packets to be captured.*/
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1
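
The graph_nodes_populate() hunk above swaps the node callback for a
capture dispatcher and keeps the original callback in a new field. A
minimal standalone sketch of that interposition pattern follows. It
builds without DPDK; every example_* name is hypothetical and only
stands in for the graph library types, it is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the graph library types. */
struct example_node;
typedef uint16_t (*example_process_t)(struct example_node *node,
				      void **objs, uint16_t nb_objs);

struct example_node {
	example_process_t process;          /* Called on the fast path. */
	example_process_t original_process; /* Saved callback when capturing. */
	uint64_t captured;                  /* Objects seen by the capture hook. */
};

/* Capture hook: account for the burst, then forward to the saved callback. */
static uint16_t
example_capture_dispatch(struct example_node *node, void **objs,
			 uint16_t nb_objs)
{
	node->captured += nb_objs; /* A real hook would copy packets here. */
	return node->original_process(node, objs, nb_objs);
}

/* Install the hook only when capture is enabled. */
static void
example_node_populate(struct example_node *node, example_process_t process,
		      bool capture_enabled)
{
	node->captured = 0;
	if (capture_enabled) {
		node->process = example_capture_dispatch;
		node->original_process = process;
	} else {
		node->process = process;
		node->original_process = NULL;
	}
}

/* Trivial node callback used for illustration: accepts every object. */
static uint16_t
example_passthrough(struct example_node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}
```

With capture disabled the fast path calls the node callback directly, so
the extra indirection is only paid when the hook is installed.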


^ permalink raw reply	[relevance 2%]

* [PATCH v6 1/4] pcapng: comment option support for epb
  2023-02-03  8:19  4%   ` [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
  2023-02-03  8:19  2%     ` [PATCH v5 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
@ 2023-02-09  9:56  4%     ` Amit Prakash Shukla
  2023-02-09  9:56  2%       ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
                         ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09  9:56 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger
  Cc: dev, jerinj, david.marchand, Amit Prakash Shukla

This change extends rte_pcapng_copy to accept an optional comment that
is stored as an option in the enhanced packet block.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index edba46d1fe..b8429a02f1 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -146,7 +146,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 1fa101c420..bb435dde32 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -116,6 +116,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index ea004939e6..65c8c77fa4 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -527,6 +528,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 86b7996e29..4afdec22ef 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -125,6 +125,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -136,7 +138,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
@ 2023-02-09  8:34  4%                   ` Morten Brørup
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2023-02-09  8:34 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Tyler Retzlaff
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, techboard, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, 9 February 2023 01.17
> 
> <snip>
> 
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we
> could
> > > > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > support
> > > > only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> and
> > > > when we move
> > > > > > to minimum supported standard C11, we just need to get rid of
> > > > > > the
> > > > file in DPDK
> > > > > > repo.
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > introduce names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >
> https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances
> namespaces
> > > > introduced
> > > > > > are reserved by the implementation. there are several reasons
> in
> > > > the GNU libc
> > > > > Wouldn't this apply only after the particular APIs were
> introduced?
> > > > i.e. it should not apply if the compiler does not support
> stdatomics.
> > > >
> > > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > > i'm not convinced glibc folks are documenting this as permissive
> > > > guidance against.
> > > >
> > > > >
> > > > > > manual that explain the justification for these reservations
> and
> > > > > > if
> > > > if we think
> > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > >
> > > > > > i'll also remark that the inter-mingling of names from the
> POSIX
> > > > standard
> > > > > > implicitly exposed as a part of the EAL public API has been
> > > > problematic for
> > > > > > portability.
> > > > > These should be exposed as EAL APIs only when compiled with a
> > > > compiler that does not support stdatomics.
> > > >
> > > > you don't necessarily compile dpdk, the application or its other
> > > > dynamically linked dependencies with the same compiler at the
> same
> > > > time.
> > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > distribution.
> > > >
> > > > if dpdk is built without real stdatomic types but the application
> > > > has to interoperate with a different kit or library that does
> they
> > > > would be forced to dance around dpdk with their own version of a
> > > > shim to hide our faked up stdatomics.
> > > >
> > >
> > > So basically, if we want a binary DPDK distribution to be
> compatible with a
> > separate application build environment, they both have to implement
> atomics
> > the same way, i.e. agree on the ABI for atomics.
> > >
> > > Summing up, this leaves us with only two realistic options:
> > >
> > > 1. Go all in on C11 stdatomics, also requiring the application
> build
> > environment to support C11 stdatomics.
> > > 2. Provide our own DPDK atomics library.
> > >
> > > (As mentioned by Tyler, the third option - using C11 stdatomics
> inside
> > > DPDK, and requiring a build environment without C11 stdatomics to
> > > implement a shim - is not realistic!)
> > >
> > > I strongly want atomics to be available for use across inline and
> compiled
> > code; i.e. it must be possible for both compiled DPDK functions and
> inline
> > functions to perform atomic transactions on the same atomic variable.
> >
> > i consider it a mandatory requirement. i don't see practically how we
> could
> > withdraw existing use and even if we had clean way i don't see why we
> would
> > want to. so this item is definitely settled if you were concerned.
> I think I agree here.
> 
> >
> > >
> > > So either we upgrade the DPDK build requirements to support C11
> (including
> > the optional stdatomics), or we provide our own DPDK atomics.
> >
> > i think the issue of requiring a toolchain conformant to a specific
> standard is a
> > separate matter because any adoption of C11 standard atomics is a
> potential
> > abi break from the current use of intrinsics.
> I am not sure why you are calling it an ABI break. Referring to [1], I
> just see wrappers around intrinsics (though [2] does not use the
> intrinsics).
> 
> [1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> [2] https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h

Good input, Honnappa.

This means that the ABI break is purely academic, and there is no ABI breakage in reality.

Since the underlying implementation is the same, it is perfectly OK to mix C11 and intrinsic atomics, even when DPDK and the application are built in different environments (with and without C11 atomics, or vice versa).

This eliminates my only remaining practical concern about this approach.

> 
> >
> > the abstraction (whatever namespace it resides) allows the existing
> > toolchain/platform combinations to maintain compatibility by
> defaulting to
> > current non-standard intrinsics.
> How about using the intrinsics (__atomic_xxx) name space for
> abstraction? This covers the GCC and Clang compilers.
> If there is another platform that uses the same name space for
> something else, I think DPDK should not be supporting that platform.
> What problems do you see?
> 
> >
> > once in place it provides an opportunity to introduce new
> toolchain/platform
> > combinations and enables an opt-in capability to use stdatomics on
> existing
> > toolchain/platform combinations subject to community discussion on
> > how/if/when.
> >
> > it would be good to get more participants into the discussion so i'll
> cc techboard
> > for some attention. i feel like the only area that isn't decided is
> to do or not do
> > this in rte_ namespace.
> >
> > i'm strongly in favor of rte_ namespace after discussion, mainly due
> to
> > disadvantages of trying to overlap with the standard namespace while
> not
> > providing a compatible api/abi and because it provides clear
> disambiguation of
> > that difference in semantics and compatibility with the standard api.
> >
> > so far i've noted the following
> >
> > * we will not provide the non-explicit apis.
> +1
> 
> > * we will make no attempt to support operate on struct/union atomics
> >   with our apis.
> +1
> 
> > * we will mirror the standard api potentially in the rte_ namespace
> to
> >   - reference the standard api documentation.
> >   - assume compatible semantics (sans exceptions from first 2
> points).
> >
> > my vote is to remove 'potentially' from the last point above for
> reasons
> > previously discussed in postings to the mail thread.
> >
> > thanks all for the discussion, i'll send up a patch removing non-
> explicit apis for
> > viewing.
> >
> > ty


^ permalink raw reply	[relevance 4%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-08 16:35  4%               ` Tyler Retzlaff
@ 2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  2023-02-09  8:34  4%                   ` Morten Brørup
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2023-02-09  0:16 UTC (permalink / raw)
  To: Tyler Retzlaff, Morten Brørup
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, techboard, nd

<snip>

> > > > >
> > > > > >
> > > > > > For environments where stdatomics are not supported, we could
> > > have a
> > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > support
> > > only
> > > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > > when we move
> > > > > to minimum supported standard C11, we just need to get rid of
> > > > > the
> > > file in DPDK
> > > > > repo.
> > > > >
> > > > > my concern with this is that if we provide a stdatomic.h or
> > > introduce names
> > > > > from stdatomic.h it's a violation of the C standard.
> > > > >
> > > > > references:
> > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > >  * GNU libc manual
> > > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > >
> > > > > in effect the header, the names and in some instances namespaces
> > > introduced
> > > > > are reserved by the implementation. there are several reasons in
> > > the GNU libc
> > > > Wouldn't this apply only after the particular APIs were introduced?
> > > i.e. it should not apply if the compiler does not support stdatomics.
> > >
> > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > i'm not convinced glibc folks are documenting this as permissive
> > > guidance against.
> > >
> > > >
> > > > > manual that explain the justification for these reservations and
> > > > > if
> > > if we think
> > > > > about ODR and ABI compatibility we can conceive of others.
> > > > >
> > > > > i'll also remark that the inter-mingling of names from the POSIX
> > > standard
> > > > > implicitly exposed as a part of the EAL public API has been
> > > problematic for
> > > > > portability.
> > > > These should be exposed as EAL APIs only when compiled with a
> > > compiler that does not support stdatomics.
> > >
> > > you don't necessarily compile dpdk, the application or its other
> > > dynamically linked dependencies with the same compiler at the same
> > > time.
> > > i.e. basically the model of any dpdk-dev package on any linux
> > > distribution.
> > >
> > > if dpdk is built without real stdatomic types but the application
> > > has to interoperate with a different kit or library that does they
> > > would be forced to dance around dpdk with their own version of a
> > > shim to hide our faked up stdatomics.
> > >
> >
> > So basically, if we want a binary DPDK distribution to be compatible with a
> separate application build environment, they both have to implement atomics
> the same way, i.e. agree on the ABI for atomics.
> >
> > Summing up, this leaves us with only two realistic options:
> >
> > 1. Go all in on C11 stdatomics, also requiring the application build
> environment to support C11 stdatomics.
> > 2. Provide our own DPDK atomics library.
> >
> > (As mentioned by Tyler, the third option - using C11 stdatomics inside
> > DPDK, and requiring a build environment without C11 stdatomics to
> > implement a shim - is not realistic!)
> >
> > I strongly want atomics to be available for use across inline and compiled
> code; i.e. it must be possible for both compiled DPDK functions and inline
> functions to perform atomic transactions on the same atomic variable.
> 
> i consider it a mandatory requirement. i don't see practically how we could
> withdraw existing use and even if we had clean way i don't see why we would
> want to. so this item is definitely settled if you were concerned.
I think I agree here.

> 
> >
> > So either we upgrade the DPDK build requirements to support C11 (including
> the optional stdatomics), or we provide our own DPDK atomics.
> 
> i think the issue of requiring a toolchain conformant to a specific standard is a
> separate matter because any adoption of C11 standard atomics is a potential
> abi break from the current use of intrinsics.
I am not sure why you are calling it an ABI break. Referring to [1], I just see wrappers around intrinsics (though [2] does not use the intrinsics).

[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
[2] https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h

> 
> the abstraction (whatever namespace it resides) allows the existing
> toolchain/platform combinations to maintain compatibility by defaulting to
> current non-standard intrinsics.
How about using the intrinsics (__atomic_xxx) name space for abstraction? This covers the GCC and Clang compilers.
If there is another platform that uses the same name space for something else, I think DPDK should not be supporting that platform.
What problems do you see?

> 
> once in place it provides an opportunity to introduce new toolchain/platform
> combinations and enables an opt-in capability to use stdatomics on existing
> toolchain/platform combinations subject to community discussion on
> how/if/when.
> 
> it would be good to get more participants into the discussion so i'll cc techboard
> for some attention. i feel like the only area that isn't decided is to do or not do
> this in rte_ namespace.
> 
> i'm strongly in favor of rte_ namespace after discussion, mainly due to
> disadvantages of trying to overlap with the standard namespace while not
> providing a compatible api/abi and because it provides clear disambiguation of
> that difference in semantics and compatibility with the standard api.
> 
> so far i've noted the following
> 
> * we will not provide the non-explicit apis.
+1

> * we will make no attempt to support operate on struct/union atomics
>   with our apis.
+1

> * we will mirror the standard api potentially in the rte_ namespace to
>   - reference the standard api documentation.
>   - assume compatible semantics (sans exceptions from first 2 points).
> 
> my vote is to remove 'potentially' from the last point above for reasons
> previously discussed in postings to the mail thread.
> 
> thanks all for the discussion, i'll send up a patch removing non-explicit apis for
> viewing.
> 
> ty

^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-08  8:31  3%             ` Morten Brørup
@ 2023-02-08 16:35  4%               ` Tyler Retzlaff
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-08 16:35 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Honnappa Nagarahalli, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Wed, Feb 08, 2023 at 09:31:32AM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 8 February 2023 02.21
> > 
> > On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > > >
> > > > > > Honnappa, please could you give your view on the future of
> > atomics in
> > > > DPDK?
> > > > > Thanks Thomas, apologies it has taken me a while to get to this
> > discussion.
> > > > >
> > > > > IMO, we do not need DPDK's own abstractions. APIs from
> > stdatomic.h
> > > > (stdatomics as is called here) already serve the purpose. These
> > APIs are well
> > > > understood and documented.
> > > >
> > > > i agree that whatever atomics APIs we advocate for should align
> > with the
> > > > standard C atomics for the reasons you state including implied
> > semantics.
> > > Another point I want to make is, we need 'xxx_explicit' APIs only, as
> > we want memory ordering explicitly provided at each call site. (This
> > can be discussed later).
> > 
> > i don't have any issue with removing the non-explicit versions. they're
> > > just convenience for seq_cst anyway. if people don't want them we
> > don't have to have them.
> 
> I agree with Honnappa on this point.
> 
> The non-explicit versions are for lazy (or not so experienced) developers, and might impact performance if used instead of the correct explicit versions.
> 
> I'm working on porting some of our application code from DPDK's rte_atomic32 operations to modern atomics, and I'm temporarily using acq_rel with a FIXME comment on each operation until I have the overview to determine if another memory order is better for each operation. And if I don't get around to fixing the memory order, it is still a step in the right direction to get rid of the old __sync based atomics; and the FIXMEs remain to be fixed in a later release.
> 
> So here's an idea: Alternatively to omitting the non-explicit versions, we could include them for application developers, but document them as placeholders for "memory order to be determined later" and emit a warning when used. It might speed up the transition away from old atomic operations. Alternatively, we risk thoughtless use of seq_cst with the explicit versions, which might be difficult to detect in code reviews.

i think it may be cleaner to just remove the non-explicit versions. if we
are publishing api in the rte_xxx namespace then there are no
pre-existing expectations that they are present.

it also reduces the api surface that eventually gets retired ~years from
now when all ports and compilers in the matrix are std=C11.

i'll update the patch accordingly just so we have a visual.

>  
> Either way, with or without non-explicit versions, is fine with me.
> 
> > 
> > >
> > > >
> > > > >
> > > > > For environments where stdatomics are not supported, we could
> > have a
> > > > stdatomic.h in DPDK implementing the same APIs (we have to support
> > only
> > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > when we move
> > > > to minimum supported standard C11, we just need to get rid of the
> > file in DPDK
> > > > repo.
> > > >
> > > > my concern with this is that if we provide a stdatomic.h or
> > introduce names
> > > > from stdatomic.h it's a violation of the C standard.
> > > >
> > > > references:
> > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > >  * GNU libc manual
> > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > >
> > > > in effect the header, the names and in some instances namespaces
> > introduced
> > > > are reserved by the implementation. there are several reasons in
> > the GNU libc
> > > Wouldn't this apply only after the particular APIs were introduced?
> > i.e. it should not apply if the compiler does not support stdatomics.
> > 
> > yeah, i agree they're being a bit wishy washy in the wording, but i'm
> > not convinced glibc folks are documenting this as permissive guidance
> > against.
> > 
> > >
> > > > manual that explain the justification for these reservations and if
> > if we think
> > > > about ODR and ABI compatibility we can conceive of others.
> > > >
> > > > i'll also remark that the inter-mingling of names from the POSIX
> > standard
> > > > implicitly exposed as a part of the EAL public API has been
> > problematic for
> > > > portability.
> > > These should be exposed as EAL APIs only when compiled with a
> > compiler that does not support stdatomics.
> > 
> > you don't necessarily compile dpdk, the application or its other
> > dynamically linked dependencies with the same compiler at the same
> > time.
> > i.e. basically the model of any dpdk-dev package on any linux
> > distribution.
> > 
> > if dpdk is built without real stdatomic types but the application has
> > to
> > interoperate with a different kit or library that does they would be
> > forced
> > to dance around dpdk with their own version of a shim to hide our
> > faked up stdatomics.
> > 
> 
> So basically, if we want a binary DPDK distribution to be compatible with a separate application build environment, they both have to implement atomics the same way, i.e. agree on the ABI for atomics.
> 
> Summing up, this leaves us with only two realistic options:
> 
> 1. Go all in on C11 stdatomics, also requiring the application build environment to support C11 stdatomics.
> 2. Provide our own DPDK atomics library.
> 
> (As mentioned by Tyler, the third option - using C11 stdatomics inside DPDK, and requiring a build environment without C11 stdatomics to implement a shim - is not realistic!)
> 
> I strongly want atomics to be available for use across inline and compiled code; i.e. it must be possible for both compiled DPDK functions and inline functions to perform atomic transactions on the same atomic variable.

i consider it a mandatory requirement. i don't see practically how we
could withdraw existing use and even if we had a clean way i don't see why
we would want to. so this item is definitely settled if you were
concerned.

> 
> So either we upgrade the DPDK build requirements to support C11 (including the optional stdatomics), or we provide our own DPDK atomics.

i think the issue of requiring a toolchain conformant to a specific
standard is a separate matter because any adoption of C11 standard
atomics is a potential abi break from the current use of intrinsics.

the abstraction (whatever namespace it resides in) allows the existing
toolchain/platform combinations to maintain compatibility by defaulting
to current non-standard intrinsics.

once in place it provides an opportunity to introduce new toolchain/platform
combinations and enables an opt-in capability to use stdatomics on
existing toolchain/platform combinations subject to community discussion
on how/if/when.
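
To make the idea concrete, a minimal sketch of such an abstraction could
look like the following. It assumes a GCC or Clang toolchain, exposes
only explicit operations, defaults to the __atomic builtins and maps to
<stdatomic.h> only when a hypothetical EXAMPLE_USE_STDATOMIC macro is
defined. All EXAMPLE_/example_ names are made up for illustration and
are not the names used in the patch under discussion.

```c
#include <stdint.h>

#ifdef EXAMPLE_USE_STDATOMIC
/* Opt-in: map the wrappers onto C11 <stdatomic.h>. */
#include <stdatomic.h>
#define EXAMPLE_ATOMIC(type) _Atomic(type)
#define example_memory_order_relaxed memory_order_relaxed
#define example_memory_order_acquire memory_order_acquire
#define example_memory_order_release memory_order_release
#define example_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#define example_atomic_store_explicit(ptr, val, mo) \
	atomic_store_explicit(ptr, val, mo)
#define example_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#else
/* Default: GCC/Clang __atomic builtins operating on plain types. */
#define EXAMPLE_ATOMIC(type) type
#define example_memory_order_relaxed __ATOMIC_RELAXED
#define example_memory_order_acquire __ATOMIC_ACQUIRE
#define example_memory_order_release __ATOMIC_RELEASE
#define example_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#define example_atomic_store_explicit(ptr, val, mo) \
	__atomic_store_n(ptr, val, mo)
#define example_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#endif

/* Caller code is written once against the wrapper names. */
static EXAMPLE_ATOMIC(uint32_t) example_counter;

/* Increment and return the previous value. */
static uint32_t
example_counter_inc(void)
{
	return example_atomic_fetch_add_explicit(&example_counter, 1,
						 example_memory_order_relaxed);
}

/* Read the current value with acquire ordering. */
static uint32_t
example_counter_read(void)
{
	return example_atomic_load_explicit(&example_counter,
					    example_memory_order_acquire);
}
```

Callers never name the underlying mechanism, so switching a
toolchain/platform combination from builtins to stdatomics is a
build-time choice and not a source change.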

it would be good to get more participants into the discussion so i'll cc
techboard for some attention. i feel like the only area that isn't
decided is to do or not do this in rte_ namespace.

i'm strongly in favor of rte_ namespace after discussion, mainly due to
the disadvantages of trying to overlap with the standard namespace while not
providing a compatible api/abi and because it provides clear
disambiguation of that difference in semantics and compatibility with
the standard api.

so far i've noted the following

* we will not provide the non-explicit apis.
* we will make no attempt to support operating on struct/union atomics
  with our apis.
* we will mirror the standard api potentially in the rte_ namespace to
  - reference the standard api documentation.
  - assume compatible semantics (sans exceptions from first 2 points).

my vote is to remove 'potentially' from the last point above for reasons
previously discussed in postings to the mail thread.

thanks all for the discussion, i'll send up a patch removing
non-explicit apis for viewing.

ty

^ permalink raw reply	[relevance 4%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-08  1:20  0%           ` Tyler Retzlaff
@ 2023-02-08  8:31  3%             ` Morten Brørup
  2023-02-08 16:35  4%               ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-08  8:31 UTC (permalink / raw)
  To: Tyler Retzlaff, Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 8 February 2023 02.21
> 
> On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > >
> > > > > Honnappa, please could you give your view on the future of
> atomics in
> > > DPDK?
> > > > Thanks Thomas, apologies it has taken me a while to get to this
> discussion.
> > > >
> > > > IMO, we do not need DPDK's own abstractions. APIs from
> stdatomic.h
> > > (stdatomics as is called here) already serve the purpose. These
> APIs are well
> > > understood and documented.
> > >
> > > i agree that whatever atomics APIs we advocate for should align
> with the
> > > standard C atomics for the reasons you state including implied
> semantics.
> > Another point I want to make is, we need 'xxx_explicit' APIs only, as
> we want memory ordering explicitly provided at each call site. (This
> can be discussed later).
> 
> i don't have any issue with removing the non-explicit versions. they're
> just convenience for seq_cst anyway. if people don't want them we
> don't have to have them.

I agree with Honnappa on this point.

The non-explicit versions are for lazy (or not so experienced) developers, and might impact performance if used instead of the correct explicit versions.
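
For reference, a small sketch of the difference, using C11
<stdatomic.h> directly. The counter and function names are hypothetical.
The non-explicit call always implies memory_order_seq_cst, while the
explicit call lets the author pick a weaker order where that is enough.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t example_hits;

/* Non-explicit form: the memory order is implicitly memory_order_seq_cst. */
static uint32_t
example_hit_implicit(void)
{
	return atomic_fetch_add(&example_hits, 1);
}

/* Explicit form with the same ordering spelled out; equivalent to the above. */
static uint32_t
example_hit_seq_cst(void)
{
	return atomic_fetch_add_explicit(&example_hits, 1,
					 memory_order_seq_cst);
}

/*
 * Explicit form with a weaker ordering. Relaxed is typically adequate for
 * a pure statistics counter that does not publish any other data.
 */
static uint32_t
example_hit_relaxed(void)
{
	return atomic_fetch_add_explicit(&example_hits, 1,
					 memory_order_relaxed);
}
```

All three return the previous value and update the same counter. Only
the ordering guarantees, and on weakly ordered CPUs possibly the
generated barriers, differ.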

I'm working on porting some of our application code from DPDK's rte_atomic32 operations to modern atomics, and I'm temporarily using acq_rel with a FIXME comment on each operation until I have the overview to determine if another memory order is better for each operation. And if I don't get around to fixing the memory order, it is still a step in the right direction to get rid of the old __sync based atomics; and the FIXMEs remain to be fixed in a later release.
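
A sketch of what such an interim port can look like, assuming GCC/Clang
__atomic builtins. The reference-count helpers are hypothetical names,
and the legacy calls appear only in a comment so the sketch builds
without DPDK.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Legacy code would have used rte_atomic32_t with rte_atomic32_inc() and
 * rte_atomic32_dec_and_test(); here a plain uint32_t is updated through
 * the __atomic builtins instead.
 */

/* Take a reference. */
static inline void
example_ref_get(uint32_t *refcnt)
{
	/* FIXME: memory order to be determined; acq_rel is a placeholder. */
	__atomic_fetch_add(refcnt, 1, __ATOMIC_ACQ_REL);
}

/* Drop a reference; returns true when the last reference was dropped. */
static inline bool
example_ref_put(uint32_t *refcnt)
{
	/* FIXME: memory order to be determined; acq_rel is a placeholder. */
	return __atomic_sub_fetch(refcnt, 1, __ATOMIC_ACQ_REL) == 0;
}
```

The FIXME markers keep every placeholder order easy to find with grep
once there is time to review each call site. Note that acq_rel is only
valid for read-modify-write operations like these, not for plain loads
or stores.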

So here's an idea: As an alternative to omitting the non-explicit versions, we could include them for application developers, but document them as placeholders for "memory order to be determined later" and emit a warning when used. It might speed up the transition away from old atomic operations. Otherwise, we risk thoughtless use of seq_cst with the explicit versions, which might be difficult to detect in code reviews.

Either way, with or without non-explicit versions, is fine with me.

> 
> >
> > >
> > > >
> > > > For environments where stdatomics are not supported, we could
> have a
> > > stdatomic.h in DPDK implementing the same APIs (we have to support
> only
> > > _explicit APIs). This allows the code to use stdatomics APIs and
> when we move
> > > to minimum supported standard C11, we just need to get rid of the
> file in DPDK
> > > repo.
> > >
> > > my concern with this is that if we provide a stdatomic.h or
> introduce names
> > > from stdatomic.h it's a violation of the C standard.
> > >
> > > references:
> > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > >  * GNU libc manual
> > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > >
> > > in effect the header, the names and in some instances namespaces
> introduced
> > > are reserved by the implementation. there are several reasons in
> the GNU libc
> > Wouldn't this apply only after the particular APIs were introduced?
> i.e. it should not apply if the compiler does not support stdatomics.
> 
> yeah, i agree they're being a bit wishy washy in the wording, but i'm
> not convinced glibc folks are documenting this as permissive guidance
> against.
> 
> >
> > > manual that explain the justification for these reservations and if
> if we think
> > > about ODR and ABI compatibility we can conceive of others.
> > >
> > > i'll also remark that the inter-mingling of names from the POSIX
> standard
> > > implicitly exposed as a part of the EAL public API has been
> problematic for
> > > portability.
> > These should be exposed as EAL APIs only when compiled with a
> compiler that does not support stdatomics.
> 
> you don't necessarily compile dpdk, the application or its other
> dynamically linked dependencies with the same compiler at the same
> time.
> i.e. basically the model of any dpdk-dev package on any linux
> distribution.
> 
> if dpdk is built without real stdatomic types but the application has
> to interoperate with a different kit or library that does they would
> be forced to dance around dpdk with their own version of a shim to
> hide our faked up stdatomics.
> 

So basically, if we want a binary DPDK distribution to be compatible with a separate application build environment, they both have to implement atomics the same way, i.e. agree on the ABI for atomics.

Summing up, this leaves us with only two realistic options:

1. Go all in on C11 stdatomics, also requiring the application build environment to support C11 stdatomics.
2. Provide our own DPDK atomics library.

(As mentioned by Tyler, the third option - using C11 stdatomics inside DPDK, and requiring a build environment without C11 stdatomics to implement a shim - is not realistic!)

I strongly want atomics to be available for use across inline and compiled code; i.e. it must be possible for both compiled DPDK functions and inline functions to perform atomic transactions on the same atomic variable.
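
As a minimal sketch of that requirement, assuming C11 stdatomic.h is available on both sides: an inline function (compiled into the application) and an out-of-line function (compiled into the library) operate on the same atomic variable. The names demo_ring, demo_ring_ref and demo_ring_unref are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared object, as it would be declared in a public header. */
struct demo_ring {
	_Atomic uint32_t refcnt;
};

/* "Header" side: inline fast path, compiled with the application. */
static inline void
demo_ring_ref(struct demo_ring *r)
{
	atomic_fetch_add_explicit(&r->refcnt, 1, memory_order_relaxed);
}

/* "Library" side: out-of-line function, compiled with the DPDK binary. */
uint32_t
demo_ring_unref(struct demo_ring *r)
{
	/* Returns the reference count after the decrement. */
	return atomic_fetch_sub_explicit(&r->refcnt, 1,
			memory_order_acq_rel) - 1;
}
```

This only works if both build environments agree on the size, alignment and instruction sequence used for the atomic type, which is the ABI point made above.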

So either we upgrade the DPDK build requirements to support C11 (including the optional stdatomics), or we provide our own DPDK atomics.

> >
> > >
> > > let's discuss this from here. if there's still overwhelming desire
> to go this route
> > > then we'll just do our best.
> > >
> > > ty


* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-07 23:34  0%         ` Honnappa Nagarahalli
@ 2023-02-08  1:20  0%           ` Tyler Retzlaff
  2023-02-08  8:31  3%             ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-08  1:20 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, mb, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > >
> > > > Honnappa, please could you give your view on the future of atomics in
> > DPDK?
> > > Thanks Thomas, apologies it has taken me a while to get to this discussion.
> > >
> > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > (stdatomics as is called here) already serve the purpose. These APIs are well
> > understood and documented.
> > 
> > i agree that whatever atomics APIs we advocate for should align with the
> > standard C atomics for the reasons you state including implied semantics.
> Another point I want to make is, we need 'xxx_explicit' APIs only, as we want memory ordering explicitly provided at each call site. (This can be discussed later).

i don't have any issue with removing the non-explicit versions. they're
just a convenience for seq_cst anyway. if people don't want them we
don't have to have them.

> 
> > 
> > >
> > > For environments where stdatomics are not supported, we could have a
> > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > _explicit APIs). This allows the code to use stdatomics APIs and when we move
> > to minimum supported standard C11, we just need to get rid of the file in DPDK
> > repo.
> > 
> > my concern with this is that if we provide a stdatomic.h or introduce names
> > from stdatomic.h it's a violation of the C standard.
> > 
> > references:
> >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> >  * GNU libc manual
> >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > 
> > in effect the header, the names and in some instances namespaces introduced
> > are reserved by the implementation. there are several reasons in the GNU libc
> Wouldn't this apply only after the particular APIs were introduced? i.e. it should not apply if the compiler does not support stdatomics.

yeah, i agree they're being a bit wishy washy in the wording, but i'm
not convinced the glibc folks intend this as merely permissive
guidance.

> 
> > manual that explain the justification for these reservations and if if we think
> > about ODR and ABI compatibility we can conceive of others.
> > 
> > i'll also remark that the inter-mingling of names from the POSIX standard
> > implicitly exposed as a part of the EAL public API has been problematic for
> > portability.
> These should be exposed as EAL APIs only when compiled with a compiler that does not support stdatomics.

you don't necessarily compile dpdk, the application or its other
dynamically linked dependencies with the same compiler at the same time.
i.e. basically the model of any dpdk-dev package on any linux distribution.

if dpdk is built without real stdatomic types but the application has to
interoperate with a different kit or library that does, they would be forced
to dance around dpdk with their own version of a shim to hide our
faked up stdatomics.

> 
> > 
> > let's discuss this from here. if there's still overwhelming desire to go this route
> > then we'll just do our best.
> > 
> > ty


* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-01 21:41  3%       ` Tyler Retzlaff
  2023-02-02  8:43  4%         ` Morten Brørup
@ 2023-02-07 23:34  0%         ` Honnappa Nagarahalli
  2023-02-08  1:20  0%           ` Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2023-02-07 23:34 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: thomas, dev, bruce.richardson, mb, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, nd

<snip>

> > >
> > > Honnappa, please could you give your view on the future of atomics in
> DPDK?
> > Thanks Thomas, apologies it has taken me a while to get to this discussion.
> >
> > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> (stdatomics as is called here) already serve the purpose. These APIs are well
> understood and documented.
> 
> i agree that whatever atomics APIs we advocate for should align with the
> standard C atomics for the reasons you state including implied semantics.
Another point I want to make is, we need 'xxx_explicit' APIs only, as we want memory ordering explicitly provided at each call site. (This can be discussed later).

> 
> >
> > For environments where stdatomics are not supported, we could have a
> stdatomic.h in DPDK implementing the same APIs (we have to support only
> _explicit APIs). This allows the code to use stdatomics APIs and when we move
> to minimum supported standard C11, we just need to get rid of the file in DPDK
> repo.
> 
> my concern with this is that if we provide a stdatomic.h or introduce names
> from stdatomic.h it's a violation of the C standard.
> 
> references:
>  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
>  * GNU libc manual
>    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> 
> in effect the header, the names and in some instances namespaces introduced
> are reserved by the implementation. there are several reasons in the GNU libc
Wouldn't this apply only after the particular APIs were introduced? i.e. it should not apply if the compiler does not support stdatomics.

> manual that explain the justification for these reservations and if if we think
> about ODR and ABI compatibility we can conceive of others.
> 
> i'll also remark that the inter-mingling of names from the POSIX standard
> implicitly exposed as a part of the EAL public API has been problematic for
> portability.
These should be exposed as EAL APIs only when compiled with a compiler that does not support stdatomics.

> 
> let's discuss this from here. if there's still overwhelming desire to go this route
> then we'll just do our best.
> 
> ty


* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-07 15:16  0%                 ` Morten Brørup
@ 2023-02-07 21:58  0%                   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-07 21:58 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Bruce Richardson, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Tue, Feb 07, 2023 at 04:16:58PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Friday, 3 February 2023 21.49
> > 
> > On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> > > On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > > > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > > > Sent: Wednesday, 1 February 2023 22.41
> > > > > >
> > > > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> > wrote:
> > > > > > >
> > > > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > > > >
> > > > > > > > Honnappa, please could you give your view on the future of
> > atomics
> > > > > > in DPDK?
> > > > > > > Thanks Thomas, apologies it has taken me a while to get to
> > this
> > > > > > discussion.
> > > > > > >
> > > > > > > IMO, we do not need DPDK's own abstractions. APIs from
> > stdatomic.h
> > > > > > (stdatomics as is called here) already serve the purpose. These
> > APIs
> > > > > > are well understood and documented.
> > > > > >
> > > > > > i agree that whatever atomics APIs we advocate for should align
> > with
> > > > > > the
> > > > > > standard C atomics for the reasons you state including implied
> > > > > > semantics.
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we could
> > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > support only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > and when
> > > > > > we move to minimum supported standard C11, we just need to get
> > rid of
> > > > > > the file in DPDK repo.
> > > > >
> > > > > Perhaps we can use something already existing, such as this:
> > > > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > > >
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > introduce
> > > > > > names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances
> > namespaces
> > > > > > introduced
> > > > > > are reserved by the implementation. there are several reasons
> > in the
> > > > > > GNU libc
> > > > > > manual that explain the justification for these reservations
> > and if
> > > > > > if we think about ODR and ABI compatibility we can conceive of
> > others.
> > > > >
> > > > > If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> > > > >
> > > > > If we are not moving to C11 soon, I would consider these
> > disadvantages more seriously.
> > > >
> > > > I think it's reasonable to assume that we are talking years here.
> > > >
> > > > We've had a few discussions about minimum C standard. I think my
> > first
> > > > mailing list exchanges about C99 was almost 2 years ago. Given that
> > we
> > > > still aren't on C99 now (though i know Bruce has a series up)
> > indicates
> > > > that progression to C11 isn't going to happen any time soon and
> > even if
> > > > it was the baseline we still can't just use it (reasons described
> > > > later).
> > > >
> > > > Also, i'll point out that we seem to have accepted moving to C99
> > with
> > > > one of the holdback compilers technically being non-conformant but
> > it
> > > > isn't blocking us because it provides the subset of C99 features
> > without
> > > > being conforming that we happen to be using.
> > > >
> > > What compiler is this? As far as I know, all our currently support
> > > compilers claim to support C99 fully. All should support C11 also,
> > > except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos
> > 7, I
> > > think we can require at minimum a c11 compiler for building DPDK
> > itself.
> > > I'm still a little uncertain about requiring that users build their
> > own
> > > code with -std=c11, though.
> > 
> > perhaps i'm mistaken but it was my understanding that the gcc version
> > on
> > RHEL 7 did not fully conform to C99? maybe i read C99 when it was
> > actually
> > C11.
> 
> RHEL does supports C99, it's C11 that it doesn't support [1].
> 
> [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D8762F@smartserver.smartshare.dk/
> 
> > 
> > regardless, even if every supported compiler for dpdk was C11
> > conformant
> > including stdatomics which are optional we can't just move from
> > intrinsic/builtins to standard C atomics (because of the compatibility
> > and performance issues mentioned previously).
> 
> For example, with C11, you can make structures atomic. And an atomic instance of a type can have a different size than the non-atomic type.
> 
> If do we make a shim, it will have some limitations compared to C11 atomics, e.g. it cannot handle atomic structures.

right, so it "looks" like standard but then doesn't work like standard.

> 
> Either we accept these limitations of the shim, or we use our own namespace. If we accept the limitations, we risk that someone with a C11 build environment uses them anyway, and it will not work in non-C11 build environments. So a shim is not a rose without thorns.

the standard says even integer types can have different alignment and size
so it isn't strictly portable to replace any integer type with the similar
_Atomic type. here is an example of something i would prefer not to have
to navigate which using a shim would sign us up for.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
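
a minimal sketch of how one could check this on a given target, assuming a C11 compiler with stdatomic support. the names demo_layout, demo_layout_u64 and demo_layout_matches are hypothetical. the result is target dependent, so the sketch deliberately does not claim what it returns.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Size and alignment of a plain type next to its _Atomic counterpart. */
struct demo_layout {
	size_t plain_size, atomic_size;
	size_t plain_align, atomic_align;
};

static struct demo_layout
demo_layout_u64(void)
{
	struct demo_layout l = {
		.plain_size = sizeof(uint64_t),
		.atomic_size = sizeof(_Atomic uint64_t),
		.plain_align = alignof(uint64_t),
		.atomic_align = alignof(_Atomic uint64_t),
	};
	return l;
}

/* Non-zero if swapping uint64_t for _Atomic uint64_t keeps the layout. */
static int
demo_layout_matches(const struct demo_layout *l)
{
	return l->plain_size == l->atomic_size &&
	       l->plain_align == l->atomic_align;
}
```

if demo_layout_matches() returns 0 for any supported target, replacing the plain type with the _Atomic type inside a public struct changes that struct's layout there.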

> > 
> > so just re-orienting this discussion, the purpose of this abstraction
> > is
> > to allow the optional use of standard C atomics when a conformant
> > compiler
> > is available and satisfactory code is generated for the desired target.
> 
> I think it is more important getting this feature into DPDK than using the C11 stdatomic.h API for atomics in DPDK.
> 
> I don't feel strongly about this API, and will accept either the proposed patch series (with the C11-like API, but rte_ prefixed namespace), or a C11 stdatomic.h API shim.

let's just stay out of the standard namespace. it doesn't buy us the
forward compatibility we want, and being explicit in the namespace makes
it obvious that we aren't providing the standard atomics.
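
as a rough sketch of what staying out of the standard namespace can look like (this is not the code from the patch): every name below uses a hypothetical demo_ prefix, and DEMO_ENABLE_STDATOMIC is a made-up build option. when it is not defined, the macros fall back to the GCC/Clang __atomic builtins.

```c
#include <stdint.h>

#ifdef DEMO_ENABLE_STDATOMIC
/* Optional path: map the prefixed names onto standard C11 atomics. */
#include <stdatomic.h>
#define demo_atomic(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_memory_order_acquire memory_order_acquire
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#else
/* Default path: plain types plus the GCC/Clang __atomic builtins. */
#define demo_atomic(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_memory_order_acquire __ATOMIC_ACQUIRE
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#endif

static demo_atomic(uint32_t) demo_counter;

/* Increments the counter and returns its new value. */
static uint32_t
demo_counter_bump(void)
{
	return demo_atomic_fetch_add_explicit(&demo_counter, 1,
			demo_memory_order_relaxed) + 1;
}
```

call sites only ever see the prefixed names, so nobody can mistake them for the real stdatomic.h API or assume its full semantics.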

ty


* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-03 20:49  0%               ` Tyler Retzlaff
@ 2023-02-07 15:16  0%                 ` Morten Brørup
  2023-02-07 21:58  0%                   ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-07 15:16 UTC (permalink / raw)
  To: Tyler Retzlaff, Bruce Richardson
  Cc: Honnappa Nagarahalli, thomas, dev, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 3 February 2023 21.49
> 
> On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> > On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > > Sent: Wednesday, 1 February 2023 22.41
> > > > >
> > > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> wrote:
> > > > > >
> > > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > > >
> > > > > > > Honnappa, please could you give your view on the future of
> atomics
> > > > > in DPDK?
> > > > > > Thanks Thomas, apologies it has taken me a while to get to
> this
> > > > > discussion.
> > > > > >
> > > > > > IMO, we do not need DPDK's own abstractions. APIs from
> stdatomic.h
> > > > > (stdatomics as is called here) already serve the purpose. These
> APIs
> > > > > are well understood and documented.
> > > > >
> > > > > i agree that whatever atomics APIs we advocate for should align
> with
> > > > > the
> > > > > standard C atomics for the reasons you state including implied
> > > > > semantics.
> > > > >
> > > > > >
> > > > > > For environments where stdatomics are not supported, we could
> have a
> > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> support only
> > > > > _explicit APIs). This allows the code to use stdatomics APIs
> and when
> > > > > we move to minimum supported standard C11, we just need to get
> rid of
> > > > > the file in DPDK repo.
> > > >
> > > > Perhaps we can use something already existing, such as this:
> > > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > >
> > > > >
> > > > > my concern with this is that if we provide a stdatomic.h or
> introduce
> > > > > names
> > > > > from stdatomic.h it's a violation of the C standard.
> > > > >
> > > > > references:
> > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > >  * GNU libc manual
> > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > >
> > > > > in effect the header, the names and in some instances
> namespaces
> > > > > introduced
> > > > > are reserved by the implementation. there are several reasons
> in the
> > > > > GNU libc
> > > > > manual that explain the justification for these reservations
> and if
> > > > > if we think about ODR and ABI compatibility we can conceive of
> others.
> > > >
> > > > If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> > > >
> > > > If we are not moving to C11 soon, I would consider these
> disadvantages more seriously.
> > >
> > > I think it's reasonable to assume that we are talking years here.
> > >
> > > We've had a few discussions about minimum C standard. I think my
> first
> > > mailing list exchanges about C99 was almost 2 years ago. Given that
> we
> > > still aren't on C99 now (though i know Bruce has a series up)
> indicates
> > > that progression to C11 isn't going to happen any time soon and
> even if
> > > it was the baseline we still can't just use it (reasons described
> > > later).
> > >
> > > Also, i'll point out that we seem to have accepted moving to C99
> with
> > > one of the holdback compilers technically being non-conformant but
> it
> > > isn't blocking us because it provides the subset of C99 features
> without
> > > being conforming that we happen to be using.
> > >
> > What compiler is this? As far as I know, all our currently support
> > compilers claim to support C99 fully. All should support C11 also,
> > except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos
> 7, I
> > think we can require at minimum a c11 compiler for building DPDK
> itself.
> > I'm still a little uncertain about requiring that users build their
> own
> > code with -std=c11, though.
> 
> perhaps i'm mistaken but it was my understanding that the gcc version
> on
> RHEL 7 did not fully conform to C99? maybe i read C99 when it was
> actually
> C11.

RHEL 7 does support C99; it's C11 that it doesn't support [1].

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D8762F@smartserver.smartshare.dk/

> 
> regardless, even if every supported compiler for dpdk was C11
> conformant
> including stdatomics which are optional we can't just move from
> intrinsic/builtins to standard C atomics (because of the compatibility
> and performance issues mentioned previously).

For example, with C11, you can make structures atomic. And an atomic instance of a type can have a different size than the non-atomic type.

If we do make a shim, it will have some limitations compared to C11 atomics, e.g. it cannot handle atomic structures.

Either we accept these limitations of the shim, or we use our own namespace. If we accept the limitations, we risk that someone with a C11 build environment uses them anyway, and it will not work in non-C11 build environments. So a shim is not a rose without thorns.
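
For reference, a minimal sketch of an atomic structure in C11, which is the kind of use a builtin-based shim could not easily mimic. The names demo_pair, demo_shared, demo_pair_publish and demo_pair_snapshot are hypothetical, and the sketch assumes a target where an 8-byte atomic is lock-free (otherwise the toolchain may need its atomic support library).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two related indexes that must always be read and written together. */
struct demo_pair {
	uint32_t head;
	uint32_t tail;
};

/* The whole structure is one atomic object; zero-initialized here. */
static _Atomic struct demo_pair demo_shared;

static void
demo_pair_publish(uint32_t head, uint32_t tail)
{
	struct demo_pair p = { .head = head, .tail = tail };

	atomic_store_explicit(&demo_shared, p, memory_order_release);
}

static struct demo_pair
demo_pair_snapshot(void)
{
	return atomic_load_explicit(&demo_shared, memory_order_acquire);
}
```

Code like this builds in a C11 environment but has no direct equivalent in a shim, which is the portability risk described above.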

> 
> so just re-orienting this discussion, the purpose of this abstraction
> is
> to allow the optional use of standard C atomics when a conformant
> compiler
> is available and satisfactory code is generated for the desired target.

I think it is more important to get this feature into DPDK than to use the C11 stdatomic.h API for atomics in DPDK.

I don't feel strongly about this API, and will accept either the proposed patch series (with the C11-like API, but rte_ prefixed namespace), or a C11 stdatomic.h API shim.



* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-03 13:33  6%     ` [PATCH v4 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
  2023-02-06 15:29  0%       ` Jiawei(Jonny) Wang
@ 2023-02-07  9:40  0%       ` Ori Kam
  2023-02-09 19:44  0%       ` Ferruh Yigit
  2 siblings, 0 replies; 200+ results
From: Ori Kam @ 2023-02-07  9:40 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh

Hi Jiawei,


> -----Original Message-----
> From: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>
> Sent: Friday, 3 February 2023 15:34
> 
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.),
> we want to know which physical port is used for Rx and Tx.
> 
> This patch maps a DPDK Tx queue with a physical port,
> by adding tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be
> sent.
> Value 0 means no affinity and traffic could be routed to any
> connected physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of
> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, there are two physical ports connected to
> a single DPDK port (port id 0): phy_affinity 1 stands for
> the first physical port and phy_affinity 2 stands for the second
> physical port.
> Use the commands below to configure the Tx phy affinity per Tx queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands configure Tx Queue 0 and Tx Queue 1 with
> phy affinity 1, so packets sent on Tx Queue 0 or Tx Queue 1
> leave through the first physical port. Likewise, packets sent on
> Tx Queue 2 or Tx Queue 3 leave through the second physical port.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---
>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>  app/test-pmd/config.c                       |   1 +
>  devtools/libabigail.abignore                |   5 +
>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>  lib/ethdev/rte_ethdev.h                     |  10 ++
>  6 files changed, 133 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index cb8c174020..f771fcf8ac 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
> *parsed_result,
> 
>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> +
> +			"port config (port_id) txq (queue_id) phy_affinity
> (value)\n"
> +			"    Set the physical affinity value "
> +			"on a specific Tx queue\n\n"
>  		);
>  	}
> 
> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
> cmd_show_port_flow_transfer_proxy = {
>  	}
>  };
> 
> +/* *** configure port txq phy_affinity value *** */
> +struct cmd_config_tx_phy_affinity {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t config;
> +	portid_t portid;
> +	cmdline_fixed_string_t txq;
> +	uint16_t qid;
> +	cmdline_fixed_string_t phy_affinity;
> +	uint8_t value;
> +};
> +
> +static void
> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_port *port;
> +	int ret;
> +
> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> +		return;
> +
> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> +		printf("Invalid port id\n");
> +		return;
> +	}
> +
> +	port = &ports[res->portid];
> +
> +	if (strcmp(res->txq, "txq")) {
> +		printf("Unknown parameter\n");
> +		return;
> +	}
> +	if (tx_queue_id_is_invalid(res->qid))
> +		return;
> +
> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> +	if (ret != 0)
> +		return;
> +
> +	if (dev_info.nb_phy_ports == 0) {
> +		printf("Number of physical ports is 0 which is invalid for PHY
> Affinity\n");
> +		return;
> +	}
> +	printf("The number of physical ports is %u\n",
> dev_info.nb_phy_ports);
> +	if (dev_info.nb_phy_ports < res->value) {
> +		printf("The PHY affinity value %u is Invalid, exceeds the "
> +		       "number of physical ports\n", res->value);
> +		return;
> +	}
> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> +
> +	cmd_reconfig_device_queue(res->portid, 0, 1);
> +}
> +
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 port, "port");
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 config, "config");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 portid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 txq, "txq");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      qid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 phy_affinity, "phy_affinity");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      value, RTE_UINT8);
> +
> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> +	.f = cmd_config_tx_phy_affinity_parsed,
> +	.data = (void *)0,
> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity
> <value>",
> +	.tokens = {
> +		(void *)&cmd_config_tx_phy_affinity_port,
> +		(void *)&cmd_config_tx_phy_affinity_config,
> +		(void *)&cmd_config_tx_phy_affinity_portid,
> +		(void *)&cmd_config_tx_phy_affinity_txq,
> +		(void *)&cmd_config_tx_phy_affinity_qid,
> +		(void *)&cmd_config_tx_phy_affinity_hwport,
> +		(void *)&cmd_config_tx_phy_affinity_value,
> +		NULL,
> +	},
> +};
> +
>  /*
> ****************************************************************
> **************** */
> 
>  /* list of instructions */
> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>  	NULL,
>  };
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index acccb6b035..b83fb17cfa 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>  		printf("unknown\n");
>  		break;
>  	}
> +	printf("Current number of physical ports: %u\n",
> dev_info.nb_phy_ports);
>  }
> 
>  void
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 7a93de3ba1..ac7d3fb2da 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -34,3 +34,8 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +; Ignore fields inserted in padding hole of rte_eth_txconf
> +[suppress_type]
> +        name = rte_eth_txconf
> +        has_data_member_inserted_between = {offset_of(tx_deferred_start),
> offset_of(offloads)}
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index 73f5d94e14..e99bd2dcb6 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Added affinity for multiple physical ports connected to a single DPDK
> port.**
> +
> +  * Added Tx affinity in queue setup to map a physical port.
> +
>  * **Updated AMD axgbe driver.**
> 
>    * Added multi-process support.
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 79a1fa9cb7..5c716f7679 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only
> on a specific Tx queue::
> 
>  This command should be run when the port is stopped, or else it will fail.
> 
> +config per queue Tx physical affinity
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Configure a per queue physical affinity value only on a specific Tx queue::
> +
> +   testpmd> port (port_id) txq (queue_id) phy_affinity (value)
> +
> +* ``phy_affinity``: physical port to use for sending,
> +                    when multiple physical ports are connected to
> +                    a single DPDK port.
> +
> +This command should be run when the port is stopped, otherwise it fails.
> +
>  Config VXLAN Encap outer layers
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..2fd971b7b5 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>  				      less free descriptors than this value. */
> 
>  	uint8_t tx_deferred_start; /**< Do not start queue with
> rte_eth_dev_start(). */
> +	/**
> +	 * Affinity with one of the multiple physical ports connected to the
> DPDK port.
> +	 * Value 0 means no affinity and traffic could be routed to any
> connected
> +	 * physical port.
> +	 * The first physical port is number 1 and so on.
> +	 * Number of physical ports is reported by nb_phy_ports in
> rte_eth_dev_info.
> +	 */
> +	uint8_t tx_phy_affinity;
>  	/**
>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_*
> flags.
>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>  	/** Device redirection table size, the total number of entries. */
>  	uint16_t reta_size;
>  	uint8_t hash_key_size; /**< Hash key size in bytes */
> +	/** Number of physical ports connected with DPDK port. */
> +	uint8_t nb_phy_ports;
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration
> */
> --
> 2.18.1

Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-06  6:21  0%                         ` Naga Harish K, S V
@ 2023-02-06 16:38  0%                           ` Jerin Jacob
  2023-02-09 17:00  0%                             ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-06 16:38 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Mon, Feb 6, 2023 at 11:52 AM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, February 3, 2023 3:15 PM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, January 30, 2023 8:13 PM
> > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > <jay.jayatheerthan@intel.com>
> > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > > >
> > > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > > <jay.jayatheerthan@intel.com>
> > > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > > APIs
> > > > > >
> > > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > +        */
> > > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > > >
> > > > > > > > > > > > Introduce
> > > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > to
> > > > > > > > make
> > > > > > > > > > > > sure rsvd is zero.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The reserved fields are not used by the adapter or
> > application.
> > > > > > > > > > > Not sure Is it necessary to Introduce a new API to
> > > > > > > > > > > clear reserved
> > > > > > fields.
> > > > > > > > > >
> > > > > > > > > > > When adapter starts using new fields (when we add new
> > > > > > > > > > > fields in future), the old application which is not using
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have
> > > > junk
> > > > > > > > > > value and then adapter implementation will behave bad.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > does it mean, the application doesn't re-compile for the new
> > DPDK?
> > > > > > > >
> > > > > > > > Yes. No need recompile if ABI not breaking.
> > > > > > > >
> > > > > > > > > When some of the reserved fields are used in the future,
> > > > > > > > > the application
> > > > > > > > also may need to be recompiled along with DPDK right?
> > > > > > > > > As the application also may need to use the newly consumed
> > > > > > > > > reserved
> > > > > > > > fields?
> > > > > > > >
> > > > > > > > The problematic case is:
> > > > > > > >
> > > > > > > > Adapter implementation of 23.07(Assuming there is change
> > > > > > > > params) field needs to work with application of 23.03.
> > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > > > >
> > > > > > >
> > > > > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes
> > > > > > > only
> > > > > > reserved fields to zero,  it may not solve the issue in this case.
> > > > > >
> > > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > > > > fields, not just reserved field.
> > > > > > The application calling sequence  is
> > > > > >
> > > > > > struct my_config c;
> > > > > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > > > > c.interseted_filed_to_be_updated = val;
> > > > > >
> > > > > Can it be done like
> > > > >         struct my_config c = {0};
> > > > >         c.interseted_filed_to_be_updated = val; and update Doxygen
> > > > > comments to recommend above usage to reset all fields?
> > > > > This way,  rte_event_eth_rx_adapter_runtime_params_init() can be
> > > > avoided.
> > > >
> > > > Better to have a function for documentation clarity. Similar scheme
> > > > already there in DPDK. See rte_eth_cman_config_init()
> > > >
> > > >
> > >
> > >
> > > The reference function rte_eth_cman_config_init() is resetting the params
> > struct and initializing the required params with default values in the pmd cb.
> >
> > No need for PMD cb.
> >
> > > The proposed rte_event_eth_rx_adapter_runtime_params_init () API just
> > needs to reset the params struct. There are no pmd CBs involved.
> > > Having an API just to reset the struct seems overkill. What do you think?
> >
> > It is slow path API. Keeping it as function is better. Also, it helps the
> > documentations of config parm in
> > rte_event_eth_rx_adapter_runtime_params_config()
> > like, This structure must be initialized with
> > rte_event_eth_rx_adapter_runtime_params_init() or so.
> >
> >
>
> Are there any other reasons to have this API (*params_init()) other than documentation?

Initialization code is segregated for tracking.

>
> >
> > >
> > > > >
> > > > > > Let me share an example and you can tell where is the issue
> > > > > >
> > > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > > > > 3)There is an application written based on 22.03 which using
> > > > > > only 8B after calling
> > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > > application is calling
> > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > and 9 to 15B are zero, the implementation will not go bad.
> > > > > >
> > > > > > > The old application only tries to set/get previous valid
> > > > > > > fields and the newly
> > > > > > used fields may still contain junk value.
> > > > > > > If the application wants to make use of any the newly used
> > > > > > > params, the
> > > > > > application changes are required anyway.
> > > > > >
> > > > > > Yes. If application wants to make use of newly added features.
> > > > > > No need to change if new features are not needed for old application.
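
The init-then-set calling sequence discussed above can be sketched roughly
as follows. This is only an illustration under assumptions: the struct, its
field names and both helpers (demo_runtime_params, demo_runtime_params_init,
demo_new_feature_enabled) are hypothetical stand-ins, not the API proposed
in the patch. The point is that an init helper which zeroes the whole
structure lets a later release start reading a formerly reserved word
without seeing junk from an application built against the older layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter struct: one field in use, the rest reserved. */
struct demo_runtime_params {
	uint32_t max_nb_rx; /* the only field the "old" release knows about */
	uint32_t rsvd[15];  /* reserved for future use, expected to be zero */
};

/* Hypothetical init helper: zero every field, not only the reserved ones. */
static int
demo_runtime_params_init(struct demo_runtime_params *params)
{
	if (params == NULL)
		return -EINVAL;
	memset(params, 0, sizeof(*params));
	return 0;
}

/*
 * Stand-in for a newer implementation that has started to interpret
 * rsvd[0] as a new field. Zero means "feature off", so a caller that
 * went through the init helper keeps the old behaviour.
 */
static int
demo_new_feature_enabled(const struct demo_runtime_params *params)
{
	return params->rsvd[0] != 0;
}
```

An application built against the older layout would call the init helper,
set only the fields it knows (max_nb_rx here), and pass the structure on;
the newer implementation then reads zero in the word it has since claimed.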

^ permalink raw reply	[relevance 0%]

* Re: Sign changes through function signatures
  2023-02-04  8:09  0%           ` Morten Brørup
@ 2023-02-06 15:57  3%             ` Ben Magistro
  0 siblings, 0 replies; 200+ results
From: Ben Magistro @ 2023-02-06 15:57 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Tyler Retzlaff, Bruce Richardson, Thomas Monjalon, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

[-- Attachment #1: Type: text/plain, Size: 13556 bytes --]

I'm a fan of "just rip the bandaid off" (especially when it's convenient
for me, however it's very possible I will also be the person to bring up
backwards compatibility).  Speaking of backwards compatibility, API/ABI
breakage was semi-recently discussed at the techboard [1].  From the notes
it was not clear to me what level of breakage is going to be acceptable
going forward.  This same question seems likely to apply to discussion
around specifying the c standard [2] though potentially less impactful
based on the recent discussion.  I am also beginning to see how a "-ng"
project happens to simplify making many large breaking changes.

To try and add my thoughts here.  For practical use, I don't believe a
socket or core ID should ever be negative.  I don't believe you should need
more than 4 bits for socket id (personally only aware of 4 socket system
boards), but if we are saying we keep the same memory space (32 bits) there
is no practical reason not to allocate 8 bits which gives you the remaining
24 bits for flags. In this thread, we've already identified three? that
seem useful, flag_unset (possibly regretting this, but can see this flag
being overloaded to indicate both unset and error setting), flag_any_okay,
and flag_none (not entirely clear to me yet how this would be used
differently than any_okay).  To me this really sounds like a struct makes
sense to manage the value + flags associated with it as a unit.
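
To make the "value + flags as a unit" idea a bit more concrete, here is a
minimal sketch, assuming an 8-bit ID and 24 flag bits packed into one 32-bit
word. Every name in it (demo_socket_id, the DEMO_SOCKET_F_* flags and the
accessors) is made up for illustration and is not part of any DPDK API.
Wrapping the word in a struct also means a plain int cannot be passed by
accident.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; they occupy the low bits of the 24-bit flag field. */
#define DEMO_SOCKET_F_UNSET (1u << 0) /* no value assigned (or lookup failed) */
#define DEMO_SOCKET_F_ANY   (1u << 1) /* any socket is acceptable */
#define DEMO_SOCKET_F_NONE  (1u << 2) /* not attached to any socket */

/* Opaque wrapper: bits 0-7 hold the ID, bits 8-31 hold the flags. */
struct demo_socket_id {
	uint32_t raw;
};

static inline struct demo_socket_id
demo_socket_make(uint8_t id, uint32_t flags)
{
	struct demo_socket_id s = { ((flags & 0xffffffu) << 8) | id };
	return s;
}

static inline uint8_t
demo_socket_value(struct demo_socket_id s)
{
	return (uint8_t)(s.raw & 0xffu);
}

static inline uint32_t
demo_socket_flags(struct demo_socket_id s)
{
	return s.raw >> 8;
}

/* True if memory on socket "actual" satisfies the request "want". */
static inline bool
demo_socket_accepts(struct demo_socket_id want, uint8_t actual)
{
	if (demo_socket_flags(want) & DEMO_SOCKET_F_ANY)
		return true;
	return demo_socket_flags(want) == 0 && demo_socket_value(want) == actual;
}
```

A request built with demo_socket_make(3, DEMO_SOCKET_F_ANY) would then read
as "prefer socket 3, but any socket is fine", while demo_socket_make(2, 0)
would insist on socket 2.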

We are now venturing into areas I know I don't have enough knowledge about
to speak authoritatively on.  On the aspect of numa id and socket id, I had
to look this up, but it appears that one socket can have more than one numa
id (AMD Threadripper) associated with it.  I don't have easy access to an
AMD system I can run a `lscpu` on to provide a sample/confirm.  I am also
not sure what if any implications there are for how it is used within this
code base.  For practical purposes, I believe memory is still associated
with a socket, so numa and socket may be able to be used interchangeably
for this purpose in which case I agree, pick one and standardize on that
term/language throughout the code base, possibly adding a note for future
developers/users.

When talking about core id I believe we need to utilize at least 16 bits of
space as we can have systems with dual AMD 64C/128T which I believe should
show as cores 0-255 today.  I have not looked at that aspect of the code
but see it as closely related to the socket discussion.  If making changes
to one, it is probably worth reviewing the other at the same time.  Very
quickly looking at rte_lcore [3], it seems like we either have a model that
should be followed for sockets (as suggested by Morten) or another case
where a struct may also make more sense to wrap a value and provide flags
versus magic values.

Going back to the ABI/API breakage question...  When quickly looking at the
API today, we have a number of functions that return negative values to
indicate errors.  Using references and structs may simplify that to the
point of return == 0 on success and < 0 on error, possibly with no need to
utilize rte_errno for these functions so that would at least allow for
following the existing model/pattern.  I am probably oversimplifying this
aspect.

I will say, in the case of TLDK, I've had to increase the return size of
some functions to int64_t to allow the return of the maximum value on
success and support returning a negative value on error.  Without looking,
I don't remember if that was in one of our wrappers, internal code, or
public APIs.  Regardless of where it actually is, I did not like this as
there are functions that expect a uint32_t so casts or warning suppression
may still be required in the code base.

1) http://mails.dpdk.org/archives/dev/2023-January/259811.html
2) http://mails.dpdk.org/archives/dev/2023-February/261097.html
3)
https://doc.dpdk.org/api/rte__lcore_8h.html#acbf23499dc0b2d223e4d311ad5f1b04e

On Sat, Feb 4, 2023 at 3:09 AM Morten Brørup <mb@smartsharesystems.com>
wrote:

> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Friday, 3 February 2023 23.13
> >
> > On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> > > On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 2 February 2023 21.45
> > > > >
> > > > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > While making some updates to our code base for 22.11.1 that
> > were
> > > > > missed in
> > > > > > > our first pass through, we hit the numa node change[1].  In
> > the
> > > > > process of
> > > > > > > updating our code, we noticed that a couple functions
> > > > > (rx/tx_queue_setup,
> > > > > > > maybe more that we aren't using) state they accept
> > `SOCKET_ID_ANY`
> > > > > but the
> > > > > > > function signature then asks for an unsigned integer while
> > > > > `SOCKET_ID_ANY`
> > > > > > > is `-1`.  Following it through the redirect to the "real"
> > function
> > > > > it also
> > > > > > > asks for an unsigned integer which is then passed on to one
> > or more
> > > > > > > > functions asking for an integer.  As an example using the
> > i40e
> > > > > driver
> > > > > > > -- we would call `rte_eth_tx_queue_setup` [2] which
> > ultimately
> > > > > calls
> > > > > > > `i40e_dev_tx_queue_setup`[3] which finally calls
> > > > > `rte_zmalloc_socket`[4]
> > > > > > > and `rte_eth_dma_zone_reserve`[5].
> > > > > > >
> > > > > > > I guess what I am looking for is clarification on if this is
> > > > > intentional or
> > > > > > > if this is additional cleanup that may need to be
> > completed/be
> > > > > desirable so
> > > > > > > that signs are maintained through the call paths and avoid
> > > > > potentially
> > > > > > > producing sign-conversion warnings.  From the very quick
> > glance I
> > > > > took at
> > > > > > > the i40e driver, it seems these are just passed through to
> > other
> > > > > functions
> > > > > > > and no direct use/manipulation occurs (at least in the
> > mentioned
> > > > > functions).
> > > > > >
> > > > > > > i believe this is just sloppiness with sign in our api surface.
> > i too
> > > > > > find it frustrating that use of these api force either explicit
> > > > > > casts or suffer having to suppress warnings.
> > > > > >
> > > > > > in the past examples of this have been cleaned up without full
> > > > > deprecation
> > > > > > notices but there are a lot of instances. i also feel
> > (unpopular
> > > > > opinion)
> > > > > > that for some integer types like this that have constrained
> > range /
> > > > > number
> > > > > > spaces it would be of value to introduce a typedef that can be
> > used
> > > > > > consistently.
> > > > > >
> > > > > > for now you'll just have to add the casts and hopefully in the
> > future
> > > > > we
> > > > > > will fix the api making them unnecessary. of course feel free
> > to
> > > > > submit
> > > > > > patches too, it would be great to have these cleaned up.
> > > > >
> > > > > I agree it should be cleaned up.
> > > > > Those IDs should accept negative values.
> > > > > Not sure which type we should choose (int, int32_t, or a
> > typedef).
> > > >
> > > > Why would we use a signed socket ID? We don't use signed port IDs.
> > To me, unsigned seems the way to go. (A minor detail: With unsigned we
> > can use the entire range of values minus one (for the magic "any"
> > value), whereas with signed we can only use the positive range of
> > values. This detail is completely irrelevant when using 32 bit for
> > socket ID, but could be relevant if using fewer bits.)
> > > >
> > > > Also, we don't need 32 bit for socket ID. 8 or 16 bit should
> > suffice, like port ID. But reducing from 32 bit would probably cause
> > major ABI breakage.
> > > >
> > > > >
> > > > > Another thing to check is the name of the variable.
> > > > > It should be a socket ID when talking about CPU,
> > > > > and a NUMA node ID when talking about memory.
> > > > >
> > > > > And last but not the least,
> > > > > how can we keep ABI compatibility?
> > > > > I hope we can use function versioning to avoid deprecation and
> > > > > breaking.
> > > > >
> > > > > Trials and suggestions are welcome.
> > > >
> > > > Signedness is not the only problem with the socket ID. The meaning
> > of SOCKET_ID_ANY is excessively overloaded. If we want to clean this
> > up, we should consider the need for another magic value SOCKET_ID_NONE
> > for devices connected to the chipset, as discussed in this other email
> > thread [1]. And as discussed there, there are also size problems,
> > because some device structures use 8 bit to hold the socket ID.
> > > >
> > > > And functions should always return -1, never SOCKET_ID_ANY, to
> > indicate error.
> > > >
> > > > [1]:
> > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smarts
> > erver.smartshare.dk/
> > > >
> > > > I only bring warnings and complications to the discussion here, no
> > solutions. Sorry! :-(
> > > >
> > >
> > > Personally, I think if we are going to change things, we should do
> > things
> > > properly, especially/even if we are going to have to break ABI or use
> > ABI
> > > compatibility.
> > >
> > > I would suggest rather than a typedef, we should actually wrap the
> > int
> > > value in a struct - for two reasons:
> >
> > >
> > > * it means the compiler will actually error out for us if an int or
> > >   unsigned int is used instead. This allow easier fixing at compile-
> > time
> > >   rather than hoping things are correctly specified in existing code.
> > >
> > > * it allows us to do things like explicitly calling out flags, rather
> > than
> > >   just using magic values. While still keeping the size 32 bits, we
> > can
> > >   have the actual socket value as 16-bits and have flags to indicate:
> > >   - ANY socket, NO socket, INVALID value socket. This could end up
> > being
> > >   useful in many cases, for example, when allocating memory we could
> > >   specify a socket number with the ANY flag, indicating that any
> > socket is
> > >   ok, but we'd ideally prefer the number specified.
> >
> > i'm a fan of this where it makes sense. i did this with rte_thread_t
> > for
> > exactly your first reason. but i did receive resistance from other
> > members of the community. personally i like compilation to fail when i
> > make a mistake.
> >
> > it's definitely way easier to make the argument to do this when the
> > actual valued is opaque. if it isn't i think then we need to provide
> > macro/inline accessors to allow applications do whatever it is they do
> > with the value they carry.
> >
> > i'll also note that this allows you a cheap way to sprinkle extra
> > integrity checking when running functional tests. if you have low
> > performance inline accessors you can do things like enforce the range
> > of
> > values or that enumerations are part of a set for debug builds.
> >
> > as a side i would also caution while i suggested a typedef i don't mean
> > that everything should be typedef'd especially actual structs that are
> > used like structs. typedefs for things like socket id would
> > unquestionably convey more information and implied semantics to the
> > user
> > of an api than just a standard `int' or whatever. consequently i have
> > found
> > that this lowers mistakes with the use of the api.
>
> Hiding the socket_id in a typedef'd structure seems like shooting sparrows
> with cannons.
>
> DPDK is using a C coding style, where there is a convention for not using
> typedefs:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs
>
> In the thread case, a typedef made sense, because the underlying type can
> differ across O/S'es, and thus should be opaque. Which is in line with the
> coding style.
>
> But I don't think this is the case for socket_id. The socket_id is an
> enumeration type, and all we need is a magic number for the "chipset"
> pseudo-socket. And with that, perhaps some iterator macros to include/omit
> this pseudo-socket, like the lcore_id iterators with and without the main
> lcore.
>
> The mix of signed and unsigned in function signatures (and in the
> definition of SOCKET_ID_ANY) is pure sloppiness. This problem may also be
> present in other function signatures; we just happened to run into it for
> the socket_id.
>
> The compiler has flags to warn about mixing signed and unsigned types, so
> we could use that flag to reveal and fix those bugs.
>
> >
> > >
> > > As for socket id, and numa id, I'm not sure we should have different
> > > names/types for the two. For example, for PCI devices, do they need a
> > third
> > > type or are they associated with cores or with memory? The socket id
> > for
> > > the core only matters in terms of data locality, i.e. what memory or
> > cache
> > > location it is in. Therefore, for me, I'd pick one name and stick
> > with it.
> >
> > i think the choice for more than one type vs one type is whether or not
> > they are "the same" number space as opposed to just coincidentally
> > overlapping number spaces.
> >
> > >
> > > /Bruce
>
>

[-- Attachment #2: Type: text/html, Size: 16895 bytes --]

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-03 13:33  6%     ` [PATCH v4 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
@ 2023-02-06 15:29  0%       ` Jiawei(Jonny) Wang
  2023-02-07  9:40  0%       ` Ori Kam
  2023-02-09 19:44  0%       ` Ferruh Yigit
  2 siblings, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-06 15:29 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh

Hi,

@Andrew, @Thomas, @Ori, 

Could you please help review the patch?

Thanks.

> -----Original Message-----
> From: Jiawei Wang <jiaweiw@nvidia.com>
> Sent: Friday, February 3, 2023 9:34 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to know
> which physical port is used for Rx and Tx.
> 
> This patch maps a DPDK Tx queue with a physical port, by adding
> tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be sent.
> Value 0 means no affinity and traffic could be routed to any connected
> physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of rte_eth_txconf
> structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, there're two physical ports connected to a single DPDK port
> (port id 0), and phy_affinity 1 stood for the first physical port and phy_affinity
> 2 stood for the second physical port.
> Use the below commands to config tx phy affinity for per Tx Queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands config the Tx Queue index 0 and Tx Queue index 1 with phy
> affinity 1, uses Tx Queue 0 or Tx Queue 1 send packets, these packets will be
> sent from the first physical port, and similar with the second physical port if
> sending packets with Tx Queue 2 or Tx Queue 3.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---

snip

> 2.18.1
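
The "padding hole" argument in the commit message can be checked with a
small sketch like the one below. The two structs are simplified,
hypothetical stand-ins for the layout before and after the change, not the
real rte_eth_txconf, and the result assumes a typical ABI where uint64_t is
aligned to at least 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the layout before the patch. */
struct demo_txconf_old {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	/* the compiler pads here until offloads is suitably aligned */
	uint64_t offloads;
};

/* Same layout with one byte of the former padding given a name. */
struct demo_txconf_new {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	uint8_t tx_phy_affinity; /* new field placed in the former padding */
	uint64_t offloads;
};

/* Non-zero when the total size and the offset of offloads are unchanged. */
static int
demo_layout_unchanged(void)
{
	return sizeof(struct demo_txconf_old) == sizeof(struct demo_txconf_new) &&
	       offsetof(struct demo_txconf_old, offloads) ==
	       offsetof(struct demo_txconf_new, offloads);
}
```

If demo_layout_unchanged() holds, existing binaries see the same offsets
for every field they already use, which is why only an ABI check
suppression rule, rather than an ABI version bump, is described for the
new field.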


^ permalink raw reply	[relevance 0%]

* [PATCH 1/3] net/nfp: remove usage of print statements
  @ 2023-02-06  7:05  8% ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-06  7:05 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, James Hershaw, Chaoyong He

From: James Hershaw <james.hershaw@corigine.com>

Removal of the usage of printf() statements from the nfp PMD in favour
of appropriate RTE logging functions in compliance with the standard.

Debug messages are now logged using the appropriate RTE_LOG functions so
it is no longer necessary to print specific statements when compiled in
with the DEBUG tag, rather log these messages using the appropriate
functions regardless of whether the DEBUG tag is set or not.

Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
---
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 71 +++++++++-------------
 drivers/net/nfp/nfpcore/nfp_cppcore.c      |  3 +-
 drivers/net/nfp/nfpcore/nfp_hwinfo.c       | 17 +++---
 drivers/net/nfp/nfpcore/nfp_mip.c          | 12 ++--
 drivers/net/nfp/nfpcore/nfp_mutex.c        | 10 ++-
 drivers/net/nfp/nfpcore/nfp_nsp.c          | 30 +++++----
 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c     |  4 +-
 drivers/net/nfp/nfpcore/nfp_nsp_eth.c      | 16 ++---
 drivers/net/nfp/nfpcore/nfp_resource.c     |  5 +-
 drivers/net/nfp/nfpcore/nfp_rtsym.c        | 28 +++------
 10 files changed, 87 insertions(+), 109 deletions(-)

diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index 22c8bc4b14..8d7eb96da1 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -34,6 +34,7 @@
 #include <rte_string_fns.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_target.h"
 #include "nfp6000/nfp6000.h"
 
@@ -173,23 +174,17 @@ nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
 		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
 
 		if ((offset & mask) != ((offset + size - 1) & mask)) {
-			printf("BAR%d: Won't use for Fixed mapping\n",
-				bar->index);
-			printf("\t<%#llx,%#llx>, action=%d\n",
-				(unsigned long long)offset,
-				(unsigned long long)(offset + size), act);
-			printf("\tBAR too small (0x%llx).\n",
-				(unsigned long long)mask);
+			PMD_DRV_LOG(ERR, "BAR%d: Won't use for Fixed mapping <%#llx,%#llx>, action=%d BAR too small (0x%llx)",
+				    bar->index, (unsigned long long)offset,
+				    (unsigned long long)(offset + size), act,
+				    (unsigned long long)mask);
 			return -EINVAL;
 		}
 		offset &= mask;
 
-#ifdef DEBUG
-		printf("BAR%d: Created Fixed mapping\n", bar->index);
-		printf("\t%d:%d:%d:0x%#llx-0x%#llx>\n", tgt, act, tok,
-			(unsigned long long)offset,
-			(unsigned long long)(offset + mask));
-#endif
+		PMD_DRV_LOG(DEBUG, "BAR%d: Created Fixed mapping %d:%d:%d:0x%#llx-0x%#llx>",
+			    bar->index, tgt, act, tok, (unsigned long long)offset,
+			    (unsigned long long)(offset + mask));
 
 		bitsize = 40 - 16;
 	} else {
@@ -204,33 +199,27 @@ nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
 		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
 
 		if ((offset & mask) != ((offset + size - 1) & mask)) {
-			printf("BAR%d: Won't use for bulk mapping\n",
-				bar->index);
-			printf("\t<%#llx,%#llx>\n", (unsigned long long)offset,
-				(unsigned long long)(offset + size));
-			printf("\ttarget=%d, token=%d\n", tgt, tok);
-			printf("\tBAR too small (%#llx) - (%#llx != %#llx).\n",
-				(unsigned long long)mask,
-				(unsigned long long)(offset & mask),
-				(unsigned long long)(offset + size - 1) & mask);
-
+			PMD_DRV_LOG(ERR, "BAR%d: Won't use for bulk mapping <%#llx,%#llx> target=%d, token=%d BAR too small (%#llx) - (%#llx != %#llx).",
+				    bar->index, (unsigned long long)offset,
+				    (unsigned long long)(offset + size),
+				    tgt, tok, (unsigned long long)mask,
+				    (unsigned long long)(offset & mask),
+				    (unsigned long long)(offset + size - 1) & mask);
 			return -EINVAL;
 		}
 
 		offset &= mask;
 
-#ifdef DEBUG
-		printf("BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx\n",
-			bar->index, tgt, tok, (unsigned long long)offset,
-			(unsigned long long)(offset + ~mask));
-#endif
+		PMD_DRV_LOG(DEBUG, "BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx",
+			    bar->index, tgt, tok, (unsigned long long)offset,
+			    (unsigned long long)(offset + ~mask));
 
 		bitsize = 40 - 21;
 	}
 
 	if (bar->bitsize < bitsize) {
-		printf("BAR%d: Too small for %d:%d:%d\n", bar->index, tgt, tok,
-			act);
+		PMD_DRV_LOG(ERR, "BAR%d: Too small for %d:%d:%d", bar->index,
+			    tgt, tok, act);
 		return -EINVAL;
 	}
 
@@ -263,9 +252,7 @@ nfp_bar_write(struct nfp_pcie_user *nfp, struct nfp_bar *bar,
 	*(uint32_t *)(bar->csr) = newcfg;
 
 	bar->barcfg = newcfg;
-#ifdef DEBUG
-	printf("BAR%d: updated to 0x%08x\n", bar->index, newcfg);
-#endif
+	PMD_DRV_LOG(DEBUG, "BAR%d: updated to 0x%08x", bar->index, newcfg);
 
 	return 0;
 }
@@ -535,7 +522,7 @@ nfp6000_area_read(struct nfp_cpp_area *area, void *kernel_vaddr,
 
 	/* Unaligned? Translate to an explicit access */
 	if ((priv->offset + offset) & (width - 1)) {
-		printf("aread_read unaligned!!!\n");
+		PMD_DRV_LOG(ERR, "aread_read unaligned!!!");
 		return -EINVAL;
 	}
 
@@ -702,7 +689,7 @@ nfp_acquire_secondary_process_lock(struct nfp_pcie_user *desc)
 	desc->secondary_lock = open(lockfile, O_RDWR | O_CREAT | O_NONBLOCK,
 				    0666);
 	if (desc->secondary_lock < 0) {
-		RTE_LOG(ERR, PMD, "NFP lock for secondary process failed\n");
+		PMD_DRV_LOG(ERR, "NFP lock for secondary process failed");
 		free(lockfile);
 		return desc->secondary_lock;
 	}
@@ -711,7 +698,7 @@ nfp_acquire_secondary_process_lock(struct nfp_pcie_user *desc)
 	lock.l_whence = SEEK_SET;
 	rc = fcntl(desc->secondary_lock, F_SETLK, &lock);
 	if (rc < 0) {
-		RTE_LOG(ERR, PMD, "NFP lock for secondary process failed\n");
+		PMD_DRV_LOG(ERR, "NFP lock for secondary process failed");
 		close(desc->secondary_lock);
 	}
 
@@ -725,7 +712,7 @@ nfp6000_set_model(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 	uint32_t model;
 
 	if (rte_pci_read_config(dev, &model, 4, 0x2e) < 0) {
-		printf("nfp set model failed\n");
+		PMD_DRV_LOG(ERR, "nfp set model failed");
 		return -1;
 	}
 
@@ -741,7 +728,7 @@ nfp6000_set_interface(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 	uint16_t interface;
 
 	if (rte_pci_read_config(dev, &interface, 2, 0x154) < 0) {
-		printf("nfp set interface failed\n");
+		PMD_DRV_LOG(ERR, "nfp set interface failed");
 		return -1;
 	}
 
@@ -760,14 +747,14 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos = rte_pci_find_ext_capability(dev, RTE_PCI_EXT_CAP_ID_DSN);
 	if (pos <= 0) {
-		printf("PCI_EXT_CAP_ID_DSN not found. nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "PCI_EXT_CAP_ID_DSN not found. nfp set serial failed");
 		return -1;
 	} else {
 		pos += 6;
 	}
 
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
@@ -776,7 +763,7 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos += 2;
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
@@ -785,7 +772,7 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos += 2;
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_cppcore.c b/drivers/net/nfp/nfpcore/nfp_cppcore.c
index 37799af558..e1e0a143f9 100644
--- a/drivers/net/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/nfp/nfpcore/nfp_cppcore.c
@@ -15,6 +15,7 @@
 #include <ethdev_pci.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_target.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp6000/nfp_xpb.h"
@@ -701,7 +702,7 @@ nfp_cpp_read(struct nfp_cpp *cpp, uint32_t destination,
 
 	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
 	if (!area) {
-		printf("Area allocation/acquire failed\n");
+		PMD_DRV_LOG(ERR, "Area allocation/acquire failed");
 		return -1;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.c b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
index 9f848bde79..9b66569953 100644
--- a/drivers/net/nfp/nfpcore/nfp_hwinfo.c
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
@@ -20,6 +20,7 @@
 #include <time.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp_resource.h"
 #include "nfp_hwinfo.h"
@@ -40,12 +41,12 @@ nfp_hwinfo_db_walk(struct nfp_hwinfo *hwinfo, uint32_t size)
 	     key = val + strlen(val) + 1) {
 		val = key + strlen(key) + 1;
 		if (val >= end) {
-			printf("Bad HWINFO - overflowing key\n");
+			PMD_DRV_LOG(ERR, "Bad HWINFO - overflowing key");
 			return -EINVAL;
 		}
 
 		if (val + strlen(val) + 1 > end) {
-			printf("Bad HWINFO - overflowing value\n");
+			PMD_DRV_LOG(ERR, "Bad HWINFO - overflowing value");
 			return -EINVAL;
 		}
 	}
@@ -59,7 +60,7 @@ nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
 
 	size = db->size;
 	if (size > len) {
-		printf("Unsupported hwinfo size %u > %u\n", size, len);
+		PMD_DRV_LOG(ERR, "Unsupported hwinfo size %u > %u", size, len);
 		return -EINVAL;
 	}
 
@@ -67,8 +68,8 @@ nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
 	new_crc = nfp_crc32_posix((char *)db, size);
 	crc = (uint32_t *)(db->start + size);
 	if (new_crc != *crc) {
-		printf("Corrupt hwinfo table (CRC mismatch)\n");
-		printf("\tcalculated 0x%x, expected 0x%x\n", new_crc, *crc);
+		PMD_DRV_LOG(ERR, "Corrupt hwinfo table (CRC mismatch) calculated 0x%x, expected 0x%x",
+			    new_crc, *crc);
 		return -EINVAL;
 	}
 
@@ -108,12 +109,12 @@ nfp_hwinfo_try_fetch(struct nfp_cpp *cpp, size_t *cpp_size)
 		goto exit_free;
 
 	header = (void *)db;
-	printf("NFP HWINFO header: %#08x\n", *(uint32_t *)header);
+	PMD_DRV_LOG(DEBUG, "NFP HWINFO header: %#08x", *(uint32_t *)header);
 	if (nfp_hwinfo_is_updating(header))
 		goto exit_free;
 
 	if (header->version != NFP_HWINFO_VERSION_2) {
-		printf("Unknown HWInfo version: 0x%08x\n",
+		PMD_DRV_LOG(DEBUG, "Unknown HWInfo version: 0x%08x",
 			header->version);
 		goto exit_free;
 	}
@@ -145,7 +146,7 @@ nfp_hwinfo_fetch(struct nfp_cpp *cpp, size_t *hwdb_size)
 
 		nanosleep(&wait, NULL);
 		if (count++ > 200) {
-			printf("NFP access error\n");
+			PMD_DRV_LOG(ERR, "NFP access error");
 			return NULL;
 		}
 	}
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.c b/drivers/net/nfp/nfpcore/nfp_mip.c
index c86966df8b..d342bc4141 100644
--- a/drivers/net/nfp/nfpcore/nfp_mip.c
+++ b/drivers/net/nfp/nfpcore/nfp_mip.c
@@ -7,6 +7,7 @@
 #include <rte_byteorder.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_mip.h"
 #include "nfp_nffw.h"
 
@@ -43,18 +44,17 @@ nfp_mip_try_read(struct nfp_cpp *cpp, uint32_t cpp_id, uint64_t addr,
 
 	ret = nfp_cpp_read(cpp, cpp_id, addr, mip, sizeof(*mip));
 	if (ret != sizeof(*mip)) {
-		printf("Failed to read MIP data (%d, %zu)\n",
-			ret, sizeof(*mip));
+		PMD_DRV_LOG(ERR, "Failed to read MIP data (%d, %zu)", ret, sizeof(*mip));
 		return -EIO;
 	}
 	if (mip->signature != NFP_MIP_SIGNATURE) {
-		printf("Incorrect MIP signature (0x%08x)\n",
-			 rte_le_to_cpu_32(mip->signature));
+		PMD_DRV_LOG(ERR, "Incorrect MIP signature (0x%08x)",
+			    rte_le_to_cpu_32(mip->signature));
 		return -EINVAL;
 	}
 	if (mip->mip_version != NFP_MIP_VERSION) {
-		printf("Unsupported MIP version (%d)\n",
-			 rte_le_to_cpu_32(mip->mip_version));
+		PMD_DRV_LOG(ERR, "Unsupported MIP version (%d)",
+			    rte_le_to_cpu_32(mip->mip_version));
 		return -EINVAL;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_mutex.c b/drivers/net/nfp/nfpcore/nfp_mutex.c
index 318c5800d7..de9049c6a0 100644
--- a/drivers/net/nfp/nfpcore/nfp_mutex.c
+++ b/drivers/net/nfp/nfpcore/nfp_mutex.c
@@ -10,6 +10,7 @@
 #include <sched.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 
 #define MUTEX_LOCKED(interface)  ((((uint32_t)(interface)) << 16) | 0x000f)
@@ -265,12 +266,9 @@ nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
 		if (err < 0 && errno != EBUSY)
 			return err;
 		if (time(NULL) >= warn_at) {
-			printf("Warning: waiting for NFP mutex\n");
-			printf("\tusage:%u\n", mutex->usage);
-			printf("\tdepth:%hd]\n", mutex->depth);
-			printf("\ttarget:%d\n", mutex->target);
-			printf("\taddr:%llx\n", mutex->address);
-			printf("\tkey:%08x]\n", mutex->key);
+			PMD_DRV_LOG(ERR, "Warning: waiting for NFP mutex usage:%u depth:%hd target:%d addr:%llx key:%08x",
+				    mutex->usage, mutex->depth, mutex->target,
+				    mutex->address, mutex->key);
 			warn_at = time(NULL) + 60;
 		}
 		sched_yield();
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.c b/drivers/net/nfp/nfpcore/nfp_nsp.c
index 876a4017c9..22fb3407c6 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.c
@@ -11,6 +11,7 @@
 #include <rte_common.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp_resource.h"
 
@@ -62,7 +63,7 @@ nfp_nsp_print_extended_error(uint32_t ret_val)
 
 	for (i = 0; i < (int)ARRAY_SIZE(nsp_errors); i++)
 		if (ret_val == (uint32_t)nsp_errors[i].code)
-			printf("err msg: %s\n", nsp_errors[i].msg);
+			PMD_DRV_LOG(ERR, "err msg: %s", nsp_errors[i].msg);
 }
 
 static int
@@ -81,7 +82,7 @@ nfp_nsp_check(struct nfp_nsp *state)
 		return err;
 
 	if (FIELD_GET(NSP_STATUS_MAGIC, reg) != NSP_MAGIC) {
-		printf("Cannot detect NFP Service Processor\n");
+		PMD_DRV_LOG(ERR, "Cannot detect NFP Service Processor");
 		return -ENODEV;
 	}
 
@@ -89,13 +90,13 @@ nfp_nsp_check(struct nfp_nsp *state)
 	state->ver.minor = FIELD_GET(NSP_STATUS_MINOR, reg);
 
 	if (state->ver.major != NSP_MAJOR || state->ver.minor < NSP_MINOR) {
-		printf("Unsupported ABI %hu.%hu\n", state->ver.major,
+		PMD_DRV_LOG(ERR, "Unsupported ABI %hu.%hu", state->ver.major,
 						    state->ver.minor);
 		return -EINVAL;
 	}
 
 	if (reg & NSP_STATUS_BUSY) {
-		printf("Service processor busy!\n");
+		PMD_DRV_LOG(ERR, "Service processor busy!");
 		return -EBUSY;
 	}
 
@@ -223,7 +224,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 
 	if (!FIELD_FIT(NSP_BUFFER_CPP, buff_cpp >> 8) ||
 	    !FIELD_FIT(NSP_BUFFER_ADDRESS, buff_addr)) {
-		printf("Host buffer out of reach %08x %" PRIx64 "\n",
+		PMD_DRV_LOG(ERR, "Host buffer out of reach %08x %" PRIx64,
 			buff_cpp, buff_addr);
 		return -EINVAL;
 	}
@@ -245,7 +246,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_command,
 			       NSP_COMMAND_START, 0);
 	if (err) {
-		printf("Error %d waiting for code 0x%04x to start\n",
+		PMD_DRV_LOG(ERR, "Error %d waiting for code 0x%04x to start",
 			err, code);
 		return err;
 	}
@@ -254,7 +255,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
 			       0);
 	if (err) {
-		printf("Error %d waiting for code 0x%04x to complete\n",
+		PMD_DRV_LOG(ERR, "Error %d waiting for code 0x%04x to complete",
 			err, code);
 		return err;
 	}
@@ -266,7 +267,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 
 	err = FIELD_GET(NSP_STATUS_RESULT, reg);
 	if (err) {
-		printf("Result (error) code set: %d (%d) command: %d\n",
+		PMD_DRV_LOG(ERR, "Result (error) code set: %d (%d) command: %d",
 			 -err, (int)ret_val, code);
 		nfp_nsp_print_extended_error(ret_val);
 		return -err;
@@ -289,8 +290,8 @@ nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
 	uint32_t cpp_id;
 
 	if (nsp->ver.minor < 13) {
-		printf("NSP: Code 0x%04x with buffer not supported\n", code);
-		printf("\t(ABI %hu.%hu)\n", nsp->ver.major, nsp->ver.minor);
+		PMD_DRV_LOG(ERR, "NSP: Code 0x%04x with buffer not supported (ABI %hu.%hu)",
+			    code, nsp->ver.major, nsp->ver.minor);
 		return -EOPNOTSUPP;
 	}
 
@@ -303,11 +304,8 @@ nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
 
 	max_size = RTE_MAX(in_size, out_size);
 	if (FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M < max_size) {
-		printf("NSP: default buffer too small for command 0x%04x\n",
-		       code);
-		printf("\t(%llu < %u)\n",
-		       FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M,
-		       max_size);
+		PMD_DRV_LOG(ERR, "NSP: default buffer too small for command 0x%04x (%llu < %u)",
+			    code, FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M, max_size);
 		return -EINVAL;
 	}
 
@@ -372,7 +370,7 @@ nfp_nsp_wait(struct nfp_nsp *state)
 		}
 	}
 	if (err)
-		printf("NSP failed to respond %d\n", err);
+		PMD_DRV_LOG(ERR, "NSP failed to respond %d", err);
 
 	return err;
 }
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
index bfd1eddb3e..1de3d1b00f 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
@@ -6,6 +6,7 @@
 #include <stdio.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp_nffw.h"
 
@@ -39,8 +40,7 @@ __nfp_nsp_identify(struct nfp_nsp *nsp)
 	memset(ni, 0, sizeof(*ni));
 	ret = nfp_nsp_read_identify(nsp, ni, sizeof(*ni));
 	if (ret < 0) {
-		printf("reading bsp version failed %d\n",
-			ret);
+		PMD_DRV_LOG(ERR, "reading bsp version failed %d", ret);
 		goto exit_free;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
index f8f3c372ac..eb532e5f3a 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
@@ -7,6 +7,7 @@
 #include <rte_common.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp6000/nfp6000.h"
 
@@ -236,7 +237,7 @@ nfp_eth_calc_port_geometry(struct nfp_eth_table *table)
 				continue;
 			if (table->ports[i].label_subport ==
 			    table->ports[j].label_subport)
-				printf("Port %d subport %d is a duplicate\n",
+				PMD_DRV_LOG(DEBUG, "Port %d subport %d is a duplicate",
 					 table->ports[i].label_port,
 					 table->ports[i].label_subport);
 
@@ -275,7 +276,7 @@ __nfp_eth_read_ports(struct nfp_nsp *nsp)
 	memset(entries, 0, NSP_ETH_TABLE_SIZE);
 	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
 	if (ret < 0) {
-		printf("reading port table failed %d\n", ret);
+		PMD_DRV_LOG(ERR, "reading port table failed %d", ret);
 		goto err;
 	}
 
@@ -294,7 +295,7 @@ __nfp_eth_read_ports(struct nfp_nsp *nsp)
 	 * above.
 	 */
 	if (ret && ret != cnt) {
-		printf("table entry count (%d) unmatch entries present (%d)\n",
+		PMD_DRV_LOG(ERR, "table entry count (%d) unmatch entries present (%d)",
 		       ret, cnt);
 		goto err;
 	}
@@ -372,12 +373,12 @@ nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
 
 	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
 	if (ret < 0) {
-		printf("reading port table failed %d\n", ret);
+		PMD_DRV_LOG(ERR, "reading port table failed %d", ret);
 		goto err;
 	}
 
 	if (!(entries[idx].port & NSP_ETH_PORT_LANES_MASK)) {
-		printf("trying to set port state on disabled port %d\n", idx);
+		PMD_DRV_LOG(ERR, "trying to set port state on disabled port %d", idx);
 		goto err;
 	}
 
@@ -535,7 +536,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
 	 *	 codes were initially not populated correctly.
 	 */
 	if (nfp_nsp_get_abi_ver_minor(nsp) < 17) {
-		printf("set operations not supported, please update flash\n");
+		PMD_DRV_LOG(ERR, "set operations not supported, please update flash");
 		return -EOPNOTSUPP;
 	}
 
@@ -647,8 +648,7 @@ __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
 
 	rate = nfp_eth_speed2rate(speed);
 	if (rate == RATE_INVALID) {
-		printf("could not find matching lane rate for speed %u\n",
-			 speed);
+		PMD_DRV_LOG(ERR, "could not find matching lane rate for speed %u", speed);
 		return -EINVAL;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.c b/drivers/net/nfp/nfpcore/nfp_resource.c
index 7b5630fd86..6a10c9b0a7 100644
--- a/drivers/net/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/nfp/nfpcore/nfp_resource.c
@@ -10,6 +10,7 @@
 #include <rte_string_fns.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp_resource.h"
 #include "nfp_crc.h"
@@ -79,7 +80,7 @@ nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res)
 
 	/* Search for a matching entry */
 	if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
-		printf("Grabbing device lock not supported\n");
+		PMD_DRV_LOG(ERR, "Grabbing device lock not supported");
 		return -EOPNOTSUPP;
 	}
 	key = nfp_crc32_posix(name_pad, NFP_RESOURCE_ENTRY_NAME_SZ);
@@ -185,7 +186,7 @@ nfp_resource_acquire(struct nfp_cpp *cpp, const char *name)
 			goto err_free;
 
 		if (count++ > 1000) {
-			printf("Error: resource %s timed out\n", name);
+			PMD_DRV_LOG(ERR, "Error: resource %s timed out", name);
 			err = -EBUSY;
 			goto err_free;
 		}
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.c b/drivers/net/nfp/nfpcore/nfp_rtsym.c
index 56bbf05cd8..288a37da60 100644
--- a/drivers/net/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.c
@@ -11,6 +11,7 @@
 #include <stdio.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_mip.h"
 #include "nfp_rtsym.h"
 #include "nfp6000/nfp6000.h"
@@ -56,11 +57,8 @@ nfp_rtsym_sw_entry_init(struct nfp_rtsym_table *cache, uint32_t strtab_size,
 	sw->size = ((uint64_t)fw->size_hi << 32) |
 		   rte_le_to_cpu_32(fw->size_lo);
 
-#ifdef DEBUG
-	printf("rtsym_entry_init\n");
-	printf("\tname=%s, addr=%" PRIx64 ", size=%" PRIu64 ",target=%d\n",
-		sw->name, sw->addr, sw->size, sw->target);
-#endif
+	PMD_INIT_LOG(DEBUG, "rtsym_entry_init name=%s, addr=%" PRIx64 ", size=%" PRIu64 ", target=%d",
+		     sw->name, sw->addr, sw->size, sw->target);
 	switch (fw->target) {
 	case SYM_TGT_LMEM:
 		sw->target = NFP_RTSYM_TARGET_LMEM;
@@ -241,10 +239,8 @@ nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
 
 	id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
 
-#ifdef DEBUG
-	printf("Reading symbol %s with size %" PRIu64 " at %" PRIx64 "\n",
+	PMD_DRV_LOG(DEBUG, "Reading symbol %s with size %" PRIu64 " at %" PRIx64,
 		name, sym->size, sym->addr);
-#endif
 	switch (sym->size) {
 	case 4:
 		err = nfp_cpp_readl(rtbl->cpp, id, sym->addr, &val32);
@@ -254,7 +250,7 @@ nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
 		err = nfp_cpp_readq(rtbl->cpp, id, sym->addr, &val);
 		break;
 	default:
-		printf("rtsym '%s' unsupported size: %" PRId64 "\n",
+		PMD_DRV_LOG(ERR, "rtsym '%s' unsupported size: %" PRId64,
 			name, sym->size);
 		err = -EINVAL;
 		break;
@@ -279,17 +275,15 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
 	const struct nfp_rtsym *sym;
 	uint8_t *mem;
 
-#ifdef DEBUG
-	printf("mapping symbol %s\n", name);
-#endif
+	PMD_DRV_LOG(DEBUG, "mapping symbol %s", name);
 	sym = nfp_rtsym_lookup(rtbl, name);
 	if (!sym) {
-		printf("symbol lookup fails for %s\n", name);
+		PMD_DRV_LOG(ERR, "symbol lookup fails for %s", name);
 		return NULL;
 	}
 
 	if (sym->size < min_size) {
-		printf("Symbol %s too small (%" PRIu64 " < %u)\n", name,
+		PMD_DRV_LOG(ERR, "Symbol %s too small (%" PRIu64 " < %u)", name,
 			sym->size, min_size);
 		return NULL;
 	}
@@ -297,12 +291,10 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
 	mem = nfp_cpp_map_area(rtbl->cpp, sym->domain, sym->target, sym->addr,
 			       sym->size, area);
 	if (!mem) {
-		printf("Failed to map symbol %s\n", name);
+		PMD_DRV_LOG(ERR, "Failed to map symbol %s", name);
 		return NULL;
 	}
-#ifdef DEBUG
-	printf("symbol %s with address %p\n", name, mem);
-#endif
+	PMD_DRV_LOG(DEBUG, "symbol %s with address %p", name, mem);
 
 	return mem;
 }
-- 
2.29.3
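
Every conversion in the patch above drops the trailing "\n" from the
format string, which suggests that the log macro appends the newline
(and a function-name prefix) itself. The sketch below illustrates that
pattern in standard C. It is an assumption-laden stand-in, not the
driver's macro: demo_log(), DEMO_DRV_LOG() and demo_bar_check() are
hypothetical names, and the output goes to a caller-supplied buffer
rather than to the DPDK log subsystem.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "<level>: <func>(): <message>\n" into out.
 * Returns the number of characters written (including the newline),
 * or -1 if the buffer is too small or formatting fails.
 */
static int
demo_log(char *out, size_t n, const char *level, const char *func,
	 const char *fmt, ...)
{
	va_list ap;
	int off, len;

	off = snprintf(out, n, "%s: %s(): ", level, func);
	if (off < 0 || (size_t)off >= n)
		return -1;

	va_start(ap, fmt);
	len = vsnprintf(out + off, n - (size_t)off, fmt, ap);
	va_end(ap);
	/* Need room for the message, the appended newline and the NUL. */
	if (len < 0 || (size_t)(off + len) + 1 >= n)
		return -1;

	/* The helper, not the caller, supplies the line terminator. */
	out[off + len] = '\n';
	out[off + len + 1] = '\0';
	return off + len + 1;
}

/* Hypothetical macro: callers pass a format string without "\n". */
#define DEMO_DRV_LOG(out, n, level, ...) \
	demo_log((out), (n), #level, __func__, __VA_ARGS__)

/* Example call site, mirroring one of the converted messages. */
static int
demo_bar_check(char *out, size_t n, int bar_index)
{
	return DEMO_DRV_LOG(out, n, ERR, "BAR%d: Too small", bar_index);
}
```

With a helper of this shape, a call site that kept its own "\n" would
emit an empty line after each message, which is presumably why the
patch removes them.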


^ permalink raw reply	[relevance 8%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-03  9:44  0%                       ` Jerin Jacob
@ 2023-02-06  6:21  0%                         ` Naga Harish K, S V
  2023-02-06 16:38  0%                           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-02-06  6:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, February 3, 2023 3:15 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, January 30, 2023 8:13 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > >
> > > > > > > > > >
> > > > > > > > > > > > +        */
> > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > >
> > > > > > > > > > > > Introduce
> > > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > to make sure rsvd is zero.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > The reserved fields are not used by the adapter or
> > > > > > > > > > > application. Not sure if it is necessary to introduce
> > > > > > > > > > > a new API to clear reserved fields.
> > > > > > > > >
> > > > > > > > > > When adapter starts using new fields (when we add new
> > > > > > > > > > fields in future), the old application which is not using
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have
> > > > > > > > > > junk value and then adapter implementation will behave
> > > > > > > > > > badly.
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > > does it mean, the application doesn't re-compile for the
> > > > > > > > > new DPDK?
> > > > > > >
> > > > > > > Yes. No need recompile if ABI not breaking.
> > > > > > >
> > > > > > > > > When some of the reserved fields are used in the future,
> > > > > > > > > the application also may need to be recompiled along with
> > > > > > > > > DPDK right? As the application also may need to use the
> > > > > > > > > newly consumed reserved fields?
> > > > > > >
> > > > > > > The problematic case is:
> > > > > > >
> > > > > > > > Adapter implementation of 23.07 (assuming there is a change
> > > > > > > > in the params field) needs to work with application of 23.03.
> > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > > >
> > > > > >
> > > > > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes
> > > > > > > only reserved fields to zero, it may not solve the issue in
> > > > > > > this case.
> > > > >
> > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > > > fields, not just reserved field.
> > > > > The application calling sequence  is
> > > > >
> > > > > struct my_config c;
> > > > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > > > c.interseted_filed_to_be_updated = val;
> > > > >
> > > > Can it be done like
> > > >         struct my_config c = {0};
> > > >         c.interseted_filed_to_be_updated = val; and update Doxygen
> > > > comments to recommend above usage to reset all fields?
> > > > > This way, rte_event_eth_rx_adapter_runtime_params_init() can be
> > > > > avoided.
> > >
> > > Better to have a function for documentation clarity. Similar scheme
> > > already there in DPDK. See rte_eth_cman_config_init()
> > >
> > >
> >
> >
> > > The reference function rte_eth_cman_config_init() is resetting the
> > > params struct and initializing the required params with default
> > > values in the pmd cb.
> 
> No need for PMD cb.
> 
> > > The proposed rte_event_eth_rx_adapter_runtime_params_init() API just
> > > needs to reset the params struct. There are no pmd CBs involved.
> > > Having an API just to reset the struct seems overkill. What do you think?
> 
> It is a slow path API. Keeping it as a function is better. Also, it helps
> the documentation of the config param in
> rte_event_eth_rx_adapter_runtime_params_config(),
> like "This structure must be initialized with
> rte_event_eth_rx_adapter_runtime_params_init()" or so.
> 
> 

Are there any other reasons to have this API (*params_init()) other than documentation?

> 
> >
> > > >
> > > > > Let me share an example and you can tell where is the issue
> > > > >
> > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > > > 3)There is an application written based on 22.03 which using
> > > > > only 8B after calling
> > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > > application is calling
> > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > and bytes 9 to 16 are zero, the implementation will not go bad.
> > > > >
> > > > > > > The old application only tries to set/get previous valid
> > > > > > > fields and the newly used fields may still contain junk value.
> > > > > > > If the application wants to make use of any of the newly used
> > > > > > > params, the application changes are required anyway.
> > > > >
> > > > > Yes. If application wants to make use of newly added features.
> > > > > No need to change if new features are not needed for old application.
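
The numbered scenario quoted above can be sketched in plain C. This is
a minimal illustration under stated assumptions, not the eventdev API:
the struct, its field names, the 64-byte size and the default value of
1024 are all hypothetical. The point is only that an init helper which
zeroes the whole structure lets a later library release treat a former
reserved word as "unset" when an older application never touched it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block: one word in use, the rest reserved. */
struct demo_runtime_params {
	uint32_t max_nb_rx;	/* field the old application knows about */
	uint32_t rsvd[15];	/* reserved for future use, must be zero */
};

/* The structure size is part of the ABI and must not change. */
static_assert(sizeof(struct demo_runtime_params) == 64,
	      "demo_runtime_params must stay 64 bytes");

/* Init helper: clears every field, reserved or not. */
static int
demo_runtime_params_init(struct demo_runtime_params *p)
{
	if (p == NULL)
		return -EINVAL;
	memset(p, 0, sizeof(*p));
	return 0;
}

/*
 * A later library release that starts using rsvd[0] as a new knob.
 * Zero means "not set by the application", so the default applies.
 */
static uint32_t
demo_effective_threshold(const struct demo_runtime_params *p)
{
	const uint32_t default_threshold = 1024;

	return p->rsvd[0] != 0 ? p->rsvd[0] : default_threshold;
}
```

An application built against the older layout calls the init helper,
sets max_nb_rx and nothing else. Without the helper, a stack-allocated
structure could carry arbitrary bytes in rsvd[0], and the newer library
would read them as a real setting.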

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:49  0%         ` Thomas Monjalon
@ 2023-02-05 23:41  0%           ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-05 23:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

On Fri, 03 Feb 2023 21:49:02 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> > > > > Good catch.
> > > > > By the way, we should remove unused RTE_LOGTYPE_*.    
> > > > 
> > > > Yes, for 23.11 would like to work down the list.    
> > > 
> > > Do we need to wait 23.11?
> > > It is not an ABI breakage.
> > > And most of these defines are already unused.  
> > 
> > Turning them into deprecated would be API breakage though  
> 
> API breakage is not forbidden.
> 

For the internal ones it would be ok, but what about the RTE_LOGTYPE_USER1 etc.
These need to go through the regular deprecation process.

The problem is that if the types are not registered (see eal_common_log.c)
they might get reused.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 9/9] telemetry: change public API to use 64-bit signed values
  @ 2023-02-05 22:55  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-02-05 22:55 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, Morten Brørup, Tyler Retzlaff, Ciara Power, david.marchand

12/01/2023 18:41, Bruce Richardson:
> While the unsigned values added to telemetry dicts/arrays were up to
> 64-bits in size, the sized values were only up to 32-bits. We can

sized -> signed

> standardize the API by having both int and uint functions take 64-bit
> values. For ABI compatibility, we use function versioning to ensure
> older binaries can still use the older functions taking a 32-bit
> parameter.
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
[...]
> --- a/lib/telemetry/version.map
> +++ b/lib/telemetry/version.map
> +DPDK_24 {
> +	global:
> +
> +	rte_tel_data_add_array_int;
> +	rte_tel_data_add_dict_int;
> +} DPDK_23;

For the record, these are the versioned symbols.
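
As a rough illustration of the idea in the quoted commit message, the
sketch below keeps a 32-bit entry point next to a new 64-bit one and
has the old one forward to the new one. All names here are hypothetical
and the binding of each variant to a symbol version is left out. In
DPDK that binding is done with the macros from
rte_function_versioning.h together with the version.map entries shown
above, which this plain-C sketch does not attempt to reproduce.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_VALS 8

/* Hypothetical container for signed telemetry values. */
struct demo_tel_data {
	int64_t vals[DEMO_MAX_VALS];
	unsigned int len;
};

/* New default variant: takes a full 64-bit signed value. */
static int
demo_add_array_int_v24(struct demo_tel_data *d, int64_t x)
{
	if (d->len >= DEMO_MAX_VALS)
		return -ENOSPC;
	d->vals[d->len++] = x;
	return 0;
}

/*
 * Old variant kept for binaries built against the previous ABI:
 * same behaviour, narrower parameter, widened before forwarding.
 */
static int
demo_add_array_int_v23(struct demo_tel_data *d, int x)
{
	return demo_add_array_int_v24(d, (int64_t)x);
}
```

Widening from int to int64_t preserves the sign, so an older caller
that stores -5 through the 32-bit variant gets the same result as a
newer caller passing -5 directly.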



^ permalink raw reply	[relevance 0%]

* RE: Sign changes through function signatures
  2023-02-03 22:12  0%         ` Tyler Retzlaff
@ 2023-02-04  8:09  0%           ` Morten Brørup
  2023-02-06 15:57  3%             ` Ben Magistro
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-04  8:09 UTC (permalink / raw)
  To: Tyler Retzlaff, Bruce Richardson
  Cc: Thomas Monjalon, Ben Magistro, Olivier Matz, ferruh.yigit,
	andrew.rybchenko, ben.magistro, dev, Stefan Baranoff,
	david.marchand, anatoly.burakov

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 3 February 2023 23.13
> 
> On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> > On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 2 February 2023 21.45
> > > >
> > > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > > Hello,
> > > > > >
> > > > > > > While making some updates to our code base for 22.11.1 that were
> > > > > > > missed in our first pass through, we hit the numa node change[1].
> > > > > > > In the process of updating our code, we noticed that a couple
> > > > > > > functions (rx/tx_queue_setup, maybe more that we aren't using)
> > > > > > > state they accept `SOCKET_ID_ANY` but the function signature then
> > > > > > > asks for an unsigned integer while `SOCKET_ID_ANY` is `-1`.
> > > > > > > Following it through the redirect to the "real" function it also
> > > > > > > asks for an unsigned integer which is then passed on to one or
> > > > > > > more functions asking for an integer.  As an example using the
> > > > > > > i40e driver -- we would call `rte_eth_tx_queue_setup` [2] which
> > > > > > > ultimately calls `i40e_dev_tx_queue_setup`[3] which finally calls
> > > > > > > `rte_zmalloc_socket`[4] and `rte_eth_dma_zone_reserve`[5].
> > > > > >
> > > > > > > I guess what I am looking for is clarification on if this is
> > > > > > > intentional or if this is additional cleanup that may need to be
> > > > > > > completed/be desirable so that signs are maintained through the
> > > > > > > call paths and avoid potentially producing sign-conversion
> > > > > > > warnings.  From the very quick glance I took at the i40e driver,
> > > > > > > it seems these are just passed through to other functions and no
> > > > > > > direct use/manipulation occurs (at least in the mentioned
> > > > > > > functions).
> > > > >
> > > > > > i believe this is just sloppiness with sign in our api surface.
> i too
> > > > > find it frustrating that use of these api force either explicit
> > > > > casts or suffer having to suppress warnings.
> > > > >
> > > > > in the past examples of this have been cleaned up without full
> > > > deprecation
> > > > > notices but there are a lot of instances. i also feel
> (unpopular
> > > > opinion)
> > > > > that for some integer types like this that have constrained
> range /
> > > > number
> > > > > spaces it would be of value to introduce a typedef that can be
> used
> > > > > consistently.
> > > > >
> > > > > for now you'll just have to add the casts and hopefully in the
> future
> > > > we
> > > > > will fix the api making them unnecessary. of course feel free
> to
> > > > submit
> > > > > patches too, it would be great to have these cleaned up.
> > > >
> > > > I agree it should be cleaned up.
> > > > Those IDs should accept negative values.
> > > > Not sure which type we should choose (int, int32_t, or a
> typedef).
> > >
> > > Why would we use a signed socket ID? We don't use signed port IDs.
> To me, unsigned seems the way to go. (A minor detail: With unsigned we
> can use the entire range of values minus one (for the magic "any"
> value), whereas with signed we can only use the positive range of
> values. This detail is completely irrelevant when using 32 bit for
> socket ID, but could be relevant if using fewer bits.)
> > >
> > > Also, we don't need 32 bit for socket ID. 8 or 16 bit should
> suffice, like port ID. But reducing from 32 bit would probably cause
> major ABI breakage.
> > >
> > > >
> > > > Another thing to check is the name of the variable.
> > > > It should be a socket ID when talking about CPU,
> > > > and a NUMA node ID when talking about memory.
> > > >
> > > > And last but not the least,
> > > > how can we keep ABI compatibility?
> > > > I hope we can use function versioning to avoid deprecation and
> > > > breaking.
> > > >
> > > > Trials and suggestions are welcome.
> > >
> > > Signedness is not the only problem with the socket ID. The meaning
> of SOCKET_ID_ANY is excessively overloaded. If we want to clean this
> up, we should consider the need for another magic value SOCKET_ID_NONE
> for devices connected to the chipset, as discussed in this other email
> thread [1]. And as discussed there, there are also size problems,
> because some device structures use 8 bit to hold the socket ID.
> > >
> > > And functions should always return -1, never SOCKET_ID_ANY, to
> indicate error.
> > >
> > > [1]:
> http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smarts
> erver.smartshare.dk/
> > >
> > > I only bring warnings and complications to the discussion here, no
> solutions. Sorry! :-(
> > >
> >
> > Personally, I think if we are going to change things, we should do
> things
> > properly, especially/even if we are going to have to break ABI or use
> ABI
> > compatibility.
> >
> > I would suggest rather than a typedef, we should actually wrap the
> int
> > value in a struct - for two reasons:
> 
> >
> > * it means the compiler will actually error out for us if an int or
> >   unsigned int is used instead. This allow easier fixing at compile-
> time
> >   rather than hoping things are correctly specified in existing code.
> >
> > * it allows us to do things like explicitly calling out flags, rather
> than
> >   just using magic values. While still keeping the size 32 bits, we
> can
> >   have the actual socket value as 16-bits and have flags to indicate:
> >   - ANY socket, NO socket, INVALID value socket. This could end up
> being
> >   useful in many cases, for example, when allocating memory we could
> >   specify a socket number with the ANY flag, indicating that any
> socket is
> >   ok, but we'd ideally prefer the number specified.
> 
> i'm a fan of this where it makes sense. i did this with rte_thread_t
> for
> exactly your first reason. but i did receive resistance from other
> members of the community. personally i like compilation to fail when i
> make a mistake.
> 
> it's definitely way easier to make the argument to do this when the
> actual value is opaque. if it isn't i think then we need to provide
> macro/inline accessors to allow applications to do whatever it is they do
> with the value they carry.
> 
> i'll also note that this allows you a cheap way to sprinkle extra
> integrity checking when running functional tests. if you have low
> performance inline accessors you can do things like enforce the range
> of
> values or that enumerations are part of a set for debug builds.
> 
> as an aside i would also caution while i suggested a typedef i don't mean
> that everything should be typedef'd especially actual structs that are
> used like structs. typedefs for things like socket id would
> unquestionably convey more information and implied semantics to the
> user
> of an api than just a standard `int' or whatever. consequently i have
> found
> that this lowers mistakes with the use of the api.

Hiding the socket_id in a typedef'd structure seems like shooting sparrows with cannons.

DPDK is using a C coding style, where there is a convention for not using typedefs:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs

In the thread case, a typedef made sense, because the underlying type can differ across OSes and thus should be opaque. That is in line with the coding style.

But I don't think this is the case for socket_id. The socket_id is an enumeration type, and all we need is a magic number for the "chipset" pseudo-socket. And with that, perhaps some iterator macros to include/omit this pseudo-socket, like the lcore_id iterators with and without the main lcore.
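
A minimal sketch of what such iterators could look like. All names here (DEMO_MAX_SOCKETS, DEMO_SOCKET_ID_CHIPSET, the DEMO_FOREACH_* macros) are hypothetical and are not existing DPDK API; the magic value is simply assumed to sit one past the last real socket ID.

```c
#include <assert.h>

#define DEMO_MAX_SOCKETS       4u               /* hypothetical number of real sockets */
#define DEMO_SOCKET_ID_CHIPSET DEMO_MAX_SOCKETS /* hypothetical magic value, one past the real IDs */

/* Visit the real sockets only. */
#define DEMO_FOREACH_SOCKET(id) \
	for ((id) = 0; (id) < DEMO_MAX_SOCKETS; (id)++)

/* Visit the real sockets, then the "chipset" pseudo-socket. */
#define DEMO_FOREACH_SOCKET_AND_CHIPSET(id) \
	for ((id) = 0; (id) <= DEMO_SOCKET_ID_CHIPSET; (id)++)

/* Count how many IDs each iterator visits. */
static inline unsigned int demo_count_sockets(int include_chipset)
{
	unsigned int id, n = 0;

	if (include_chipset) {
		DEMO_FOREACH_SOCKET_AND_CHIPSET(id)
			n++;
	} else {
		DEMO_FOREACH_SOCKET(id)
			n++;
	}
	assert(n >= DEMO_MAX_SOCKETS); /* both iterators cover every real socket */
	return n;
}
```

With these assumed values, the first iterator visits 4 IDs and the second visits 5.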

The mix of signed and unsigned in function signatures (and in the definition of SOCKET_ID_ANY) is pure sloppiness. This problem may also be present in other function signatures; we just happened to run into it for the socket_id.

The compiler has flags to warn about mixing signed and unsigned types (e.g. -Wsign-conversion in GCC and Clang), so we could enable them to reveal and fix those bugs.
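
A small, self-contained sketch of the pattern under discussion. The function names are hypothetical stand-ins, not DPDK functions; only the -1 value mirrors what the thread says about SOCKET_ID_ANY. Building it with -Wsign-conversion should flag the implicit conversion in demo_call_implicit() but not the cast in demo_call_cast().

```c
/* Compile with, for example: cc -std=c11 -Wsign-conversion -c sign_demo.c */
#include <assert.h>
#include <limits.h>

#define DEMO_SOCKET_ID_ANY (-1) /* mirrors the -1 value of SOCKET_ID_ANY quoted in this thread */

/* Hypothetical callee: like the functions discussed, it takes an unsigned socket ID. */
static inline unsigned int demo_queue_setup(unsigned int socket_id)
{
	return socket_id; /* hand back what the callee actually received */
}

/* Implicit int -> unsigned int conversion: the kind of call -Wsign-conversion reports. */
static inline unsigned int demo_call_implicit(void)
{
	int requested = DEMO_SOCKET_ID_ANY;

	return demo_queue_setup(requested);
}

/* The explicit cast an application has to write today to keep the build quiet. */
static inline unsigned int demo_call_cast(void)
{
	int requested = DEMO_SOCKET_ID_ANY;

	return demo_queue_setup((unsigned int)requested);
}

/* Either way the callee sees UINT_MAX, not -1. */
static inline void demo_check(void)
{
	assert(demo_call_implicit() == UINT_MAX);
	assert(demo_call_cast() == UINT_MAX);
}
```

The cast silences the warning but does not change the value the callee receives, which is why fixing the signatures is the more robust option.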

> 
> >
> > As for socket id, and numa id, I'm not sure we should have different
> > names/types for the two. For example, for PCI devices, do they need a
> third
> > type or are they associated with cores or with memory? The socket id
> for
> > the core only matters in terms of data locality, i.e. what memory or
> cache
> > location it is in. Therefore, for me, I'd pick one name and stick
> with it.
> 
> i think the choice for more than one type vs one type is whether or not
> they are "the same" number space as opposed to just coincidentally
> overlapping number spaces.
> 
> >
> > /Bruce


^ permalink raw reply	[relevance 0%]

* Re: Sign changes through function signatures
  2023-02-03 12:05  4%       ` Bruce Richardson
@ 2023-02-03 22:12  0%         ` Tyler Retzlaff
  2023-02-04  8:09  0%           ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-03 22:12 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Morten Brørup, Thomas Monjalon, Ben Magistro, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 2 February 2023 21.45
> > > 
> > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > Hello,
> > > > >
> > > > > While making some updates to our code base for 22.11.1 that were
> > > missed in
> > > > > our first pass through, we hit the numa node change[1].  In the
> > > process of
> > > > > updating our code, we noticed that a couple functions
> > > (rx/tx_queue_setup,
> > > > > maybe more that we aren't using) state they accept `SOCKET_ID_ANY`
> > > but the
> > > > > function signature then asks for an unsigned integer while
> > > `SOCKET_ID_ANY`
> > > > > is `-1`.  Following it through the redirect to the "real" function
> > > it also
> > > > > asks for an unsigned integer which is then passed on to one or more
> > > > > > functions asking for an integer.  As an example using the i40e
> > > driver
> > > > > -- we would call `rte_eth_tx_queue_setup` [2] which ultimately
> > > calls
> > > > > `i40e_dev_tx_queue_setup`[3] which finally calls
> > > `rte_zmalloc_socket`[4]
> > > > > and `rte_eth_dma_zone_reserve`[5].
> > > > >
> > > > > I guess what I am looking for is clarification on if this is
> > > intentional or
> > > > > if this is additional cleanup that may need to be completed/be
> > > desirable so
> > > > > that signs are maintained through the call paths and avoid
> > > potentially
> > > > > producing sign-conversion warnings.  From the very quick glance I
> > > took at
> > > > > the i40e driver, it seems these are just passed through to other
> > > functions
> > > > > and no direct use/manipulation occurs (at least in the mentioned
> > > functions).
> > > >
> > > > > i believe this is just sloppiness with sign in our api surface. i too
> > > > find it frustrating that use of these api force either explicit
> > > > casts or suffer having to suppress warnings.
> > > >
> > > > in the past examples of this have been cleaned up without full
> > > deprecation
> > > > notices but there are a lot of instances. i also feel (unpopular
> > > opinion)
> > > > that for some integer types like this that have constrained range /
> > > number
> > > > spaces it would be of value to introduce a typedef that can be used
> > > > consistently.
> > > >
> > > > for now you'll just have to add the casts and hopefully in the future
> > > we
> > > > will fix the api making them unnecessary. of course feel free to
> > > submit
> > > > patches too, it would be great to have these cleaned up.
> > > 
> > > I agree it should be cleaned up.
> > > Those IDs should accept negative values.
> > > Not sure which type we should choose (int, int32_t, or a typedef).
> > 
> > Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
> > 
> > Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
> > 
> > > 
> > > Another thing to check is the name of the variable.
> > > It should be a socket ID when talking about CPU,
> > > and a NUMA node ID when talking about memory.
> > > 
> > > And last but not the least,
> > > how can we keep ABI compatibility?
> > > I hope we can use function versioning to avoid deprecation and
> > > breaking.
> > > 
> > > Trials and suggestions are welcome.
> > 
> > Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
> > 
> > And functions should always return -1, never SOCKET_ID_ANY, to indicate error.
> > 
> > [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/
> > 
> > I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(
> >
> 
> Personally, I think if we are going to change things, we should do things
> properly, especially/even if we are going to have to break ABI or use ABI
> compatibility.
> 
> I would suggest rather than a typedef, we should actually wrap the int
> value in a struct - for two reasons:

> 
> * it means the compiler will actually error out for us if an int or
>   unsigned int is used instead. This allow easier fixing at compile-time
>   rather than hoping things are correctly specified in existing code.
> 
> * it allows us to do things like explicitly calling out flags, rather than
>   just using magic values. While still keeping the size 32 bits, we can
>   have the actual socket value as 16-bits and have flags to indicate:
>   - ANY socket, NO socket, INVALID value socket. This could end up being
>   useful in many cases, for example, when allocating memory we could
>   specify a socket number with the ANY flag, indicating that any socket is
>   ok, but we'd ideally prefer the number specified.

i'm a fan of this where it makes sense. i did this with rte_thread_t for
exactly your first reason. but i did receive resistance from other
members of the community. personally i like compilation to fail when i
make a mistake.

it's definitely way easier to make the argument to do this when the
actual value is opaque. if it isn't i think then we need to provide
macro/inline accessors to allow applications to do whatever it is they do
with the value they carry.

i'll also note that this allows you a cheap way to sprinkle extra
integrity checking when running functional tests. if you have low
performance inline accessors you can do things like enforce the range of
values or that enumerations are part of a set for debug builds.
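
A minimal sketch of the struct-wrapper idea with inline accessors, assuming a 16-bit ID plus 16 bits of flags as suggested above. Every name (struct demo_socket, the DEMO_SOCKET_F_* flags, DEMO_MAX_SOCKETS) is hypothetical and not an existing DPDK definition; the debug checks use plain assert(), which is compiled out when NDEBUG is defined.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SOCKET_F_ANY     0x1u /* any socket is acceptable, id is only a preference */
#define DEMO_SOCKET_F_NONE    0x2u /* not attached to any socket */
#define DEMO_SOCKET_F_INVALID 0x4u /* value is not valid */
#define DEMO_MAX_SOCKETS      8u   /* hypothetical upper bound used by the debug check */

/* Hypothetical wrapper: 16-bit ID plus 16 bits of flags, 32 bits in total. */
struct demo_socket {
	uint16_t id;
	uint16_t flags;
};

static inline struct demo_socket demo_socket_make(uint16_t id, uint16_t flags)
{
	struct demo_socket s = { .id = id, .flags = flags };

	return s;
}

/* Accessor with integrity checks that only run in debug builds. */
static inline uint16_t demo_socket_id(struct demo_socket s)
{
	assert(!(s.flags & (DEMO_SOCKET_F_NONE | DEMO_SOCKET_F_INVALID)));
	assert(s.id < DEMO_MAX_SOCKETS);
	return s.id;
}

static inline int demo_socket_is_any(struct demo_socket s)
{
	return (s.flags & DEMO_SOCKET_F_ANY) != 0;
}
```

A function declared to take struct demo_socket rejects a bare int or unsigned int argument at compile time, which is the first benefit listed above; a typedef'd integer would accept it silently.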

as an aside i would also caution while i suggested a typedef i don't mean
that everything should be typedef'd especially actual structs that are
used like structs. typedefs for things like socket id would
unquestionably convey more information and implied semantics to the user
of an api than just a standard `int' or whatever. consequently i have found
that this lowers mistakes with the use of the api.

> 
> As for socket id, and numa id, I'm not sure we should have different
> names/types for the two. For example, for PCI devices, do they need a third
> type or are they associated with cores or with memory? The socket id for
> the core only matters in terms of data locality, i.e. what memory or cache
> location it is in. Therefore, for me, I'd pick one name and stick with it.

i think the choice for more than one type vs one type is whether or not
they are "the same" number space as opposed to just coincidentally
overlapping number spaces.

> 
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:26  0%       ` Stephen Hemminger
@ 2023-02-03 20:49  0%         ` Thomas Monjalon
  2023-02-05 23:41  0%           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-02-03 20:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

03/02/2023 21:26, Stephen Hemminger:
> On Fri, 03 Feb 2023 21:18:40 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 03/02/2023 18:33, Stephen Hemminger:
> > > On Fri, 03 Feb 2023 09:42:45 +0100
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > >   
> > > > 03/02/2023 01:25, Stephen Hemminger:  
> > > > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > > > >     
> > > > > > --- a/lib/eal/include/rte_log.h
> > > > > > +++ b/lib/eal/include/rte_log.h
> > > > > > @@ -48,6 +48,7 @@ extern "C" {
> > > > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */    
> > > > > 
> > > > > NAK to this part.
> > > > > No new static logtypes please.    
> > > > 
> > > > Good catch.
> > > > By the way, we should remove unused RTE_LOGTYPE_*.  
> > > 
> > > Yes, for 23.11 would like to work down the list.  
> > 
> > Do we need to wait 23.11?
> > It is not an ABI breakage.
> > And most of these defines are already unused.
> 
> Turning them into deprecated would be API breakage though

API breakage is not forbidden.



^ permalink raw reply	[relevance 0%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-03 12:19  0%             ` Bruce Richardson
@ 2023-02-03 20:49  0%               ` Tyler Retzlaff
  2023-02-07 15:16  0%                 ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-03 20:49 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Morten Brørup, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Wednesday, 1 February 2023 22.41
> > > > 
> > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> > > > >
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > >
> > > > > > Honnappa, please could you give your view on the future of atomics
> > > > in DPDK?
> > > > > Thanks Thomas, apologies it has taken me a while to get to this
> > > > discussion.
> > > > >
> > > > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > > > (stdatomics as is called here) already serve the purpose. These APIs
> > > > are well understood and documented.
> > > > 
> > > > i agree that whatever atomics APIs we advocate for should align with
> > > > the
> > > > standard C atomics for the reasons you state including implied
> > > > semantics.
> > > > 
> > > > >
> > > > > For environments where stdatomics are not supported, we could have a
> > > > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > > > _explicit APIs). This allows the code to use stdatomics APIs and when
> > > > we move to minimum supported standard C11, we just need to get rid of
> > > > the file in DPDK repo.
> > > 
> > > Perhaps we can use something already existing, such as this:
> > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > 
> > > > 
> > > > my concern with this is that if we provide a stdatomic.h or introduce
> > > > names
> > > > from stdatomic.h it's a violation of the C standard.
> > > > 
> > > > references:
> > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > >  * GNU libc manual
> > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > 
> > > > in effect the header, the names and in some instances namespaces
> > > > introduced
> > > > are reserved by the implementation. there are several reasons in the
> > > > GNU libc
> > > > manual that explain the justification for these reservations and if
> > > > we think about ODR and ABI compatibility we can conceive of others.
> > > 
> > > If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> > > 
> > > If we are not moving to C11 soon, I would consider these disadvantages more seriously.
> > 
> > I think it's reasonable to assume that we are talking years here.
> > 
> > We've had a few discussions about minimum C standard. I think my first
> > mailing list exchanges about C99 was almost 2 years ago. Given that we
> > still aren't on C99 now (though i know Bruce has a series up) indicates
> > that progression to C11 isn't going to happen any time soon and even if
> > it was the baseline we still can't just use it (reasons described
> > later).
> > 
> > Also, i'll point out that we seem to have accepted moving to C99 with
> > one of the holdback compilers technically being non-conformant but it
> > isn't blocking us because it provides the subset of C99 features without
> > being conforming that we happen to be using.
> > 
> What compiler is this? As far as I know, all our currently supported
> compilers claim to support C99 fully. All should support C11 also,
> except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos 7, I
> think we can require at minimum a c11 compiler for building DPDK itself.
> I'm still a little uncertain about requiring that users build their own
> code with -std=c11, though.

perhaps i'm mistaken but it was my understanding that the gcc version on
RHEL 7 did not fully conform to C99? maybe i read C99 when it was actually
C11.

regardless, even if every supported compiler for dpdk was C11 conformant
including stdatomics which are optional we can't just move from
intrinsic/builtins to standard C atomics (because of the compatibility
and performance issues mentioned previously).

so just re-orienting this discussion, the purpose of this abstraction is
to allow the optional use of standard C atomics when a conformant compiler
is available and satisfactory code is generated for the desired target.
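
A rough sketch of what such an abstraction could look like, for illustration only. The demo_* names are hypothetical and are not the names used in the patch, and selecting the backend from predefined compiler macros is an assumption; a real implementation would more likely choose it through a build-time option, since a compiler can claim C11 without shipping a usable stdatomic.h.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
	!defined(__STDC_NO_ATOMICS__)
/* Conformant C11 compiler with atomics: map straight onto the standard names. */
#include <stdatomic.h>
#define DEMO_ATOMIC(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#else
/* Fallback: GCC/Clang __atomic builtins on a plain (non-_Atomic) object. */
#define DEMO_ATOMIC(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#endif

/* The calling code is identical whichever backend was selected. */
static inline uint32_t demo_count_twice(void)
{
	DEMO_ATOMIC(uint32_t) counter = 0;

	demo_atomic_fetch_add_explicit(&counter, 1, demo_memory_order_relaxed);
	demo_atomic_fetch_add_explicit(&counter, 1, demo_memory_order_relaxed);
	assert(demo_atomic_load_explicit(&counter, demo_memory_order_relaxed) == 2);
	return demo_atomic_load_explicit(&counter, demo_memory_order_relaxed);
}
```

Because the wrapper names mirror the _explicit forms of the standard functions without reusing the reserved identifiers, the call sites keep the standard semantics while the project's own namespace is the only one touched.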

> 
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:18  2%     ` Thomas Monjalon
@ 2023-02-03 20:26  0%       ` Stephen Hemminger
  2023-02-03 20:49  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-03 20:26 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

On Fri, 03 Feb 2023 21:18:40 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> 03/02/2023 18:33, Stephen Hemminger:
> > On Fri, 03 Feb 2023 09:42:45 +0100
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > 03/02/2023 01:25, Stephen Hemminger:  
> > > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > > >     
> > > > > --- a/lib/eal/include/rte_log.h
> > > > > +++ b/lib/eal/include/rte_log.h
> > > > > @@ -48,6 +48,7 @@ extern "C" {
> > > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */    
> > > > 
> > > > NAK to this part.
> > > > No new static logtypes please.    
> > > 
> > > Good catch.
> > > By the way, we should remove unused RTE_LOGTYPE_*.  
> > 
> > Yes, for 23.11 would like to work down the list.  
> 
> Do we need to wait 23.11?
> It is not an ABI breakage.
> And most of these defines are already unused.

Turning them into deprecated would be API breakage though


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  @ 2023-02-03 20:18  2%     ` Thomas Monjalon
  2023-02-03 20:26  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-02-03 20:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

03/02/2023 18:33, Stephen Hemminger:
> On Fri, 03 Feb 2023 09:42:45 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 03/02/2023 01:25, Stephen Hemminger:
> > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > >   
> > > > --- a/lib/eal/include/rte_log.h
> > > > +++ b/lib/eal/include/rte_log.h
> > > > @@ -48,6 +48,7 @@ extern "C" {
> > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */  
> > > 
> > > NAK to this part.
> > > No new static logtypes please.  
> > 
> > Good catch.
> > By the way, we should remove unused RTE_LOGTYPE_*.
> 
> Yes, for 23.11 would like to work down the list.

Do we need to wait 23.11?
It is not an ABI breakage.
And most of these defines are already unused.



^ permalink raw reply	[relevance 2%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-02 20:44  0%             ` Morten Brørup
@ 2023-02-03 13:56  0%               ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-02-03 13:56 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Tyler Retzlaff, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Thu, Feb 02, 2023 at 09:44:52PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Thursday, 2 February 2023 20.00
> > 
> > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Wednesday, 1 February 2023 22.41
> > > >
> > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> > wrote:
> > > > >
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > >
> > > > > > Honnappa, please could you give your view on the future of
> > atomics
> > > > in DPDK?
> > > > > Thanks Thomas, apologies it has taken me a while to get to this
> > > > discussion.
> > > > >
> > > > > IMO, we do not need DPDK's own abstractions. APIs from
> > stdatomic.h
> > > > (stdatomics as is called here) already serve the purpose. These
> > APIs
> > > > are well understood and documented.
> > > >
> > > > i agree that whatever atomics APIs we advocate for should align
> > with
> > > > the
> > > > standard C atomics for the reasons you state including implied
> > > > semantics.
> > > >
> > > > >
> > > > > For environments where stdatomics are not supported, we could
> > have a
> > > > stdatomic.h in DPDK implementing the same APIs (we have to support
> > only
> > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > when
> > > > we move to minimum supported standard C11, we just need to get rid
> > of
> > > > the file in DPDK repo.
> > >
> > > Perhaps we can use something already existing, such as this:
> > > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > >
> > > >
> > > > my concern with this is that if we provide a stdatomic.h or
> > introduce
> > > > names
> > > > from stdatomic.h it's a violation of the C standard.
> > > >
> > > > references:
> > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > >  * GNU libc manual
> > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > >
> > > > in effect the header, the names and in some instances namespaces
> > > > introduced
> > > > are reserved by the implementation. there are several reasons in
> > the
> > > > GNU libc
> > > > manual that explain the justification for these reservations and if
> > > > > we think about ODR and ABI compatibility we can conceive of
> > others.
> > >
> > > > If we are going to move to C11 soon, I consider the shim interim, and
> > am inclined to ignore these warning factors.
> > >
> > > If we are not moving to C11 soon, I would consider these
> > disadvantages more seriously.
> > 
> > I think it's reasonable to assume that we are talking years here.
> > 
> > We've had a few discussions about minimum C standard. I think my first
> > mailing list exchanges about C99 was almost 2 years ago. Given that we
> > still aren't on C99 now (though i know Bruce has a series up) indicates
> > that progression to C11 isn't going to happen any time soon and even if
> > it was the baseline we still can't just use it (reasons described
> > later).
> > 
> > Also, i'll point out that we seem to have accepted moving to C99 with
> > one of the holdback compilers technically being non-conformant but it
> > isn't blocking us because it provides the subset of C99 features
> > without
> > being conforming that we happen to be using.
> > 
> > >
> > > >
> > > > i'll also remark that the inter-mingling of names from the POSIX
> > > > standard implicitly exposed as a part of the EAL public API has
> > been
> > > > problematic for portability.
> > >
> > > This is a very important remark, which should be considered
> > carefully! Tyler has firsthand experience with DPDK portability. If he
> > thinks porting to Windows is going to be a headache if we expose the
> > stdatomic.h API, we must listen! So, what is your gut feeling here,
> > Tyler?
> > 
> > I think this is even more of a concern with language standard than it
> > is
> > with a platform standard. Because the language standard is used across
> > platforms.
> > 
> > On the surface it looks appealing to just go through all the dpdk code
> > one last time and #include <stdatomic.h> and directly depend on names
> > that "look" standard. In practice though we aren't depending on the
> > toolchain / libc surface we are binding ourselves to the shim and the
> > implementation it provides.
> > 
> > This is aside from the mechanics of making it work in the different
> > contexts we now have to care about. Here is a story of how things
> > become tricky.
> > 
> > When i #include <stdatomic.h> which one gets used if the implementation
> > provides one? Let's force our stdatomic.h
> > 
> > Now i need to force the build system to prefer my shim header? Keeping
> > in mind that the presence of a libc stdatomic.h does not mean that the
> > toolchain in fact supports standard atomics. Okay, that's under our
> > control by editing some meson.build files maybe it isn't so bad but...
> > 
> > It seems my application also has to do the same in their build system
> > now because...
> > 
> > The type definitions (size, alignment) and code generated from the
> > body of inline functions as seen by the application built translation
> > units may differ from those in the dpdk translation units if they don't
> > use our header. The potential for ABI compat problems is increasing but
> > > maybe it is manageable? it can be worse...
> > 
> > We can't limit our scope to thinking that there is just an
> > application (a single binary) and dpdk. Complex applications will
> > invariably depend on other libraries and if the application needs to
> > > interface with those compatibly at the ABI level using standard
> > atomics
> > then we've made it very difficult since the application has to choose
> > to
> > use our conflicting named atomic types which may not be compatible or
> > the real standard atomics.  They can of course produce horrible shims
> > of their own to interoperate.
> > 
> > We need consistency across the entire binary at runtime and i don't
> > think it's practical to say that anyone who uses dpdk has to compile
> > their whole world with our shim. So dealing with all this complexity
> > > for the sake of aesthetics "looking" like the standard api seems kind
> > of not worth it. Sure it saves having to deprecate later and one last
> > session of shotgun surgery but that's kind of all we get.
> > 
> > Don't think i'm being biased in favor of windows/msvc here. From the
> > perspective of the windows/msvc combination i intend to use only the
> > standard C ABI provided by the implementation. I have no intention of
> > trying to introduce support for the current ABI that doesn't use the
> > standard atomic types. my discouraging of this approach is about
> > avoiding
> > subtle to detect but very painful problems on
> > {linux,unix}/compiler<version>
> > combinations that already have a shipped/stable ABI.
> > 
> > > >
> > > > let's discuss this from here. if there's still overwhelming desire
> > to
> > > > go
> > > > this route then we'll just do our best.
> > > >
> > > > ty
> > >
> > > I have a preference for exposing the stdatomic.h API. Tyler listed
> > the disadvantages above. (I also have a preference for moving to C11
> > soon.)
> > 
> > I am eager to see this happen, but as explained in my original proposal
> > it doesn't eliminate the need for an abstraction. Unless we are willing
> > to break our compatibility promises and potentially take a performance
> > hit on some platform/compiler combinations which as i understand is not
> > acceptable.
> > 
> > >
> > > Exposing a 1:1 similar API with RTE prefixes would also be acceptable
> > for me. The disadvantage is that the names are different than the C11
> > names, which might lead to some confusion. And from an ABI stability
> > perspective, such an important API should not be marked experimental.
> > This means that years will pass before we can get rid of it again, due
> > to ABI stability policies.
> > 
> > I think the key to success with rte_ prefixed names is making
> > absolutely
> > sure we mirror the semantics and types in the standard.
> > 
> > I will point out one bit of fine print here is that we will not support
> > atomic operations on struct/union types (something the standard
> > supports).
> > With the rte_ namespace i think this becomes less ambiguous, if we
> > present
> > standard C names though what's to avoid the confusion? Aside from it
> > fails
> > to compile with one compiler vs another.
> > 
> > I agree that this may be around for years. But how many years depends a
> > lot on how long we have to maintain compatibility for the existing
> > platform/compiler combinations that can't (and aren't enabled) to use
> > the standard.
> > 
> > Even if we introduced standard names we still have to undergo some kind
> > of mutant deprecation process to get the world to recompile everything
> > against the actual standard, so it doesn't give us forward
> > compatibility.
> > 
> > Let me know what folks would like to do, i guess i'm firmly leaned
> > toward no-shim and just rte_ explicit. But as a community i'll pursue
> > whatever you decide.
> > 
> > Thanks!
> 
> Tyler is making a very strong case here.
> 
> I have changed my mind, and now support Tyler's approach.
> 

Having read through the whole thread a second time, I am starting to
realise how complex a problem this could be. From what I read, I would tend
towards the opinion that we shouldn't provide any atomics in DPDK at all,
and just rely on the C standard ones. The main complication in any solution
I suspect is going to be the use of atomics in static inline functions we
have in our header files.

/Bruce
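
For illustration only, the rte_-style abstraction discussed in the quoted
text could look roughly like the sketch below, including a static inline
user of the kind our public headers contain. Every example_ name and the
EXAMPLE_USE_STDATOMIC switch are made up for this sketch and are not an
existing DPDK API; the fallback branch assumes the GCC/Clang __atomic
builtins are available.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: the example_ names and EXAMPLE_USE_STDATOMIC are
 * illustrative assumptions, not part of any DPDK release.
 */
#ifdef EXAMPLE_USE_STDATOMIC
#include <stdatomic.h>
#define example_atomic(type) _Atomic(type)
#define example_memory_order_relaxed memory_order_relaxed
#define example_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#define example_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#else
/* GCC/Clang builtins; the object keeps its plain (non-_Atomic) type. */
#define example_atomic(type) type
#define example_memory_order_relaxed __ATOMIC_RELAXED
#define example_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#define example_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#endif

/* A static inline user, as it might appear in a public header. */
static inline uint32_t
example_counter_inc(example_atomic(uint32_t) *counter)
{
	/* Returns the value held before the increment. */
	return example_atomic_fetch_add_explicit(counter, 1,
			example_memory_order_relaxed);
}

static inline uint32_t
example_counter_read(example_atomic(uint32_t) *counter)
{
	return example_atomic_load_explicit(counter,
			example_memory_order_relaxed);
}
```

The point of the sketch is that the inline function body is written once,
while the type and the operations it expands to depend on how each
translation unit is compiled, which is exactly where the application and
the library can end up disagreeing.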

^ permalink raw reply	[relevance 0%]

* [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  @ 2023-02-03 13:33  6%     ` Jiawei Wang
  2023-02-06 15:29  0%       ` Jiawei(Jonny) Wang
                         ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Jiawei Wang @ 2023-02-03 13:33 UTC (permalink / raw)
  To: viacheslavo, orika, thomas, andrew.rybchenko, Aman Singh,
	Yuying Zhang, Ferruh Yigit
  Cc: dev, rasland

When multiple physical ports are connected to a single DPDK port,
(example: kernel bonding, DPDK bonding, failsafe, etc.),
we want to know which physical port is used for Rx and Tx.

This patch maps a DPDK Tx queue with a physical port,
by adding tx_phy_affinity setting in Tx queue.
The affinity number is the physical port ID where packets will be
sent.
Value 0 means no affinity, and traffic could be routed to any
connected physical port; this is the current default behavior.

The number of physical ports is reported with rte_eth_dev_info_get().

The new tx_phy_affinity field is added into the padding hole of the
rte_eth_txconf structure, so the size of rte_eth_txconf stays the same.
An ABI check rule needs to be added to avoid a false warning.

Add the testpmd command line:
testpmd> port config (port_id) txq (queue_id) phy_affinity (value)

For example, suppose two physical ports are connected to
a single DPDK port (port id 0), where phy_affinity 1 stands for
the first physical port and phy_affinity 2 stands for the second
physical port.
Use the commands below to configure the Tx PHY affinity per Tx queue:
        port config 0 txq 0 phy_affinity 1
        port config 0 txq 1 phy_affinity 1
        port config 0 txq 2 phy_affinity 2
        port config 0 txq 3 phy_affinity 2

These commands configure Tx queue 0 and Tx queue 1 with
PHY affinity 1, so packets sent on Tx queue 0 or Tx queue 1
leave through the first physical port. Likewise, packets sent
on Tx queue 2 or Tx queue 3 leave through the second physical
port.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
 app/test-pmd/config.c                       |   1 +
 devtools/libabigail.abignore                |   5 +
 doc/guides/rel_notes/release_23_03.rst      |   4 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
 lib/ethdev/rte_ethdev.h                     |  10 ++
 6 files changed, 133 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cb8c174020..f771fcf8ac 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
 			"    Cleanup txq mbufs for a specific Tx queue\n\n"
+
+			"port config (port_id) txq (queue_id) phy_affinity (value)\n"
+			"    Set the physical affinity value "
+			"on a specific Tx queue\n\n"
 		);
 	}
 
@@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
 	}
 };
 
+/* *** configure port txq phy_affinity value *** */
+struct cmd_config_tx_phy_affinity {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	portid_t portid;
+	cmdline_fixed_string_t txq;
+	uint16_t qid;
+	cmdline_fixed_string_t phy_affinity;
+	uint8_t value;
+};
+
+static void
+cmd_config_tx_phy_affinity_parsed(void *parsed_result,
+				  __rte_unused struct cmdline *cl,
+				  __rte_unused void *data)
+{
+	struct cmd_config_tx_phy_affinity *res = parsed_result;
+	struct rte_eth_dev_info dev_info;
+	struct rte_port *port;
+	int ret;
+
+	if (port_id_is_invalid(res->portid, ENABLED_WARN))
+		return;
+
+	if (res->portid == (portid_t)RTE_PORT_ALL) {
+		printf("Invalid port id\n");
+		return;
+	}
+
+	port = &ports[res->portid];
+
+	if (strcmp(res->txq, "txq")) {
+		printf("Unknown parameter\n");
+		return;
+	}
+	if (tx_queue_id_is_invalid(res->qid))
+		return;
+
+	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
+	if (ret != 0)
+		return;
+
+	if (dev_info.nb_phy_ports == 0) {
+		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
+		return;
+	}
+	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
+	if (dev_info.nb_phy_ports < res->value) {
+		printf("The PHY affinity value %u is Invalid, exceeds the "
+		       "number of physical ports\n", res->value);
+		return;
+	}
+	port->txq[res->qid].conf.tx_phy_affinity = res->value;
+
+	cmd_reconfig_device_queue(res->portid, 0, 1);
+}
+
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 port, "port");
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 config, "config");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 portid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 txq, "txq");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      qid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 phy_affinity, "phy_affinity");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      value, RTE_UINT8);
+
+static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
+	.f = cmd_config_tx_phy_affinity_parsed,
+	.data = (void *)0,
+	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
+	.tokens = {
+		(void *)&cmd_config_tx_phy_affinity_port,
+		(void *)&cmd_config_tx_phy_affinity_config,
+		(void *)&cmd_config_tx_phy_affinity_portid,
+		(void *)&cmd_config_tx_phy_affinity_txq,
+		(void *)&cmd_config_tx_phy_affinity_qid,
+		(void *)&cmd_config_tx_phy_affinity_hwport,
+		(void *)&cmd_config_tx_phy_affinity_value,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
 	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
 	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
+	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
 	NULL,
 };
 
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index acccb6b035..b83fb17cfa 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
 		printf("unknown\n");
 		break;
 	}
+	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
 }
 
 void
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..ac7d3fb2da 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -34,3 +34,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Ignore fields inserted in padding hole of rte_eth_txconf
+[suppress_type]
+        name = rte_eth_txconf
+        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 73f5d94e14..e99bd2dcb6 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added affinity for multiple physical ports connected to a single DPDK port.**
+
+  * Added Tx affinity in queue setup to map a physical port.
+
 * **Updated AMD axgbe driver.**
 
   * Added multi-process support.
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 79a1fa9cb7..5c716f7679 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
 
 This command should be run when the port is stopped, or else it will fail.
 
+config per queue Tx physical affinity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Configure a per queue physical affinity value only on a specific Tx queue::
+
+   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
+
+* ``phy_affinity``: physical port to use for sending,
+                    when multiple physical ports are connected to
+                    a single DPDK port.
+
+This command should be run when the port is stopped, otherwise it fails.
+
 Config VXLAN Encap outer layers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..2fd971b7b5 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
 				      less free descriptors than this value. */
 
 	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	/**
+	 * Affinity with one of the multiple physical ports connected to the DPDK port.
+	 * Value 0 means no affinity and traffic could be routed to any connected
+	 * physical port.
+	 * The first physical port is number 1 and so on.
+	 * Number of physical ports is reported by nb_phy_ports in rte_eth_dev_info.
+	 */
+	uint8_t tx_phy_affinity;
 	/**
 	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
 	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
@@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
 	/** Device redirection table size, the total number of entries. */
 	uint16_t reta_size;
 	uint8_t hash_key_size; /**< Hash key size in bytes */
+	/** Number of physical ports connected with DPDK port. */
+	uint8_t nb_phy_ports;
 	/** Bit mask of RSS offloads, the bit offset also means flow type */
 	uint64_t flow_type_rss_offloads;
 	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration */
-- 
2.18.1


^ permalink raw reply	[relevance 6%]
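
The padding-hole claim in the commit message above can be checked with a
small layout sketch. The two structures below are simplified, hypothetical
stand-ins (they are not the real rte_eth_txconf); they only mimic the
pattern of a uint8_t followed by an aligned uint64_t, and the helper name
is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins; not the real rte_eth_txconf. */
struct txconf_before {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	/* compiler padding sits here, before the aligned uint64_t */
	uint64_t offloads;
};

struct txconf_after {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	uint8_t tx_phy_affinity; /* new field placed in the former hole */
	uint64_t offloads;
};

/* Returns 1 when the size and the existing member offsets are unchanged. */
static int
txconf_layout_unchanged(void)
{
	return sizeof(struct txconf_before) == sizeof(struct txconf_after) &&
	       offsetof(struct txconf_before, tx_deferred_start) ==
	       offsetof(struct txconf_after, tx_deferred_start) &&
	       offsetof(struct txconf_before, offloads) ==
	       offsetof(struct txconf_after, offloads);
}
```

Because existing members keep their offsets and the total size does not
change, old binaries still read the fields they know about; the ABI
checker still reports the inserted member, which is why a suppression
rule is needed.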

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-02 19:00  4%           ` Tyler Retzlaff
  2023-02-02 20:44  0%             ` Morten Brørup
@ 2023-02-03 12:19  0%             ` Bruce Richardson
  2023-02-03 20:49  0%               ` Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-03 12:19 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Morten Brørup, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Wednesday, 1 February 2023 22.41
> > > 
> > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> > > >
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > >
> > > > > Honnappa, please could you give your view on the future of atomics
> > > in DPDK?
> > > > Thanks Thomas, apologies it has taken me a while to get to this
> > > discussion.
> > > >
> > > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > > (stdatomics as is called here) already serve the purpose. These APIs
> > > are well understood and documented.
> > > 
> > > i agree that whatever atomics APIs we advocate for should align with
> > > the
> > > standard C atomics for the reasons you state including implied
> > > semantics.
> > > 
> > > >
> > > > For environments where stdatomics are not supported, we could have a
> > > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > > _explicit APIs). This allows the code to use stdatomics APIs and when
> > > we move to minimum supported standard C11, we just need to get rid of
> > > the file in DPDK repo.
> > 
> > Perhaps we can use something already existing, such as this:
> > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > 
> > > 
> > > my concern with this is that if we provide a stdatomic.h or introduce
> > > names
> > > from stdatomic.h it's a violation of the C standard.
> > > 
> > > references:
> > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > >  * GNU libc manual
> > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > 
> > > in effect the header, the names and in some instances namespaces
> > > introduced
> > > are reserved by the implementation. there are several reasons in the
> > > GNU libc
> > > manual that explain the justification for these reservations and if
> > > we think about ODR and ABI compatibility we can conceive of others.
> > 
> > If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> > 
> > If we are not moving to C11 soon, I would consider these disadvantages more seriously.
> 
> I think it's reasonable to assume that we are talking years here.
> 
> We've had a few discussions about minimum C standard. I think my first
> mailing list exchanges about C99 was almost 2 years ago. Given that we
> still aren't on C99 now (though i know Bruce has a series up) indicates
> that progression to C11 isn't going to happen any time soon and even if
> it was the baseline we still can't just use it (reasons described
> later).
> 
> Also, i'll point out that we seem to have accepted moving to C99 with
> one of the holdback compilers technically being non-conformant but it
> isn't blocking us because it provides the subset of C99 features without
> being conforming that we happen to be using.
> 
What compiler is this? As far as I know, all our currently supported
compilers claim to support C99 fully. All should support C11 also,
except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos 7, I
think we can require at minimum a c11 compiler for building DPDK itself.
I'm still a little uncertain about requiring that users build their own
code with -std=c11, though.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: Sign changes through function signatures
  2023-02-02 21:26  3%     ` Morten Brørup
@ 2023-02-03 12:05  4%       ` Bruce Richardson
  2023-02-03 22:12  0%         ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-03 12:05 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, Ben Magistro, Tyler Retzlaff, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 2 February 2023 21.45
> > 
> > 02/02/2023 21:26, Tyler Retzlaff:
> > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > Hello,
> > > >
> > > > While making some updates to our code base for 22.11.1 that were
> > missed in
> > > > our first pass through, we hit the numa node change[1].  In the
> > process of
> > > > updating our code, we noticed that a couple functions
> > (rx/tx_queue_setup,
> > > > maybe more that we aren't using) state they accept `SOCKET_ID_ANY`
> > but the
> > > > function signature then asks for an unsigned integer while
> > `SOCKET_ID_ANY`
> > > > is `-1`.  Following it through the redirect to the "real" function
> > it also
> > > > asks for an unsigned integer which is then passed on to one or more
> > > > functions asking for an integer.  As an example using the i40e
> > driver
> > > > -- we would call `rte_eth_tx_queue_setup` [2] which ultimately
> > calls
> > > > `i40e_dev_tx_queue_setup`[3] which finally calls
> > `rte_zmalloc_socket`[4]
> > > > and `rte_eth_dma_zone_reserve`[5].
> > > >
> > > > I guess what I am looking for is clarification on if this is
> > intentional or
> > > > if this is additional cleanup that may need to be completed/be
> > desirable so
> > > > that signs are maintained through the call paths and avoid
> > potentially
> > > > producing sign-conversion warnings.  From the very quick glance I
> > took at
> > > > the i40e driver, it seems these are just passed through to other
> > functions
> > > > and no direct use/manipulation occurs (at least in the mentioned
> > functions).
> > >
> > > i believe this is just sloppiness with sign in our api surface. i too
> > > find it frustrating that use of these api force either explicit
> > > casts or suffer having to suppress warnings.
> > >
> > > in the past examples of this have been cleaned up without full
> > deprecation
> > > notices but there are a lot of instances. i also feel (unpopular
> > opinion)
> > > that for some integer types like this that have constrained range /
> > number
> > > spaces it would be of value to introduce a typedef that can be used
> > > consistently.
> > >
> > > for now you'll just have to add the casts and hopefully in the future
> > we
> > > will fix the api making them unnecessary. of course feel free to
> > submit
> > > patches too, it would be great to have these cleaned up.
> > 
> > I agree it should be cleaned up.
> > Those IDs should accept negative values.
> > Not sure which type we should choose (int, int32_t, or a typedef).
> 
> Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
> 
> Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
> 
> > 
> > Another thing to check is the name of the variable.
> > It should be a socket ID when talking about CPU,
> > and a NUMA node ID when talking about memory.
> > 
> > And last but not the least,
> > how can we keep ABI compatibility?
> > I hope we can use function versioning to avoid deprecation and
> > breaking.
> > 
> > Trials and suggestions are welcome.
> 
> Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
> 
> And functions should always return -1, never SOCKET_ID_ANY, to indicate error.
> 
> [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/
> 
> I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(
>

Personally, I think if we are going to change things, we should do things
properly, especially/even if we are going to have to break ABI or use ABI
compatibility.

I would suggest rather than a typedef, we should actually wrap the int
value in a struct - for two reasons:

* it means the compiler will actually error out for us if an int or
  unsigned int is used instead. This allows easier fixing at compile-time
  rather than hoping things are correctly specified in existing code.

* it allows us to do things like explicitly calling out flags, rather than
  just using magic values. While still keeping the size 32 bits, we can
  have the actual socket value as 16-bits and have flags to indicate:
  - ANY socket, NO socket, INVALID value socket. This could end up being
  useful in many cases, for example, when allocating memory we could
  specify a socket number with the ANY flag, indicating that any socket is
  ok, but we'd ideally prefer the number specified.
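
A minimal sketch of what such a wrapper might look like is below. All of
the ex_ names and flag values are hypothetical and only illustrate the
idea; nothing here exists in DPDK today.

```c
#include <stdint.h>

/* All names below are hypothetical; none exist in DPDK. */
#define EX_SOCKET_F_ANY     (1u << 0) /* any socket is acceptable */
#define EX_SOCKET_F_NONE    (1u << 1) /* not attached to any socket */
#define EX_SOCKET_F_INVALID (1u << 2) /* value is not valid */

/* Still 32 bits in total: 16-bit socket number plus 16 bits of flags. */
struct ex_socket_id {
	uint16_t id;    /* socket number, or preferred socket with F_ANY */
	uint16_t flags; /* EX_SOCKET_F_* */
};

/* Exactly this socket, no fallback. */
static inline struct ex_socket_id
ex_socket_exact(uint16_t id)
{
	struct ex_socket_id s = { .id = id, .flags = 0 };
	return s;
}

/* Any socket is fine, but prefer the given one. */
static inline struct ex_socket_id
ex_socket_prefer(uint16_t id)
{
	struct ex_socket_id s = { .id = id, .flags = EX_SOCKET_F_ANY };
	return s;
}

/* Hypothetical consumer: the socket to try first, or -1 if there is none. */
static inline int
ex_socket_first_choice(struct ex_socket_id s)
{
	if (s.flags & (EX_SOCKET_F_NONE | EX_SOCKET_F_INVALID))
		return -1;
	return s.id;
}
```

With this shape, a call such as ex_socket_first_choice(-1) or
ex_socket_first_choice(1u) fails to compile, since a bare int or unsigned
int does not convert to the struct type.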

As for socket id, and numa id, I'm not sure we should have different
names/types for the two. For example, for PCI devices, do they need a third
type or are they associated with cores or with memory? The socket id for
the core only matters in terms of data locality, i.e. what memory or cache
location it is in. Therefore, for me, I'd pick one name and stick with it.

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-02 16:12  0%                     ` Naga Harish K, S V
@ 2023-02-03  9:44  0%                       ` Jerin Jacob
  2023-02-06  6:21  0%                         ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-03  9:44 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, January 30, 2023 8:13 PM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > <jay.jayatheerthan@intel.com>
> > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > > >
> > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > >
> > > > > > > > >
> > > > > > > > > > > +        */
> > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > >
> > > > > > > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > to
> > > > > > make
> > > > > > > > > > sure rsvd is zero.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The reserved fields are not used by the adapter or application.
> > > > > > > > > Not sure Is it necessary to Introduce a new API to clear
> > > > > > > > > reserved
> > > > fields.
> > > > > > > >
> > > > > > > > When adapter starts using new fields (when we add new fields
> > > > > > > > in future), the old application which is not using
> > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have
> > junk
> > > > > > > > value and then adapter implementation will behave bad.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > does it mean, the application doesn't re-compile for the new DPDK?
> > > > > >
> > > > > > Yes. No need recompile if ABI not breaking.
> > > > > >
> > > > > > > When some of the reserved fields are used in the future, the
> > > > > > > application
> > > > > > also may need to be recompiled along with DPDK right?
> > > > > > > As the application also may need to use the newly consumed
> > > > > > > reserved
> > > > > > fields?
> > > > > >
> > > > > > The problematic case is:
> > > > > >
> > > > > > Adapter implementation of 23.07(Assuming there is change params)
> > > > > > field needs to work with application of 23.03.
> > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > >
> > > > >
> > > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes only
> > > > reserved fields to zero,  it may not solve the issue in this case.
> > > >
> > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > > fields, not just reserved field.
> > > > The application calling sequence  is
> > > >
> > > > struct my_config c;
> > > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > > c.interested_field_to_be_updated = val;
> > > >
> > > Can it be done like
> > >         struct my_config c = {0};
> > >         c.interested_field_to_be_updated = val; and update Doxygen
> > > comments to recommend above usage to reset all fields?
> > > This way,  rte_event_eth_rx_adapter_runtime_params_init() can be
> > avoided.
> >
> > Better to have a function for documentation clarity. Similar scheme already
> > there in DPDK. See rte_eth_cman_config_init()
> >
> >
>
>
> The reference function rte_eth_cman_config_init() is resetting the params struct and initializing the required params with default values in the pmd cb.

No need for PMD cb.

> The proposed rte_event_eth_rx_adapter_runtime_params_init () API just needs to reset the params struct. There are no pmd CBs involved.
> Having an API just to reset the struct seems overkill. What do you think?

It is a slow-path API, so keeping it as a function is better. It also
helps the documentation of the config parameter in
rte_event_eth_rx_adapter_runtime_params_config(), for example:
"This structure must be initialized with
rte_event_eth_rx_adapter_runtime_params_init()".
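
A minimal sketch of the init-then-set calling sequence discussed in this
thread. The structure demo_runtime_params, its field names and the helpers
demo_runtime_params_init() and demo_app_fill() are hypothetical stand-ins,
not the eventdev definitions. The only assumption carried over from the
discussion is that the init helper zeroes the whole structure, reserved
bytes included.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block: 8 bytes in use, 56 bytes reserved. */
struct demo_runtime_params {
	uint32_t max_nb_rx;	/* illustrative field, not the real API */
	uint32_t flags;		/* illustrative field, not the real API */
	uint8_t rsvd[56];	/* reserved for future use, must stay zero */
};

/* Zero the whole structure, not only the reserved bytes. */
static int
demo_runtime_params_init(struct demo_runtime_params *params)
{
	if (params == NULL)
		return -EINVAL;

	memset(params, 0, sizeof(*params));
	return 0;
}

/* Application side: init first, then set only the field of interest. */
static int
demo_app_fill(struct demo_runtime_params *params, uint32_t val)
{
	int rc = demo_runtime_params_init(params);

	if (rc < 0)
		return rc;

	params->max_nb_rx = val;
	/* Fields the application does not know about remain zero. */
	assert(params->flags == 0);
	return 0;
}
```

Keeping the reset inside a named function gives the Doxygen comment of the
config call something concrete to point at, which is the documentation
argument made above.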



>
> > >
> > > > Let me share an example and you can tell where is the issue
> > > >
> > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > > 3)There is an application written based on 22.03 which uses only 8B
> > > > after calling rte_event_eth_rx_adapter_runtime_params_init()
> > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > application is calling
> > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > and the newly added 8B are zero, the implementation will not go bad.
> > > >
> > > > > The old application only tries to set/get previous valid fields
> > > > > and the newly used fields may still contain junk value.
> > > > > If the application wants to make use of any of the newly used
> > > > > params, the application changes are required anyway.
> > > >
> > > > Yes. If application wants to make use of newly added features. No
> > > > need to change if new features are not needed for old application.
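
The five-step scenario quoted above can be sketched as follows. Both
layouts, the 64-byte size split and every demo_* name are assumptions made
for illustration only. The "new library" reads the caller's bytes through
the newer layout via memcpy, which stands in for an old binary being linked
against a newer DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PARAMS_SZ 64

/* Layout the old application was built against: 8 bytes used. */
struct demo_params_old {
	uint64_t field_a;
	uint8_t rsvd[DEMO_PARAMS_SZ - 8];
};

/* Layout of the newer library: 8 more bytes taken from the reserved area. */
struct demo_params_new {
	uint64_t field_a;
	uint64_t field_b;	/* 0 is assumed to mean "default behaviour" */
	uint8_t rsvd[DEMO_PARAMS_SZ - 16];
};

/* The total size must not change, otherwise the ABI is broken. */
_Static_assert(sizeof(struct demo_params_old) == DEMO_PARAMS_SZ, "size");
_Static_assert(sizeof(struct demo_params_new) == DEMO_PARAMS_SZ, "size");

/* Init helper: clears all 64 bytes regardless of how many are in use. */
static void
demo_params_init(void *params)
{
	memset(params, 0, DEMO_PARAMS_SZ);
}

/* Old application: knows only field_a. */
static void
demo_old_app_fill(struct demo_params_old *p, int call_init)
{
	memset(p, 0xff, sizeof(*p));	/* stand-in for stack junk */
	if (call_init)
		demo_params_init(p);
	p->field_a = 42;
}

/* New library: interprets the same bytes through the newer layout. */
static uint64_t
demo_new_lib_read_field_b(const struct demo_params_old *from_app)
{
	struct demo_params_new view;

	memcpy(&view, from_app, sizeof(view));
	return view.field_b;
}
```

With the init call the newer field reads back as zero. Without it the
library sees whatever was on the caller's stack, which is the junk-value
concern raised earlier in the thread.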

^ permalink raw reply	[relevance 0%]

* [PATCH v5 2/3] graph: pcap capture for graph nodes
  2023-02-03  8:19  4%   ` [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
@ 2023-02-03  8:19  2%     ` Amit Prakash Shukla
  2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  1 sibling, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-03  8:19 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, Amit Prakash Shukla

Implementation adds support to capture packets at each node with
packet metadata and node name.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin
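
Note for reviewers: a minimal, self-contained sketch of the interposition
this patch relies on, where the node's process callback is replaced by a
capture wrapper that chains to the saved original. All demo_* names are
hypothetical. The sketch keeps the counters in the node for brevity, whereas
the patch keeps them in struct rte_graph, and the actual copy and write to
the pcapng file is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_node;

/* Reduced form of a node process callback. */
typedef uint16_t (*demo_process_t)(struct demo_node *node, void **objs,
				   uint16_t nb_objs);

struct demo_node {
	demo_process_t process;			/* what the walker calls */
	demo_process_t original_process;	/* saved real callback */
	uint64_t nb_captured;			/* objects accounted so far */
	uint64_t nb_to_capture;			/* capture budget */
};

/* Stand-in for a real node: consumes everything it is given. */
static uint16_t
demo_real_process(struct demo_node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}

/* Wrapper: account for at most the remaining budget, then chain. */
static uint16_t
demo_capture_dispatch(struct demo_node *node, void **objs, uint16_t nb_objs)
{
	uint64_t room;

	if (nb_objs != 0 && node->nb_captured < node->nb_to_capture) {
		room = node->nb_to_capture - node->nb_captured;
		if (room > nb_objs)
			room = nb_objs;
		/* A real tracer would copy and write objs[0..room) here. */
		node->nb_captured += room;
	}

	return node->original_process(node, objs, nb_objs);
}

/* Install the wrapper only when capture is enabled. */
static void
demo_node_setup(struct demo_node *node, demo_process_t real, int capture,
		uint64_t budget)
{
	node->nb_captured = 0;
	node->nb_to_capture = budget;
	if (capture) {
		node->process = demo_capture_dispatch;
		node->original_process = real;
	} else {
		node->process = real;
		node->original_process = NULL;
	}
}
```

When capture is disabled the walker calls the real callback directly, so
the extra indirection is only paid when tracing is requested.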

 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 216 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 116 +++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 ++
 9 files changed, 387 insertions(+), 3 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index dab62508c4..e709df8b53 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -78,6 +78,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -110,6 +114,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..9cbd1b8fdb
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <pwd.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("Home env not preset.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..2ec772072c
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check graph pcap trace is enable/disable.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enable
+ *   - 0: Disable
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function invoked to allocate mempool.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function invoked to open pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function invoked when the graph pcap trace is enabled. This function
+ * open's pcap file and allocates mempool. Information needed for secondary
+ * process is populated.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace and close open fd's and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace enabled, this function is invoked prior to each node
+ * and mbuf, node metadata is parsed and captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< Filename in which packets to be captured.*/
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[relevance 2%]

* [PATCH v5 1/3] pcapng: comment option support for epb
  2023-01-24 11:21  4% ` [PATCH v4 " Amit Prakash Shukla
  2023-01-24 11:21  2%   ` [PATCH v4 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
@ 2023-02-03  8:19  4%   ` Amit Prakash Shukla
  2023-02-03  8:19  2%     ` [PATCH v5 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
  2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  1 sibling, 2 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-03  8:19 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger; +Cc: dev, jerinj, Amit Prakash Shukla

This change enhances rte_pcapng_copy to have comment in enhanced
packet block.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin
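
Note for reviewers: a minimal sketch of how a comment option occupies space
in an enhanced packet block, assuming the usual pcapng option layout (16-bit
code, 16-bit length of the unpadded value, value padded to a 32-bit
boundary). The demo_* names are hypothetical and this is not the library
code; it only illustrates why the option length has to be reserved before
the string is appended.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_OPT_COMMENT 1	/* opt_comment code in the pcapng format */

/* Option header: 16-bit code, 16-bit length of the unpadded value. */
struct demo_option {
	uint16_t code;
	uint16_t length;
};

/* Bytes taken by one option: header plus value, rounded up to 4 bytes. */
static uint16_t
demo_optlen(uint16_t len)
{
	size_t total = sizeof(struct demo_option) + len;

	return (uint16_t)((total + 3) & ~(size_t)3);
}

/*
 * Write a comment option at buf and return the number of bytes used,
 * or 0 if it does not fit. The string is stored without its NUL and
 * the header fields are written in host byte order.
 */
static size_t
demo_add_comment(uint8_t *buf, size_t buf_sz, const char *comment)
{
	struct demo_option hdr;
	size_t len = strlen(comment);
	size_t total;

	/* Keep header + value + padding within a 16-bit total. */
	if (len > UINT16_MAX - sizeof(hdr) - 3)
		return 0;

	total = demo_optlen((uint16_t)len);
	if (total > buf_sz)
		return 0;

	hdr.code = DEMO_OPT_COMMENT;
	hdr.length = (uint16_t)len;

	memset(buf, 0, total);	/* zero the padding bytes */
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), comment, len);
	return total;
}
```

A 14-character comment such as "g0: ip4_lookup" therefore takes 20 bytes:
4 for the header, 14 for the text and 2 for padding.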

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index a7acbdc058..303d3d66f9 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -139,7 +139,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -255,7 +255,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 73f5d94e14..dab62508c4 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -107,6 +107,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 80d08e1a3b..acb31a9d93 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -450,7 +450,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -511,6 +512,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -548,6 +552,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 7d2697c647..6d286cda41 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -100,6 +100,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -111,7 +113,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* RE: [EXT] Re: [PATCH v4 2/3] graph: pcap capture for graph nodes
  2023-01-31  8:06  0%     ` Jerin Jacob
@ 2023-02-03  8:15  0%       ` Amit Prakash Shukla
  0 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-03  8:15 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob Kollanukkaran, Kiran Kumar Kokkilagadda,
	Nithin Kumar Dabilpuram, Anatoly Burakov, dev

Thanks Jerin for the feedback. I will make the changes in next version of the patch.

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, January 31, 2023 1:37 PM
> To: Amit Prakash Shukla <amitprakashs@marvell.com>
> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Anatoly Burakov
> <anatoly.burakov@intel.com>; dev@dpdk.org
> Subject: [EXT] Re: [PATCH v4 2/3] graph: pcap capture for graph nodes
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Tue, Jan 24, 2023 at 4:52 PM Amit Prakash Shukla
> <amitprakashs@marvell.com> wrote:
> >
> > Implementation adds support to capture packets at each node with
> > packet metadata and node name.
> >
> > Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> > ---
> > v2:
> >  - Fixed code style issue
> >  - Fixed CI compilation issue on github-robot
> >
> > v3:
> >  - Code review suggestion from Stephen
> >  - Fixed potential memory leak
> >
> > v4:
> >  - Code review suggestion from Jerin
> >
> >  app/test/test_graph_perf.c             |   2 +-
> >  doc/guides/rel_notes/release_23_03.rst |   7 +
> >  lib/graph/graph.c                      |  17 +-
> >  lib/graph/graph_pcap.c                 | 217 +++++++++++++++++++++++++
> >  lib/graph/graph_pcap_private.h         | 124 ++++++++++++++
> >  lib/graph/graph_populate.c             |  12 +-
> >  lib/graph/graph_private.h              |   5 +
> >  lib/graph/meson.build                  |   3 +-
> >  lib/graph/rte_graph.h                  |   5 +
> >  lib/graph/rte_graph_worker.h           |   9 +
> >  10 files changed, 397 insertions(+), 4 deletions(-)
> >  create mode 100644 lib/graph/graph_pcap.c
> >  create mode 100644 lib/graph/graph_pcap_private.h
> >
> > diff --git a/app/test/test_graph_perf.c b/app/test/test_graph_perf.c
> > index 1d065438a6..c5b463f700 100644
> > --- a/app/test/test_graph_perf.c
> > +++ b/app/test/test_graph_perf.c
> > @@ -324,7 +324,7 @@ graph_init(const char *gname, uint8_t nb_srcs, uint8_t nb_sinks,
> >         char nname[RTE_NODE_NAMESIZE / 2];
> >         struct test_node_data *node_data;
> >         char *ename[nodes_per_stage];
> > -       struct rte_graph_param gconf;
> > +       struct rte_graph_param gconf = {0};
> 
> If it is a fix, move it to a separate patch outside this series.
> 
> 
> >         const struct rte_memzone *mz;
> >         uint8_t total_percent = 0;
> >         rte_node_t *src_nodes;
> > diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> > index 8c360b89e4..9ba392fb58 100644
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > @@ -69,6 +69,10 @@ New Features
> >      ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
> >      required for eth_rx, eth_tx, crypto and timer eventdev adapters.
> >
> > +* **Added pcap trace support in graph library.**
> > +
> > +  * Added support to capture packets at each graph node with packet metadata and
> > +    node name.
> >
> >  Removed Items
> >  -------------
> > @@ -101,6 +105,9 @@ API Changes
> >  * Experimental function ``rte_pcapng_copy`` was updated to support comment
> >    section in enhanced packet block in pcapng library.
> >
> > +* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
> > +  ``struct graph`` were updated to support pcap trace in graph library.
> > +
> >  ABI Changes
> >  -----------
> >
> > diff --git a/lib/graph/graph.c b/lib/graph/graph.c
> > index 3a617cc369..a839a2803b 100644
> > --- a/lib/graph/graph.c
> > +++ b/lib/graph/graph.c
> > @@ -15,6 +15,7 @@
> >  #include <rte_string_fns.h>
> >
> >  #include "graph_private.h"
> > +#include "graph_pcap_private.h"
> >
> >  static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
> >  static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
> > @@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
> >                 node_db = node_from_name(name);
> >                 if (node_db == NULL)
> >                         SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
> > -               node->process = node_db->process;
> > +
> > +               if (graph->pcap_enable) {
> > +                       node->process = graph_pcap_dispatch;
> > +                       node->original_process = node_db->process;
> > +               } else
> > +                       node->process = node_db->process;
> >         }
> >
> >         return graph;
> > @@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
> >         if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
> >                 return graph;
> >
> > +       if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
> > +               graph_pcap_exit(graph);
> > +
> >         return graph_mem_fixup_node_ctx(graph);
> >  }
> >
> > @@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
> >         if (graph_has_isolated_node(graph))
> >                 goto graph_cleanup;
> >
> > +       /* Initialize pcap config. */
> > +       graph_pcap_enable(prm->pcap_enable);
> > +
> >         /* Initialize graph object */
> >         graph->socket = prm->socket_id;
> >         graph->src_node_count = src_node_count;
> >         graph->node_count = graph_nodes_count(graph);
> >         graph->id = graph_id;
> > +       graph->num_pkt_to_capture = prm->num_pkt_to_capture;
> > +       if (prm->pcap_filename)
> > +               rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
> >
> >         /* Allocate the Graph fast path memory and populate the data */
> >         if (graph_fp_mem_create(graph))
> > diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
> > new file mode 100644
> > index 0000000000..7bd13ed61e
> > --- /dev/null
> > +++ b/lib/graph/graph_pcap.c
> > @@ -0,0 +1,217 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Marvell International Ltd.
> > + */
> > +
> > +#include <errno.h>
> > +#include <unistd.h>
> > +#include <stdlib.h>
> > +#include <pwd.h>
> 
> Sort in alphabetical order.
> 
> 
> > +
> > +#include <rte_mbuf.h>
> > +#include <rte_pcapng.h>
> > +
> > +#include "rte_graph_worker.h"
> > +
> > +#include "graph_pcap_private.h"
> > +
> > +#define GRAPH_PCAP_BUF_SZ      128
> > +#define GRAPH_PCAP_NUM_PACKETS 1024
> > +#define GRAPH_PCAP_PKT_POOL    "graph_pcap_pkt_pool"
> > +#define GRAPH_PCAP_FILE_NAME   "dpdk_graph_pcap_capture_XXXXXX.pcapng"
> > +
> > +/* For multi-process, packets are captured in separate files. */
> > +static rte_pcapng_t *pcapng_fd;
> > +static bool pcap_enable;
> > +struct rte_mempool *pkt_mp;
> > +
> > +void
> > +graph_pcap_enable(bool val)
> > +{
> > +       pcap_enable = val;
> > +}
> > +
> > +int
> > +graph_pcap_is_enable(void)
> > +{
> > +       return pcap_enable;
> > +}
> > +
> > +void
> > +graph_pcap_exit(struct rte_graph *graph)
> > +{
> > +       if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > +               if (pkt_mp)
> > +                       rte_mempool_free(pkt_mp);
> > +
> > +       if (pcapng_fd) {
> > +               rte_pcapng_close(pcapng_fd);
> > +               pcapng_fd = NULL;
> > +       }
> > +
> > +       /* Disable pcap. */
> > +       graph->pcap_enable = 0;
> > +       graph_pcap_enable(0);
> > +}
> > +
> > +static int
> > +graph_pcap_default_path_get(char **dir_path)
> > +{
> > +       struct passwd *pwd;
> > +       char *home_dir;
> > +
> > +       /* First check for shell environment variable */
> > +       home_dir = getenv("HOME");
> > +       if (home_dir == NULL) {
> > +               graph_warn("Home env not preset.");
> > +               /* Fallback to password file entry */
> > +               pwd = getpwuid(getuid());
> > +               if (pwd == NULL)
> > +                       return -EINVAL;
> > +
> > +               home_dir = pwd->pw_dir;
> > +       }
> > +
> > +       /* Append default pcap file to directory */
> > +       if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
> > +               return -ENOMEM;
> > +
> > +       return 0;
> > +}
> > +
> > +int
> > +graph_pcap_file_open(const char *filename)
> > +{
> > +       int fd;
> > +       char file_name[RTE_GRAPH_PCAP_FILE_SZ];
> > +       char *pcap_dir;
> > +
> > +       if (pcapng_fd)
> > +               goto done;
> > +
> > +       if (!filename || filename[0] == '\0') {
> > +               if (graph_pcap_default_path_get(&pcap_dir) < 0)
> > +                       return -1;
> > +               snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
> > +               free(pcap_dir);
> > +       } else {
> > +               snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
> > +                        filename);
> > +       }
> > +
> > +       fd = mkstemps(file_name, strlen(".pcapng"));
> > +       if (fd < 0) {
> > +               graph_err("mkstemps() failure");
> > +               return -1;
> > +       }
> > +
> > +       graph_info("pcap filename: %s", file_name);
> > +
> > +       /* Open a capture file */
> > +       pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
> > +                                     NULL);
> > +       if (pcapng_fd == NULL) {
> > +               graph_err("Graph rte_pcapng_fdopen failed.");
> > +               close(fd);
> > +               return -1;
> > +       }
> > +
> > +done:
> > +       return 0;
> > +}
> > +
> > +int
> > +graph_pcap_mp_init(void)
> > +{
> > +       pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
> > +       if (pkt_mp)
> > +               goto done;
> > +
> > +       /* Make a pool for cloned packets */
> > +       pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
> > +                       IOV_MAX + RTE_GRAPH_BURST_SIZE, 0, 0,
> > +                       rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
> > +                       SOCKET_ID_ANY, "ring_mp_mc");
> > +       if (pkt_mp == NULL) {
> > +               graph_err("Cannot create mempool for graph pcap capture.");
> > +               return -1;
> > +       }
> > +
> > +done:
> > +       return 0;
> > +}
> > +
> > +int
> > +graph_pcap_init(struct graph *graph)
> > +{
> > +       struct rte_graph *graph_data = graph->graph;
> > +
> > +       if (graph_pcap_file_open(graph->pcap_filename) < 0)
> > +               goto error;
> > +
> > +       if (graph_pcap_mp_init() < 0)
> > +               goto error;
> > +
> > +       /* User configured number of packets to capture. */
> > +       if (graph->num_pkt_to_capture)
> > +               graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
> > +       else
> > +               graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
> > +
> > +       /* All good. Now populate data for secondary process. */
> 
> No need for a new line here.
> 
> > +
> > +       rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
> > +       graph_data->pcap_enable = 1;
> > +
> > +       return 0;
> > +
> > +error:
> > +       graph_pcap_exit(graph_data);
> > +       graph_pcap_enable(0);
> > +       graph_err("Graph pcap initialization failed. Disabling pcap trace.");
> > +       return -1;
> > +}
> > +
> > +uint16_t
> > +graph_pcap_dispatch(struct rte_graph *graph,
> > +                             struct rte_node *node, void **objs,
> > +                             uint16_t nb_objs)
> > +{
> > +       struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
> > +       char buffer[GRAPH_PCAP_BUF_SZ];
> > +       uint64_t i, num_packets;
> > +       struct rte_mbuf *mbuf;
> > +       ssize_t len;
> > +
> > +       if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
> > +               goto done;
> > +
> > +       num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
> > +       /* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
> > +       if (num_packets > nb_objs)
> > +               num_packets = nb_objs;
> > +
> > +       snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
> > +
> > +       for (i = 0; i < num_packets; i++) {
> > +               struct rte_mbuf *mc;
> > +               mbuf = (struct rte_mbuf *)objs[i];
> > +
> > +               mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
> > +                                    rte_get_tsc_cycles(), 0, buffer);
> > +               if (mc == NULL)
> > +                       break;
> > +
> > +               mbuf_clones[i] = mc;
> > +       }
> > +
> > +       /* write it to capture file */
> > +       len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
> > +       rte_pktmbuf_free_bulk(mbuf_clones, i);
> > +       if (len <= 0)
> > +               goto done;
> > +
> > +       graph->nb_pkt_captured += i;
> > +
> > +done:
> > +       return node->original_process(graph, node, objs, nb_objs);
> > +}
> > diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
> > new file mode 100644
> > index 0000000000..198add67e2
> > --- /dev/null
> > +++ b/lib/graph/graph_pcap_private.h
> > @@ -0,0 +1,124 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2023 Marvell International Ltd.
> > + */
> > +
> > +#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
> > +#define _RTE_GRAPH_PCAP_PRIVATE_H_
> > +
> > +#include <stdint.h>
> > +#include <sys/types.h>
> > +
> > +#include "graph_private.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> 
> This is not needed for internal header files.
> 
> 
> > +
> > +/**
> > + * @internal
> > + *
> > + * Pcap trace enable/disable function.
> > + *
> > + * The function is called to enable/disable graph pcap trace functionality.
> > + *
> > + * @param val
> > + *   Value to be set to enable/disable graph pcap trace.
> > + */
> > +void graph_pcap_enable(bool val);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Check graph pcap trace is enable/disable.
> > + *
> > + * The function is called to check if the graph pcap trace is enabled/disabled.
> > + *
> > + * @return
> > + *   - 1: Enable
> > + *   - 0: Disable
> > + */
> > +int graph_pcap_is_enable(void);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Initialise graph pcap trace functionality.
> > + *
> > + * The function invoked to allocate mempool.
> > + *
> > + * @return
> > + *   0 on success and -1 on failure.
> > + */
> > +int graph_pcap_mp_init(void);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Initialise graph pcap trace functionality.
> > + *
> > + * The function invoked to open pcap file.
> > + *
> > + * @param filename
> > + *   Pcap filename.
> > + *
> > + * @return
> > + *   0 on success and -1 on failure.
> > + */
> > +int graph_pcap_file_open(const char *filename);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Initialise graph pcap trace functionality.
> > + *
> > + * The function invoked when the graph pcap trace is enabled. This function
> > + * open's pcap file and allocates mempool. Information needed for secondary
> > + * process is populated.
> > + *
> > + * @param graph
> > + *   Pointer to graph structure.
> > + *
> > + * @return
> > + *   0 on success and -1 on failure.
> > + */
> > +int graph_pcap_init(struct graph *graph);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Exit graph pcap trace functionality.
> > + *
> > + * The function is called to exit graph pcap trace and close open fd's and
> > + * free up memory. Pcap trace is also disabled.
> > + *
> > + * @param graph
> > + *   Pointer to graph structure.
> > + */
> > +void graph_pcap_exit(struct rte_graph *graph);
> > +
> > +/**
> > + * @internal
> > + *
> > + * Capture mbuf metadata and node metadata to a pcap file.
> > + *
> > + * When graph pcap trace enabled, this function is invoked prior to each node
> > + * and mbuf, node metadata is parsed and captured in a pcap file.
> > + *
> > + * @param graph
> > + *   Pointer to the graph object.
> > + * @param node
> > + *   Pointer to the node object.
> > + * @param objs
> > + *   Pointer to an array of objects to be processed.
> > + * @param nb_objs
> > + *   Number of objects in the array.
> > + */
> > +uint16_t graph_pcap_dispatch(struct rte_graph *graph,
> > +                                  struct rte_node *node, void **objs,
> > +                                  uint16_t nb_objs);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
> > diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
> > index 102fd6c29b..2c0844ce92 100644
> > --- a/lib/graph/graph_populate.c
> > +++ b/lib/graph/graph_populate.c
> > @@ -9,6 +9,7 @@
> >  #include <rte_memzone.h>
> >
> >  #include "graph_private.h"
> > +#include "graph_pcap_private.h"
> >
> >  static size_t
> >  graph_fp_mem_calc_size(struct graph *graph)
> > @@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
> >                 memset(node, 0, sizeof(*node));
> >                 node->fence = RTE_GRAPH_FENCE;
> >                 node->off = off;
> > -               node->process = graph_node->node->process;
> > +               if (graph_pcap_is_enable()) {
> > +                       node->process = graph_pcap_dispatch;
> > +                       node->original_process = graph_node->node->process;
> > +               } else
> > +                       node->process = graph_node->node->process;
> >                 memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
> >                 pid = graph_node->node->parent_id;
> >                 if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
> > @@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
> >         int rc;
> >
> >         graph_header_popluate(graph);
> > +       if (graph_pcap_is_enable())
> > +               graph_pcap_init(graph);
> >         graph_nodes_populate(graph);
> >         rc = graph_node_nexts_populate(graph);
> >         rc |= graph_src_nodes_populate(graph);
> > @@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
> >  int
> >  graph_fp_mem_destroy(struct graph *graph)
> >  {
> > +       if (graph_pcap_is_enable())
> > +               graph_pcap_exit(graph->graph);
> > +
> >         graph_nodes_mem_destroy(graph->graph);
> >         return rte_memzone_free(graph->mz);
> >  }
> > diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> > index f9a85c8926..7d1b30b8ac 100644
> > --- a/lib/graph/graph_private.h
> > +++ b/lib/graph/graph_private.h
> > @@ -22,6 +22,7 @@ extern int rte_graph_logtype;
> >                         __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
> >
> >  #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
> > +#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
> >  #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
> >  #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
> >
> > @@ -100,6 +101,10 @@ struct graph {
> >         /**< Memory size of the graph. */
> >         int socket;
> >         /**< Socket identifier where memory is allocated. */
> > +       uint64_t num_pkt_to_capture;
> > +       /**< Number of packets to be captured per core. */
> > +       char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
> > +       /**< pcap file name/path. */
> >         STAILQ_HEAD(gnode_list, graph_node) node_list;
> >         /**< Nodes in a graph. */
> >  };
> > diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> > index c7327549e8..3526d1b5d4 100644
> > --- a/lib/graph/meson.build
> > +++ b/lib/graph/meson.build
> > @@ -14,7 +14,8 @@ sources = files(
> >          'graph_debug.c',
> >          'graph_stats.c',
> >          'graph_populate.c',
> > +        'graph_pcap.c',
> >  )
> >  headers = files('rte_graph.h', 'rte_graph_worker.h')
> >
> > -deps += ['eal']
> > +deps += ['eal', 'pcapng']
> > diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
> > index b32c4bc217..c9a77297fc 100644
> > --- a/lib/graph/rte_graph.h
> > +++ b/lib/graph/rte_graph.h
> > @@ -35,6 +35,7 @@ extern "C" {
> >
> >  #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
> >  #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
> > +#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
> >  #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
> >  #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
> >  #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
> > @@ -164,6 +165,10 @@ struct rte_graph_param {
> >         uint16_t nb_node_patterns;  /**< Number of node patterns. */
> >         const char **node_patterns;
> >         /**< Array of node patterns based on shell pattern. */
> > +
> > +       bool pcap_enable; /**< Pcap enable. */
> > +       uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
> > +       char *pcap_filename; /**< Filename in which packets to be captured.*/
> >  };
> >
> >  /**
> > diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> > index fc6fee48c8..438595b15c 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -44,6 +44,12 @@ struct rte_graph {
> >         rte_graph_t id; /**< Graph identifier. */
> >         int socket;     /**< Socket ID where memory is allocated. */
> >         char name[RTE_GRAPH_NAMESIZE];  /**< Name of the graph. */
> > +       bool pcap_enable;               /**< Pcap trace enabled. */
> > +       /** Number of packets captured per core. */
> > +       uint64_t nb_pkt_captured;
> > +       /** Number of packets to capture per core. */
> > +       uint64_t nb_pkt_to_capture;
> > +       char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
> >         uint64_t fence;                 /**< Fence. */
> >  } __rte_cache_aligned;
> >
> > @@ -64,6 +70,9 @@ struct rte_node {
> >         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
> >         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
> >
> > +       /** Original process function when pcap is enabled. */
> > +       rte_node_process_t original_process;
> > +
> >         /* Fast path area  */
> >  #define RTE_NODE_CTX_SZ 16
> >         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
> > --
> > 2.25.1
> >
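
For readers following the thread, the interposition scheme in the quoted
graph_nodes_populate() hunk (install a dispatch function as "process" and keep
the node's real callback in "original_process") can be sketched in isolation.
This is only a minimal illustration under stated assumptions, not the DPDK
implementation: every toy_* type and function below is a hypothetical stand-in,
and the packet copy and pcapng write are replaced by a plain counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_node;

/* Hypothetical stand-in for rte_node_process_t. */
typedef uint16_t (*toy_process_t)(struct toy_node *node, void **objs,
				  uint16_t nb_objs);

/* Hypothetical stand-in for the capture counters added to struct rte_graph. */
struct toy_capture {
	uint64_t nb_pkt_captured;   /* packets recorded so far */
	uint64_t nb_pkt_to_capture; /* capture budget */
};

struct toy_node {
	toy_process_t process;          /* what the graph walk calls */
	toy_process_t original_process; /* the node's real callback */
	struct toy_capture *cap;
};

/* Record up to the remaining budget, then always run the real callback. */
static uint16_t
toy_pcap_dispatch(struct toy_node *node, void **objs, uint16_t nb_objs)
{
	struct toy_capture *cap = node->cap;
	uint64_t remaining;

	assert(node->original_process != NULL);

	if (nb_objs != 0 && cap->nb_pkt_captured < cap->nb_pkt_to_capture) {
		remaining = cap->nb_pkt_to_capture - cap->nb_pkt_captured;
		if (remaining > nb_objs)
			remaining = nb_objs;
		/* A real tracer would copy and write the packets here. */
		cap->nb_pkt_captured += remaining;
	}

	return node->original_process(node, objs, nb_objs);
}

/* Install the wrapper only when tracing is enabled. */
static void
toy_node_populate(struct toy_node *node, toy_process_t real,
		  struct toy_capture *cap, int pcap_enabled)
{
	node->cap = cap;
	if (pcap_enabled) {
		node->process = toy_pcap_dispatch;
		node->original_process = real;
	} else {
		node->process = real;
		node->original_process = NULL;
	}
}

/* Trivial node callback: reports every object as processed. */
static uint16_t
toy_passthrough(struct toy_node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}
```

With tracing enabled, callers keep invoking node->process() unchanged; the
wrapper stops counting once the budget is reached but still forwards every
burst to the original callback, so the data path result is unaffected.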

^ permalink raw reply	[relevance 0%]

* [PATCH] doc: update NFP documentation with Corigine information
@ 2023-02-03  8:08  6% Chaoyong He
  2023-02-15 13:37  0% ` Ferruh Yigit
    0 siblings, 2 replies; 200+ results
From: Chaoyong He @ 2023-02-03  8:08 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Walter Heymans, Chaoyong He

From: Walter Heymans <walter.heymans@corigine.com>

The NFP PMD documentation is updated to include information about
Corigine and their new vendor device ID.

Outdated information regarding the use of the PMD is also updated.

While making major changes to the document, the maximum number of
characters per line is updated to 80 characters to improve the
readability in raw format.

Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 doc/guides/nics/nfp.rst | 168 +++++++++++++++++++++-------------------
 1 file changed, 90 insertions(+), 78 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index a085d7d9ae..6fea280411 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -1,35 +1,34 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
-    All rights reserved.
+    Copyright(c) 2021 Corigine, Inc. All rights reserved.
 
 NFP poll mode driver library
 ============================
 
-Netronome's sixth generation of flow processors pack 216 programmable
-cores and over 100 hardware accelerators that uniquely combine packet,
-flow, security and content processing in a single device that scales
+Netronome and Corigine's sixth generation of flow processors pack 216
+programmable cores and over 100 hardware accelerators that uniquely combine
+packet, flow, security and content processing in a single device that scales
 up to 400-Gb/s.
 
-This document explains how to use DPDK with the Netronome Poll Mode
-Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
-(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
-Netronome's Network Flow Processor 38xx (NFP-38xx).
+This document explains how to use DPDK with the Network Flow Processor (NFP)
+Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
+and NFP-38xx product lines.
 
-NFP is a SRIOV capable device and the PMD supports the physical
-function (PF) and the virtual functions (VFs).
+NFP is a SR-IOV capable device and the PMD supports the physical function (PF)
+and the virtual functions (VFs).
 
 Dependencies
 ------------
 
-Before using the Netronome's DPDK PMD some NFP configuration,
-which is not related to DPDK, is required. The system requires
-installation of **Netronome's BSP (Board Support Package)** along
-with a specific NFP firmware application. Netronome's NSP ABI
-version should be 0.20 or higher.
+Before using the NFP DPDK PMD some NFP configuration, which is not related to
+DPDK, is required. The system requires installation of
+**NFP-BSP (Board Support Package)** along with a specific NFP firmware
+application. The NSP ABI version should be 0.20 or higher.
 
-If you have a NFP device you should already have the code and
-documentation for this configuration. Contact
-**support@netronome.com** to obtain the latest available firmware.
+If you have a NFP device you should already have the documentation to perform
+this configuration. Contact **support@netronome.com** (for Netronome products)
+or **smartnic-support@corigine.com** (for Corigine products) to obtain the
+latest available firmware.
 
 The NFP Linux netdev kernel driver for VFs has been a part of the
 vanilla kernel since kernel version 4.5, and support for the PF
@@ -44,11 +43,11 @@ Linux kernel driver.
 Building the software
 ---------------------
 
-Netronome's PMD code is provided in the **drivers/net/nfp** directory.
-Although NFP PMD has Netronome´s BSP dependencies, it is possible to
-compile it along with other DPDK PMDs even if no BSP was installed previously.
-Of course, a DPDK app will require such a BSP installed for using the
-NFP PMD, along with a specific NFP firmware application.
+The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
+NFP PMD has BSP dependencies, it is possible to compile it along with other
+DPDK PMDs even if no BSP was installed previously. Of course, a DPDK app will
+require such a BSP installed for using the NFP PMD, along with a specific NFP
+firmware application.
 
 Once the DPDK is built all the DPDK apps and examples include support for
 the NFP PMD.
@@ -57,27 +56,20 @@ the NFP PMD.
 Driver compilation and testing
 ------------------------------
 
-Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
-for details.
+Refer to the document
+:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` for details.
 
 Using the PF
 ------------
 
-NFP PMD supports using the NFP PF as another DPDK port, but it does not
-have any functionality for controlling VFs. In fact, it is not possible to use
-the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
-bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
-have a PMD able to work with the PF and VFs at the same time and with the PF
-implementing VF management along with other PF-only functionalities/offloads.
-
 The PMD PF has extra work to do which will delay the DPDK app initialization
-like uploading the firmware and configure the Link state properly when starting or
-stopping a PF port. Since DPDK 18.05 the firmware upload happens when
+like uploading the firmware and configure the Link state properly when starting
+or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
 a PF is initialized, which was not always true with older DPDK versions.
 
-Depending on the Netronome product installed in the system, firmware files
-should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
-PF looks for a firmware file in this order:
+Depending on the product installed in the system, firmware files should be
+available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
+for a firmware file in this order:
 
 	1) First try to find a firmware image specific for this device using the
 	   NFP serial number:
@@ -92,18 +84,21 @@ PF looks for a firmware file in this order:
 
 		nic_AMDA0099-0001_2x25.nffw
 
-Netronome's software packages install firmware files under ``/lib/firmware/netronome``
-to support all the Netronome's SmartNICs and different firmware applications.
-This is usually done using file names based on SmartNIC type and media and with a
-directory per firmware application. Options 1 and 2 for firmware filenames allow
-more than one SmartNIC, same type of SmartNIC or different ones, and to upload a
-different firmware to each SmartNIC.
+Netronome and Corigine's software packages install firmware files under
+``/lib/firmware/netronome`` to support all the SmartNICs and different firmware
+applications. This is usually done using file names based on SmartNIC type and
+media and with a directory per firmware application. Options 1 and 2 for
+firmware filenames allow more than one SmartNIC, same type of SmartNIC or
+different ones, and to upload a different firmware to each SmartNIC.
 
    .. Note::
-      Currently the NFP PMD supports using the PF with Agilio Firmware with NFD3
-      and Agilio Firmware with NFDk. See https://help.netronome.com/support/solutions
+      Currently the NFP PMD supports using the PF with Agilio Firmware with
+      NFD3 and Agilio Firmware with NFDk. See
+      `Netronome Support <https://help.netronome.com/support/solutions>`_
       for more information on the various firmwares supported by the Netronome
-      Agilio CX smartNIC.
+      Agilio SmartNICs range, or
+      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_
+      for more information about Corigine's range.
 
 PF multiport support
 --------------------
@@ -118,7 +113,7 @@ this particular configuration requires the PMD to create ports in a special way,
 although once they are created, DPDK apps should be able to use them as normal
 PCI ports.
 
-NFP ports belonging to same PF can be seen inside PMD initialization with a
+NFP ports belonging to the same PF can be seen inside PMD initialization with a
 suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
 0000:03:00.0 and four ports is seen by the PMD code as:
 
@@ -137,50 +132,67 @@ suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
 PF multiprocess support
 -----------------------
 
-Due to how the driver needs to access the NFP through a CPP interface, which implies
-to use specific registers inside the chip, the number of secondary processes with PF
-ports is limited to only one.
+Due to how the driver needs to access the NFP through a CPP interface, which
+implies to use specific registers inside the chip, the number of secondary
+processes with PF ports is limited to only one.
 
-This limitation will be solved in future versions but having basic multiprocess support
-is important for allowing development and debugging through the PF using a secondary
-process which will create a CPP bridge for user space tools accessing the NFP.
+This limitation will be solved in future versions, but having basic
+multiprocess support is important for allowing development and debugging
+through the PF using a secondary process, which will create a CPP bridge
+for user space tools accessing the NFP.
 
 
 System configuration
 --------------------
 
 #. **Enable SR-IOV on the NFP device:** The current NFP PMD supports the PF and
-   the VFs on a NFP device. However, it is not possible to work with both at the
-   same time because the VFs require the PF being bound to the NFP PF Linux
-   netdev driver.  Make sure you are working with a kernel with NFP PF support or
-   get the drivers from the above Github repository and follow the instructions
-   for building and installing it.
+   the VFs on a NFP device. However, it is not possible to work with both at
+   the same time when using the NFP Linux netdev driver. It is possible
+   to bind the PF to the ``vfio-pci`` kernel module, and create VFs afterwards.
+   This requires loading the ``vfio-pci`` module with the following parameters:
+
+   .. code-block:: console
+
+      modprobe vfio-pci enable_sriov=1 disable_idle_d3=1
+
+   VFs need to be enabled before they can be used with the PMD. Before enabling
+   the VFs it is useful to obtain information about the current NFP PCI device
+   detected by the system. This can be done on Netronome SmartNICs using:
+
+   .. code-block:: console
+
+      lspci -d 19ee:
 
-   VFs need to be enabled before they can be used with the PMD.
-   Before enabling the VFs it is useful to obtain information about the
-   current NFP PCI device detected by the system:
+   and on Corigine SmartNICs using:
 
    .. code-block:: console
 
-      lspci -d19ee:
+      lspci -d 1da8:
 
-   Now, for example, configure two virtual functions on a NFP-6xxx device
+   Now, for example, to configure two virtual functions on a NFP device
    whose PCI system identity is "0000:03:00.0":
 
    .. code-block:: console
 
       echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
 
-   The result of this command may be shown using lspci again:
+   The result of this command may be shown using lspci again on Netronome
+   SmartNICs:
+
+   .. code-block:: console
+
+      lspci -d 19ee: -k
+
+   and on Corigine SmartNICs:
 
    .. code-block:: console
 
-      lspci -d19ee: -k
+      lspci -d 1da8: -k
 
    Two new PCI devices should appear in the output of the above command. The
-   -k option shows the device driver, if any, that devices are bound to.
-   Depending on the modules loaded at this point the new PCI devices may be
-   bound to nfp_netvf driver.
+   -k option shows the device driver, if any, that the devices are bound to.
+   Depending on the modules loaded, at this point the new PCI devices may be
+   bound to the ``nfp`` kernel driver or ``vfio-pci``.
 
 
 Flow offload
@@ -193,13 +205,13 @@ The flower firmware application requires the PMD running two services:
 
 	* PF vNIC service: handling the feedback traffic.
 	* ctrl vNIC service: communicate between PMD and firmware through
-	  control message.
+	  control messages.
 
 To achieve the offload of flow, the representor ports are exposed to OVS.
-The flower firmware application support representor port for VF and physical
-port. There will always exist a representor port for each physical port,
-and the number of the representor port for VF is specified by the user through
-parameter.
+The flower firmware application supports VF, PF, and physical port representor
+ports. There will always exist a representor port for a PF and each physical
+port. The number of the representor ports for VFs are specified by the user
+through a parameter.
 
 In the Rx direction, the flower firmware application will prepend the input
 port information into metadata for each packet which can't offloaded. The PF
@@ -207,12 +219,12 @@ vNIC service will keep polling packets from the firmware, and multiplex them
 to the corresponding representor port.
 
 In the Tx direction, the representor port will prepend the output port
-information into metadata for each packet, and then send it to firmware through
-PF vNIC.
+information into metadata for each packet, and then send it to the firmware
+through the PF vNIC.
 
-The ctrl vNIC service handling various control message, like the creation and
-configuration of representor port, the pattern and action of flow rules, the
-statistics of flow rules, and so on.
+The ctrl vNIC service handles various control messages, for example, the
+creation and configuration of a representor port, the pattern and action of
+flow rules, the statistics of flow rules, etc.
 
 Metadata Format
 ---------------
-- 
2.29.3


^ permalink raw reply	[relevance 6%]

* [PATCH v3 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  @ 2023-02-03  5:07  6%   ` Jiawei Wang
    1 sibling, 0 replies; 200+ results
From: Jiawei Wang @ 2023-02-03  5:07 UTC (permalink / raw)
  To: viacheslavo, orika, thomas, Aman Singh, Yuying Zhang,
	Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

When multiple hardware ports are connected to a single DPDK port (mhpsdp),
the previous patch introduces the new rte flow item to match the
PHY affinity of the received packets.

Add the tx_phy_affinity setting to the Tx queue API. The affinity
value indicates which hardware port the packets are sent to.
A value of 0 means no affinity, and traffic will be routed between the
different physical ports.

Add nb_phy_ports to the device info. A value greater than 0 is the
number of physical ports connected to the DPDK port.

Add the new tx_phy_affinity field into the padding hole of the
rte_eth_txconf structure, so the size of rte_eth_txconf stays the same.
Add a suppression for the structure change to the ABI check file.

Add the testpmd command line:
testpmd> port config (port_id) txq (queue_id) phy_affinity (value)

For example, with two hardware ports 0 and 1 connected to
a single DPDK port (port id 0), where phy_affinity 1 stands for
hardware port 0 and phy_affinity 2 stands for hardware port 1,
use the commands below to configure the Tx PHY affinity per Tx queue:
        port config 0 txq 0 phy_affinity 1
        port config 0 txq 1 phy_affinity 1
        port config 0 txq 2 phy_affinity 2
        port config 0 txq 3 phy_affinity 2

These commands configure TxQ 0 and TxQ 1 with PHY affinity 1, so
packets sent on TxQ 0 or TxQ 1 leave through hardware port 0.
Likewise, packets sent on TxQ 2 or TxQ 3 leave through
hardware port 1.
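
The value mapping used in the example, together with the range check done by
the testpmd handler, could be sketched as a small helper. This is an
illustrative sketch only: phy_affinity_to_hw_port() is a hypothetical name and
not part of ethdev, the uint8_t type for the port count is assumed, and the
"affinity N selects hardware port N - 1" rule is taken from the example in
this commit message.

```c
#include <stdint.h>

/*
 * Hypothetical helper: translate a tx_phy_affinity value into a
 * zero-based hardware port index.
 *
 * Returns -1 when there is no affinity (value 0), when the device
 * reports no physical ports, or when the value exceeds nb_phy_ports.
 */
static int
phy_affinity_to_hw_port(uint8_t affinity, uint8_t nb_phy_ports)
{
	if (nb_phy_ports == 0 || affinity == 0 || affinity > nb_phy_ports)
		return -1;

	return affinity - 1;
}
```

With two physical ports, affinity 1 yields hardware port 0 and affinity 2
yields hardware port 1, while 0 and 3 are reported as "no specific port".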

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
 app/test-pmd/config.c                       |   1 +
 devtools/libabigail.abignore                |   5 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
 lib/ethdev/rte_ethdev.h                     |  13 ++-
 5 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b32dc8bfd4..3450b1be36 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -764,6 +764,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
 			"    Cleanup txq mbufs for a specific Tx queue\n\n"
+
+			"port config (port_id) txq (queue_id) phy_affinity (value)\n"
+			"    Set the physical affinity value "
+			"on a specific Tx queue\n\n"
 		);
 	}
 
@@ -12621,6 +12625,101 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
 	}
 };
 
+/* *** configure port txq phy_affinity value *** */
+struct cmd_config_tx_phy_affinity {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	portid_t portid;
+	cmdline_fixed_string_t txq;
+	uint16_t qid;
+	cmdline_fixed_string_t phy_affinity;
+	uint8_t value;
+};
+
+static void
+cmd_config_tx_phy_affinity_parsed(void *parsed_result,
+				  __rte_unused struct cmdline *cl,
+				  __rte_unused void *data)
+{
+	struct cmd_config_tx_phy_affinity *res = parsed_result;
+	struct rte_eth_dev_info dev_info;
+	struct rte_port *port;
+	int ret;
+
+	if (port_id_is_invalid(res->portid, ENABLED_WARN))
+		return;
+
+	if (res->portid == (portid_t)RTE_PORT_ALL) {
+		printf("Invalid port id\n");
+		return;
+	}
+
+	port = &ports[res->portid];
+
+	if (strcmp(res->txq, "txq")) {
+		printf("Unknown parameter\n");
+		return;
+	}
+	if (tx_queue_id_is_invalid(res->qid))
+		return;
+
+	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
+	if (ret != 0)
+		return;
+
+	if (dev_info.nb_phy_ports == 0) {
+		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
+		return;
+	}
+	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
+	if (dev_info.nb_phy_ports < res->value) {
+		printf("The PHY affinity value %u is invalid, it exceeds the "
+		       "number of physical ports\n", res->value);
+		return;
+	}
+	port->txq[res->qid].conf.tx_phy_affinity = res->value;
+
+	cmd_reconfig_device_queue(res->portid, 0, 1);
+}
+
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 port, "port");
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 config, "config");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 portid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 txq, "txq");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      qid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 phy_affinity, "phy_affinity");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      value, RTE_UINT8);
+
+static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
+	.f = cmd_config_tx_phy_affinity_parsed,
+	.data = (void *)0,
+	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
+	.tokens = {
+		(void *)&cmd_config_tx_phy_affinity_port,
+		(void *)&cmd_config_tx_phy_affinity_config,
+		(void *)&cmd_config_tx_phy_affinity_portid,
+		(void *)&cmd_config_tx_phy_affinity_txq,
+		(void *)&cmd_config_tx_phy_affinity_qid,
+		(void *)&cmd_config_tx_phy_affinity_hwport,
+		(void *)&cmd_config_tx_phy_affinity_value,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -12851,6 +12950,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_show_capability,
 	(cmdline_parse_inst_t *)&cmd_set_flex_is_pattern,
 	(cmdline_parse_inst_t *)&cmd_set_flex_spec_pattern,
+	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
 	NULL,
 };
 
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index acccb6b035..b83fb17cfa 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
 		printf("unknown\n");
 		break;
 	}
+	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
 }
 
 void
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..0f4b5ec74b 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -34,3 +34,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Ignore fields inserted in middle padding of rte_eth_txconf
+[suppress_type]
+        name = rte_eth_txconf
+        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 0037506a79..856fb55005 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
 
 This command should be run when the port is stopped, or else it will fail.
 
+config per queue Tx physical affinity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Configure a per queue physical affinity value only on a specific Tx queue::
+
+   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
+
+* ``phy_affinity``: selects which hardware port the packets are sent to.
+                    Used when multiple hardware ports are connected to
+                    a single DPDK port (mhpsdp).
+
+This command should be run when the port is stopped, or else it will fail.
+
 Config VXLAN Encap outer layers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..ecfa2c6781 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1138,6 +1138,16 @@ struct rte_eth_txconf {
 				      less free descriptors than this value. */
 
 	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	/**
+	 * Physical affinity to be set.
+	 * Value 0 means no affinity: traffic may be routed through any
+	 * physical port. If value 0 is disabled, trying to match on
+	 * phy_affinity 0 results in an error.
+	 *
+	 * Values starting from 1 select a specific physical port, with 1
+	 * being the first physical port.
+	 */
+	uint8_t tx_phy_affinity;
 	/**
 	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
 	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
@@ -1777,7 +1787,8 @@ struct rte_eth_dev_info {
 	struct rte_eth_switch_info switch_info;
 	/** Supported error handling mode. */
 	enum rte_eth_err_handle_mode err_handle_mode;
-
+	/** Number of physical ports connected to a single DPDK port. */
+	uint8_t nb_phy_ports;
 	uint64_t reserved_64s[2]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
-- 
2.18.1
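
To illustrate the checks that the testpmd command in this patch applies, here is a minimal standalone sketch. It does not use the DPDK headers: `sketch_phy_affinity_is_valid` and `sketch_txq_to_phy_affinity` are hypothetical helpers, and the fixed number of queues per physical port is only the example layout from the commit message, not something the API mandates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the testpmd validation in this patch:
 * 0 means "no affinity", 1..nb_phy_ports select a physical port.
 */
static int
sketch_phy_affinity_is_valid(uint8_t nb_phy_ports, uint8_t value)
{
	if (nb_phy_ports == 0)
		return 0; /* the device reports no physical port at all */
	return value <= nb_phy_ports;
}

/*
 * Hypothetical mapping for the commit message example: with two queues
 * per physical port, TxQ 0 and 1 get affinity 1 (first physical port),
 * TxQ 2 and 3 get affinity 2 (second physical port).
 */
static uint8_t
sketch_txq_to_phy_affinity(uint16_t qid, uint16_t queues_per_port)
{
	assert(queues_per_port != 0);
	return (uint8_t)(qid / queues_per_port + 1);
}
```

An application would presumably store the resulting value in the tx_phy_affinity field of its Tx queue configuration before setting up the queue, as testpmd does above.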


^ permalink raw reply	[relevance 6%]

* RE: Sign changes through function signatures
  2023-02-02 20:45  3%   ` Thomas Monjalon
@ 2023-02-02 21:26  3%     ` Morten Brørup
  2023-02-03 12:05  4%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-02 21:26 UTC (permalink / raw)
  To: Thomas Monjalon, Ben Magistro, Tyler Retzlaff, bruce.richardson
  Cc: Olivier Matz, ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 2 February 2023 21.45
> 
> 02/02/2023 21:26, Tyler Retzlaff:
> > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > Hello,
> > >
> > > While making some updates to our code base for 22.11.1 that were
> missed in
> > > our first pass through, we hit the numa node change[1].  In the
> process of
> > > updating our code, we noticed that a couple functions
> (rx/tx_queue_setup,
> > > maybe more that we aren't using) state they accept `SOCKET_ID_ANY`
> but the
> > > function signature then asks for an unsigned integer while
> `SOCKET_ID_ANY`
> > > is `-1`.  Following it through the redirect to the "real" function
> it also
> > > asks for an unsigned integer which is then passed on to one or more
> > > functions asking for an integer.  As an example using the i40e
> driver
> > > -- we would call `rte_eth_tx_queue_setup` [2] which ultimately
> calls
> > > `i40e_dev_tx_queue_setup`[3] which finally calls
> `rte_zmalloc_socket`[4]
> > > and `rte_eth_dma_zone_reserve`[5].
> > >
> > > I guess what I am looking for is clarification on if this is
> intentional or
> > > if this is additional cleanup that may need to be completed/be
> desirable so
> > > that signs are maintained through the call paths and avoid
> potentially
> > > producing sign-conversion warnings.  From the very quick glance I
> took at
> > > the i40e driver, it seems these are just passed through to other
> functions
> > > and no direct use/manipulation occurs (at least in the mentioned
> functions).
> >
> > i believe this is just sloppiness with sign in our api surface. i too
> > find it frustrating that use of these api force either explicit
> > casts or suffer having to suppress warnings.
> >
> > in the past examples of this have been cleaned up without full
> deprecation
> > notices but there are a lot of instances. i also feel (unpopular
> opinion)
> > that for some integer types like this that have constrained range /
> number
> > spaces it would be of value to introduce a typedef that can be used
> > consistently.
> >
> > for now you'll just have to add the casts and hopefully in the future
> we
> > will fix the api making them unnecessary. of course feel free to
> submit
> > patches too, it would be great to have these cleaned up.
> 
> I agree it should be cleaned up.
> Those IDs should accept negative values.
> Not sure which type we should choose (int, int32_t, or a typedef).

Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)

Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
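
A small sketch of the two points above, in plain C without DPDK headers. `SKETCH_SOCKET_ID_ANY` and the helper names are hypothetical stand-ins; the 8 bit width is just an example of "fewer bits".

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the magic "any" value, which is -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/*
 * What an unsigned parameter receives when the caller passes the magic
 * value: int to unsigned conversion is well defined and wraps modulo
 * UINT_MAX + 1. (Converting such a value back to int is implementation
 * defined before C23, although common compilers return -1 again.)
 */
static unsigned int
sketch_socket_id_as_unsigned(int socket_id)
{
	return (unsigned int)socket_id;
}

/* 8 bit unsigned: IDs 0..254 are usable, 255 is reserved as "any". */
static unsigned int
sketch_usable_ids_unsigned8(void)
{
	return UINT8_MAX;
}

/* 8 bit signed: only the non-negative range 0..127 is usable. */
static unsigned int
sketch_usable_ids_signed8(void)
{
	return INT8_MAX + 1u;
}
```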

> 
> Another thing to check is the name of the variable.
> It should be a socket ID when talking about CPU,
> and a NUMA node ID when talking about memory.
> 
> And last but not least,
> how can we keep ABI compatibility?
> I hope we can use function versioning to avoid deprecation and
> breaking.
> 
> Trials and suggestions are welcome.

Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.

And functions should always return -1, never SOCKET_ID_ANY, to indicate error.

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/

I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(


^ permalink raw reply	[relevance 3%]

* Re: Sign changes through function signatures
  @ 2023-02-02 20:45  3%   ` Thomas Monjalon
  2023-02-02 21:26  3%     ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-02-02 20:45 UTC (permalink / raw)
  To: Ben Magistro, Tyler Retzlaff
  Cc: Olivier Matz, ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, bruce.richardson,
	anatoly.burakov

02/02/2023 21:26, Tyler Retzlaff:
> On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > Hello,
> > 
> > While making some updates to our code base for 22.11.1 that were missed in
> > our first pass through, we hit the numa node change[1].  In the process of
> > updating our code, we noticed that a couple functions (rx/tx_queue_setup,
> > maybe more that we aren't using) state they accept `SOCKET_ID_ANY` but the
> > function signature then asks for an unsigned integer while `SOCKET_ID_ANY`
> > is `-1`.  Following it through the redirect to the "real" function it also
> > asks for an unsigned integer which is then passed on to one or more
> > functions asking for an integer.  As an example using the i40e driver
> > -- we would call `rte_eth_tx_queue_setup` [2] which ultimately calls
> > `i40e_dev_tx_queue_setup`[3] which finally calls `rte_zmalloc_socket`[4]
> > and `rte_eth_dma_zone_reserve`[5].
> > 
> > I guess what I am looking for is clarification on if this is intentional or
> > if this is additional cleanup that may need to be completed/be desirable so
> > that signs are maintained through the call paths and avoid potentially
> > producing sign-conversion warnings.  From the very quick glance I took at
> > the i40e driver, it seems these are just passed through to other functions
> > and no direct use/manipulation occurs (at least in the mentioned functions).
> 
> i believe this is just sloppiness with sign in our api surface. i too
> find it frustrating that use of these api force either explicit
> casts or suffer having to suppress warnings.
> 
> in the past examples of this have been cleaned up without full deprecation
> notices but there are a lot of instances. i also feel (unpopular opinion)
> that for some integer types like this that have constrained range / number
> spaces it would be of value to introduce a typedef that can be used
> consistently.
> 
> for now you'll just have to add the casts and hopefully in the future we
> will fix the api making them unnecessary. of course feel free to submit
> patches too, it would be great to have these cleaned up.

I agree it should be cleaned up.
Those IDs should accept negative values.
Not sure which type we should choose (int, int32_t, or a typedef).

Another thing to check is the name of the variable.
It should be a socket ID when talking about CPU,
and a NUMA node ID when talking about memory.

And last but not least,
how can we keep ABI compatibility?
I hope we can use function versioning to avoid deprecation and breaking.

Trials and suggestions are welcome.



^ permalink raw reply	[relevance 3%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-02 19:00  4%           ` Tyler Retzlaff
@ 2023-02-02 20:44  0%             ` Morten Brørup
  2023-02-03 13:56  0%               ` Bruce Richardson
  2023-02-03 12:19  0%             ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-02 20:44 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Honnappa Nagarahalli, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Thursday, 2 February 2023 20.00
> 
> On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Wednesday, 1 February 2023 22.41
> > >
> > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> wrote:
> > > >
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > >
> > > > > Honnappa, please could you give your view on the future of
> atomics
> > > in DPDK?
> > > > Thanks Thomas, apologies it has taken me a while to get to this
> > > discussion.
> > > >
> > > > IMO, we do not need DPDK's own abstractions. APIs from
> stdatomic.h
> > > (stdatomics as is called here) already serve the purpose. These
> APIs
> > > are well understood and documented.
> > >
> > > i agree that whatever atomics APIs we advocate for should align
> with
> > > the
> > > standard C atomics for the reasons you state including implied
> > > semantics.
> > >
> > > >
> > > > For environments where stdatomics are not supported, we could
> have a
> > > stdatomic.h in DPDK implementing the same APIs (we have to support
> only
> > > _explicit APIs). This allows the code to use stdatomics APIs and
> when
> > > we move to minimum supported standard C11, we just need to get rid
> of
> > > the file in DPDK repo.
> >
> > Perhaps we can use something already existing, such as this:
> > https://android.googlesource.com/platform/bionic/+/lollipop-
> release/libc/include/stdatomic.h
> >
> > >
> > > my concern with this is that if we provide a stdatomic.h or
> introduce
> > > names
> > > from stdatomic.h it's a violation of the C standard.
> > >
> > > references:
> > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > >  * GNU libc manual
> > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > Names.html
> > >
> > > in effect the header, the names and in some instances namespaces
> > > introduced
> > > are reserved by the implementation. there are several reasons in
> the
> > > GNU libc
> > > manual that explain the justification for these reservations and if
> > > we think about ODR and ABI compatibility we can conceive of
> others.
> >
> > If we are going to move to C11 soon, I consider the shim interim, and
> am inclined to ignore these warning factors.
> >
> > If we are not moving to C11 soon, I would consider these
> disadvantages more seriously.
> 
> I think it's reasonable to assume that we are talking years here.
> 
> We've had a few discussions about minimum C standard. I think my first
> mailing list exchanges about C99 was almost 2 years ago. Given that we
> still aren't on C99 now (though i know Bruce has a series up) indicates
> that progression to C11 isn't going to happen any time soon and even if
> it was the baseline we still can't just use it (reasons described
> later).
> 
> Also, i'll point out that we seem to have accepted moving to C99 with
> one of the holdback compilers technically being non-conformant but it
> isn't blocking us because it provides the subset of C99 features
> without
> being conforming that we happen to be using.
> 
> >
> > >
> > > i'll also remark that the inter-mingling of names from the POSIX
> > > standard implicitly exposed as a part of the EAL public API has
> been
> > > problematic for portability.
> >
> > This is a very important remark, which should be considered
> carefully! Tyler has firsthand experience with DPDK portability. If he
> thinks porting to Windows is going to be a headache if we expose the
> stdatomic.h API, we must listen! So, what is your gut feeling here,
> Tyler?
> 
> I think this is even more of a concern with language standard than it
> is
> with a platform standard. Because the language standard is used across
> platforms.
> 
> On the surface it looks appealing to just go through all the dpdk code
> one last time and #include <stdatomic.h> and directly depend on names
> that "look" standard. In practice though we aren't depending on the
> toolchain / libc surface we are binding ourselves to the shim and the
> implementation it provides.
> 
> This is aside from the mechanics of making it work in the different
> contexts we now have to care about. Here is a story of how things
> become tricky.
> 
> When i #include <stdatomic.h> which one gets used if the implementation
> provides one? Let's force our stdatomic.h
> 
> Now i need to force the build system to prefer my shim header? Keeping
> in mind that the presence of a libc stdatomic.h does not mean that the
> toolchain in fact supports standard atomics. Okay, that's under our
> control by editing some meson.build files maybe it isn't so bad but...
> 
> It seems my application also has to do the same in their build system
> now because...
> 
> The type definitions (size, alignment) and code generated from the
> body of inline functions as seen by the application built translation
> units may differ from those in the dpdk translation units if they don't
> use our header. The potential for ABI compat problems is increasing but
> maybe it is manageable? it can be worse...
> 
> We can't limit our scope to thinking that there is just an
> application (a single binary) and dpdk. Complex applications will
> invariably depend on other libraries and if the application needs to
> interface with those compatibly at the ABI level using standard
> atomics
> then we've made it very difficult since the application has to choose
> to
> use our conflicting named atomic types which may not be compatible or
> the real standard atomics.  They can of course produce horrible shims
> of their own to interoperate.
> 
> We need consistency across the entire binary at runtime and i don't
> think it's practical to say that anyone who uses dpdk has to compile
> their whole world with our shim. So dealing with all this complexity
> for the sake of aesthetics "looking" like the standard api seems kind
> of not worth it. Sure it saves having to deprecate later and one last
> session of shotgun surgery but that's kind of all we get.
> 
> Don't think i'm being biased in favor of windows/msvc here. From the
> perspective of the windows/msvc combination i intend to use only the
> standard C ABI provided by the implementation. I have no intention of
> trying to introduce support for the current ABI that doesn't use the
> standard atomic types. my discouraging of this approach is about
> avoiding
> subtle to detect but very painful problems on
> {linux,unix}/compiler<version>
> combinations that already have a shipped/stable ABI.
> 
> > >
> > > let's discuss this from here. if there's still overwhelming desire
> to
> > > go
> > > this route then we'll just do our best.
> > >
> > > ty
> >
> > I have a preference for exposing the stdatomic.h API. Tyler listed
> the disadvantages above. (I also have a preference for moving to C11
> soon.)
> 
> I am eager to see this happen, but as explained in my original proposal
> it doesn't eliminate the need for an abstraction. Unless we are willing
> to break our compatibility promises and potentially take a performance
> hit on some platform/compiler combinations which as i understand is not
> acceptable.
> 
> >
> > Exposing a 1:1 similar API with RTE prefixes would also be acceptable
> for me. The disadvantage is that the names are different than the C11
> names, which might lead to some confusion. And from an ABI stability
> perspective, such an important API should not be marked experimental.
> This means that years will pass before we can get rid of it again, due
> to ABI stability policies.
> 
> I think the key to success with rte_ prefixed names is making
> absolutely
> sure we mirror the semantics and types in the standard.
> 
> I will point out one bit of fine print here is that we will not support
> atomic operations on struct/union types (something the standard
> supports).
> With the rte_ namespace i think this becomes less ambiguous, if we
> present
> standard C names though what's to avoid the confusion? Aside from it
> fails
> to compile with one compiler vs another.
> 
> I agree that this may be around for years. But how many years depends a
> lot on how long we have to maintain compatibility for the existing
> platform/compiler combinations that can't (and aren't enabled) to use
> the standard.
> 
> Even if we introduced standard names we still have to undergo some kind
> of mutant deprecation process to get the world to recompile everything
> against the actual standard, so it doesn't give us forward
> compatibility.
> 
> Let me know what folks would like to do, i guess i'm firmly leaned
> toward no-shim and just rte_ explicit. But as a community i'll pursue
> whatever you decide.
> 
> Thanks!

Tyler is making a very strong case here.

I have changed my mind, and now support Tyler's approach.

-Morten


^ permalink raw reply	[relevance 0%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-02  8:43  4%         ` Morten Brørup
@ 2023-02-02 19:00  4%           ` Tyler Retzlaff
  2023-02-02 20:44  0%             ` Morten Brørup
  2023-02-03 12:19  0%             ` Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-02 19:00 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Honnappa Nagarahalli, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 1 February 2023 22.41
> > 
> > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> > >
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > >
> > > > Honnappa, please could you give your view on the future of atomics
> > in DPDK?
> > > Thanks Thomas, apologies it has taken me a while to get to this
> > discussion.
> > >
> > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > (stdatomics as is called here) already serve the purpose. These APIs
> > are well understood and documented.
> > 
> > i agree that whatever atomics APIs we advocate for should align with
> > the
> > standard C atomics for the reasons you state including implied
> > semantics.
> > 
> > >
> > > For environments where stdatomics are not supported, we could have a
> > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > _explicit APIs). This allows the code to use stdatomics APIs and when
> > we move to minimum supported standard C11, we just need to get rid of
> > the file in DPDK repo.
> 
> Perhaps we can use something already existing, such as this:
> https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> 
> > 
> > my concern with this is that if we provide a stdatomic.h or introduce
> > names
> > from stdatomic.h it's a violation of the C standard.
> > 
> > references:
> >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> >  * GNU libc manual
> >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > Names.html
> > 
> > in effect the header, the names and in some instances namespaces
> > introduced
> > are reserved by the implementation. there are several reasons in the
> > GNU libc
> > manual that explain the justification for these reservations and if
> > we think about ODR and ABI compatibility we can conceive of others.
> 
> If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> 
> If we are not moving to C11 soon, I would consider these disadvantages more seriously.

I think it's reasonable to assume that we are talking years here.

We've had a few discussions about minimum C standard. I think my first
mailing list exchanges about C99 was almost 2 years ago. Given that we
still aren't on C99 now (though i know Bruce has a series up) indicates
that progression to C11 isn't going to happen any time soon and even if
it was the baseline we still can't just use it (reasons described
later).

Also, i'll point out that we seem to have accepted moving to C99 with
one of the holdback compilers technically being non-conformant but it
isn't blocking us because it provides the subset of C99 features without
being conforming that we happen to be using.

> 
> > 
> > i'll also remark that the inter-mingling of names from the POSIX
> > standard implicitly exposed as a part of the EAL public API has been
> > problematic for portability.
> 
> This is a very important remark, which should be considered carefully! Tyler has firsthand experience with DPDK portability. If he thinks porting to Windows is going to be a headache if we expose the stdatomic.h API, we must listen! So, what is your gut feeling here, Tyler?

I think this is even more of a concern with language standard than it is
with a platform standard. Because the language standard is used across
platforms.

On the surface it looks appealing to just go through all the dpdk code
one last time and #include <stdatomic.h> and directly depend on names
that "look" standard. In practice though we aren't depending on the
toolchain / libc surface we are binding ourselves to the shim and the
implementation it provides.

This is aside from the mechanics of making it work in the different
contexts we now have to care about. Here is a story of how things
become tricky.

When i #include <stdatomic.h> which one gets used if the implementation
provides one? Let's force our stdatomic.h

Now i need to force the build system to prefer my shim header? Keeping
in mind that the presence of a libc stdatomic.h does not mean that the
toolchain in fact supports standard atomics. Okay, that's under our
control by editing some meson.build files maybe it isn't so bad but...

It seems my application also has to do the same in their build system
now because...

The type definitions (size, alignment) and code generated from the
body of inline functions as seen by the application built translation
units may differ from those in the dpdk translation units if they don't
use our header. The potential for ABI compat problems is increasing but
maybe it is manageable? it can be worse...

We can't limit our scope to thinking that there is just an
application (a single binary) and dpdk. Complex applications will
invariably depend on other libraries and if the application needs to
interface with those compatibly at the ABI level using standard atomics
then we've made it very difficult since the application has to choose to
use our conflicting named atomic types which may not be compatible or
the real standard atomics.  They can of course produce horrible shims
of their own to interoperate.

We need consistency across the entire binary at runtime and i don't
think it's practical to say that anyone who uses dpdk has to compile
their whole world with our shim. So dealing with all this complexity
for the sake of aesthetics "looking" like the standard api seems kind
of not worth it. Sure it saves having to deprecate later and one last
session of shotgun surgery but that's kind of all we get.

Don't think i'm being biased in favor of windows/msvc here. From the
perspective of the windows/msvc combination i intend to use only the
standard C ABI provided by the implementation. I have no intention of
trying to introduce support for the current ABI that doesn't use the
standard atomic types. my discouraging of this approach is about avoiding
subtle to detect but very painful problems on {linux,unix}/compiler<version>
combinations that already have a shipped/stable ABI.

> > 
> > let's discuss this from here. if there's still overwhelming desire to
> > go
> > this route then we'll just do our best.
> > 
> > ty
> 
> I have a preference for exposing the stdatomic.h API. Tyler listed the disadvantages above. (I also have a preference for moving to C11 soon.)

I am eager to see this happen, but as explained in my original proposal
it doesn't eliminate the need for an abstraction. Unless we are willing
to break our compatibility promises and potentially take a performance
hit on some platform/compiler combinations which as i understand is not
acceptable.

> 
> Exposing a 1:1 similar API with RTE prefixes would also be acceptable for me. The disadvantage is that the names are different than the C11 names, which might lead to some confusion. And from an ABI stability perspective, such an important API should not be marked experimental. This means that years will pass before we can get rid of it again, due to ABI stability policies.

I think the key to success with rte_ prefixed names is making absolutely
sure we mirror the semantics and types in the standard.
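
A minimal sketch of what such prefixed wrappers could look like, assuming a GCC or clang toolchain. The `sketch_` names and the `SKETCH_USE_STDATOMIC` switch are hypothetical and chosen only for illustration; they are not an existing or proposed DPDK API.

```c
#include <stdint.h>

#ifdef SKETCH_USE_STDATOMIC /* hypothetical build-time switch */
#include <stdatomic.h>
/* Thin 1:1 mapping onto the C11 names, types and memory orders. */
#define sketch_atomic(type) _Atomic(type)
#define sketch_memory_order_relaxed memory_order_relaxed
#define sketch_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#define sketch_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#else
/* Same call shapes, backed by the GCC/clang __atomic builtins. */
#define sketch_atomic(type) type
#define sketch_memory_order_relaxed __ATOMIC_RELAXED
#define sketch_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#define sketch_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#endif

/* Example user: a counter whose declaration works with either backend. */
static sketch_atomic(uint32_t) sketch_counter;

/* Increment the counter and return its previous value. */
static uint32_t
sketch_count_event(void)
{
	return sketch_atomic_fetch_add_explicit(&sketch_counter, 1,
			sketch_memory_order_relaxed);
}
```

Callers only ever see the prefixed names, so the backend can change per platform/compiler combination without touching them.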

I will point out one bit of fine print here is that we will not support
atomic operations on struct/union types (something the standard supports).
With the rte_ namespace i think this becomes less ambiguous, if we present
standard C names though what's to avoid the confusion? Aside from it fails
to compile with one compiler vs another.

I agree that this may be around for years. But how many years depends a
lot on how long we have to maintain compatibility for the existing
platform/compiler combinations that can't (and aren't enabled) to use
the standard.

Even if we introduced standard names we still have to undergo some kind
of mutant deprecation process to get the world to recompile everything
against the actual standard, so it doesn't give us forward
compatibility.

Let me know what folks would like to do, i guess i'm firmly leaned
toward no-shim and just rte_ explicit. But as a community i'll pursue
whatever you decide.

Thanks!

^ permalink raw reply	[relevance 4%]

* Re: [PATCH V8] ethdev: fix one address occupies two entries in MAC addrs
  2023-02-02 12:36  3% ` [PATCH V8] " Huisong Li
@ 2023-02-02 18:09  0%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-02 18:09 UTC (permalink / raw)
  To: Huisong Li, dev
  Cc: thomas, andrew.rybchenko, liudongdong3, huangdaode, fengchengwen

On 2/2/2023 12:36 PM, Huisong Li wrote:
> The dev->data->mac_addrs[0] will be changed to a new MAC address when
> applications modify the default MAC address by .mac_addr_set(). However,
> if the new default one has been added as a non-default MAC address by
> .mac_addr_add(), the .mac_addr_set() doesn't remove it from the mac_addrs
> list. As a result, one MAC address occupies two entries in the list. Like:
> add(MAC1)
> add(MAC2)
> add(MAC3)
> add(MAC4)
> set_default(MAC3)
> default=MAC3, the rest of the list=MAC1, MAC2, MAC3, MAC4
> Note: MAC3 occupies two entries.
> 
> In addition, some PMDs, such as i40e, ice, hns3 and so on, do remove the
> old default MAC when set default MAC. If user continues to do
> set_default(MAC5), and the mac_addrs list is default=MAC5, filters=(MAC1,
> MAC2, MAC3, MAC4). At this moment, user can still see MAC3 from the list,
> but packets with MAC3 aren't actually received by the PMD.
> 
> So need to ensure that the new default address is removed from the rest of
> the list if the address was already in the list.
> 

The same comment from the past still seems valid. I have not looked at this
set for a while, so sorry if this was already discussed and decided.
If not, I am referring to the side effect that setting the default MAC
address causes MAC addresses to be removed. Think of the following case:

add(MAC1) -> MAC1
add(MAC2) -> MAC1, MAC2
add(MAC3) -> MAC1, MAC2, MAC3
add(MAC4) -> MAC1, MAC2, MAC3, MAC4
set(MAC3) -> MAC3, MAC2, MAC4
set(MAC4) -> MAC4, MAC2
set(MAC2) -> MAC2

I am not exactly clear on what the intention of set() is. If there is
a single MAC, I guess the intention is to replace it with the new one, but
if there are multiple MACs and the new one is already in the list, the
intention may be just to change the default MAC.

If the above assumption is correct, what about the following:

set(MAC) {
    if only_default_mac_exist
        replace_default_mac
    else if MAC exists in list
        swap MAC and list[0]
    else
        replace_default_mac
}

This swap prevents the side effect of removing MACs. Does it make sense?
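
The swap idea can be sketched on a plain array as below. This is only an
illustration under simplifying assumptions: struct demo_mac and
demo_set_default_swap() are hypothetical names, and the sketch ignores
the driver callback and the mac_pool_sel bookkeeping that the real
function would also have to handle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAC_LEN 6

/* Hypothetical stand-in for an Ethernet address. */
struct demo_mac {
	unsigned char b[DEMO_MAC_LEN];
};

static bool
demo_mac_eq(const struct demo_mac *a, const struct demo_mac *b)
{
	return memcmp(a->b, b->b, DEMO_MAC_LEN) == 0;
}

/*
 * list[0] is the default address, n is the number of valid entries
 * (n >= 1). If mac is already in the list, swap it with list[0] so that
 * no entry is lost; otherwise overwrite list[0].
 * Returns the index where mac was found before the call, or -1 if it
 * was not in the list.
 */
static int
demo_set_default_swap(struct demo_mac *list, size_t n,
		const struct demo_mac *mac)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (demo_mac_eq(&list[i], mac)) {
			struct demo_mac tmp = list[0];

			list[0] = list[i];
			list[i] = tmp;
			return (int)i;
		}
	}

	/* Not in the list: plain replacement of the default address. */
	list[0] = *mac;
	return -1;
}
```

With the list MAC1, MAC2, MAC3, MAC4, calling this with MAC3 gives
MAC3, MAC2, MAC1, MAC4, so the list keeps all four entries.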


> Fixes: 854d8ad4ef68 ("ethdev: add default mac address modifier")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> v8: fix some comments.
> v7: add announcement in the release notes and document this behavior.
> v6: fix commit log and some code comments.
> v5:
>  - merge the second patch into the first patch.
>  - add error log when rollback failed.
> v4:
>   - fix broken in the patchwork
> v3:
>   - first explicitly remove the non-default MAC, then set default one.
>   - document default and non-default MAC address
> v2:
>   - fixed commit log.
> ---
>  doc/guides/rel_notes/release_23_03.rst |  6 +++++
>  lib/ethdev/ethdev_driver.h             |  6 ++++-
>  lib/ethdev/rte_ethdev.c                | 35 ++++++++++++++++++++++++--
>  lib/ethdev/rte_ethdev.h                |  3 +++
>  4 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index 84b112a8b1..1c9b9912c2 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -105,6 +105,12 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =======================================================
>  
> +* ethdev: ensured all entries in MAC address list are uniques.
> +  When setting a default MAC address with the function
> +  ``rte_eth_dev_default_mac_addr_set``,
> +  the address is now removed from the rest of the address list
> +  in order to ensure it is only at index 0 of the list.
> +
>  
>  ABI Changes
>  -----------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index dde3ec84ef..3994c61b86 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -117,7 +117,11 @@ struct rte_eth_dev_data {
>  
>  	uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures */
>  
> -	/** Device Ethernet link address. @see rte_eth_dev_release_port() */
> +	/**
> +	 * Device Ethernet link addresses.
> +	 * All entries are unique.
> +	 * The first entry (index zero) is the default address.
> +	 */
>  	struct rte_ether_addr *mac_addrs;
>  	/** Bitmap associating MAC addresses to pools */
>  	uint64_t mac_pool_sel[RTE_ETH_NUM_RECEIVE_MAC_ADDR];
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 86ca303ab5..de25183619 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -4498,7 +4498,10 @@ rte_eth_dev_mac_addr_remove(uint16_t port_id, struct rte_ether_addr *addr)
>  int
>  rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
>  {
> +	uint64_t mac_pool_sel_bk = 0;
>  	struct rte_eth_dev *dev;
> +	uint32_t pool;
> +	int index;
>  	int ret;
>  
>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> @@ -4517,16 +4520,44 @@ rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
>  	if (*dev->dev_ops->mac_addr_set == NULL)
>  		return -ENOTSUP;
>  
> +	/* Keep address unique in dev->data->mac_addrs[]. */
> +	index = eth_dev_get_mac_addr_index(port_id, addr);
> +	if (index > 0) {
> +		/* Remove address in dev data structure */
> +		mac_pool_sel_bk = dev->data->mac_pool_sel[index];
> +		ret = rte_eth_dev_mac_addr_remove(port_id, addr);
> +		if (ret < 0) {
> +			RTE_ETHDEV_LOG(ERR, "Cannot remove the port %u address from the rest of list.\n",
> +				       port_id);
> +			return ret;
> +		}
> +	}
>  	ret = (*dev->dev_ops->mac_addr_set)(dev, addr);
>  	if (ret < 0)
> -		return ret;
> +		goto out;
>  
>  	/* Update default address in NIC data structure */
>  	rte_ether_addr_copy(addr, &dev->data->mac_addrs[0]);
>  
>  	return 0;
> -}
>  
> +out:
> +	if (index > 0) {
> +		pool = 0;
> +		do {
> +			if (mac_pool_sel_bk & UINT64_C(1)) {
> +				if (rte_eth_dev_mac_addr_add(port_id, addr,
> +							     pool) != 0)
> +					RTE_ETHDEV_LOG(ERR, "failed to restore MAC pool id(%u) in port %u.\n",
> +						       pool, port_id);
> +			}
> +			mac_pool_sel_bk >>= 1;
> +			pool++;
> +		} while (mac_pool_sel_bk != 0);
> +	}
> +
> +	return ret;
> +}
>  
>  /*
>   * Returns index into MAC address array of addr. Use 00:00:00:00:00:00 to find
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d22de196db..2456153457 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4356,6 +4356,9 @@ int rte_eth_dev_mac_addr_remove(uint16_t port_id,
>  
>  /**
>   * Set the default MAC address.
> + * It replaces the address at index 0 of the MAC address list.
> + * If the address was already in the MAC address list,
> + * it is removed from the rest of the list.
>   *
>   * @param port_id
>   *   The port identifier of the Ethernet device.


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-30 14:43  0%                   ` Jerin Jacob
@ 2023-02-02 16:12  0%                     ` Naga Harish K, S V
  2023-02-03  9:44  0%                       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-02-02 16:12 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, January 30, 2023 8:13 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Saturday, January 28, 2023 4:24 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > >
> > > > > > > >
> > > > > > > > > > +        */
> > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > >
> > > > > > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > to
> > > > > make
> > > > > > > > > sure rsvd is zero.
> > > > > > > > >
> > > > > > > >
> > > > > > > > The reserved fields are not used by the adapter or application.
> > > > > > > > Not sure Is it necessary to Introduce a new API to clear
> > > > > > > > reserved
> > > fields.
> > > > > > >
> > > > > > > > When adapter starts using new fields (when we add new fields
> > > > > > > > in future), the old application which is not using
> > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have
> junk
> > > > > > > > value and then adapter implementation will behave badly.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > does it mean, the application doesn't re-compile for the new DPDK?
> > > > >
> > > > > Yes. No need recompile if ABI not breaking.
> > > > >
> > > > > > When some of the reserved fields are used in the future, the
> > > > > > application
> > > > > also may need to be recompiled along with DPDK right?
> > > > > > As the application also may need to use the newly consumed
> > > > > > reserved
> > > > > fields?
> > > > >
> > > > > The problematic case is:
> > > > >
> > > > > Adapter implementation of 23.07(Assuming there is change params)
> > > > > field needs to work with application of 23.03.
> > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > >
> > > >
> > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes only
> > > reserved fields to zero,  it may not solve the issue in this case.
> > >
> > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > fields, not just reserved field.
> > > The application calling sequence  is
> > >
> > > struct my_config c;
> > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > c.interseted_filed_to_be_updated = val;
> > >
> > Can it be done like
> >         struct my_config c = {0};
> >         c.interseted_filed_to_be_updated = val; and update Doxygen
> > comments to recommend above usage to reset all fields?
> > This way,  rte_event_eth_rx_adapter_runtime_params_init() can be
> avoided.
> 
> Better to have a function for documentation clarity. Similar scheme already
> there in DPDK. See rte_eth_cman_config_init()
> 
> 


The reference function rte_eth_cman_config_init() resets the params struct and initializes the required params with default values in the PMD callback.
The proposed rte_event_eth_rx_adapter_runtime_params_init() API just needs to reset the params struct. There are no PMD callbacks involved.
Having an API just to reset the struct seems overkill. What do you think?
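
For reference, the two options being compared can be sketched as below.
This is a simplified illustration, not the adapter code: struct
demo_runtime_params and the demo_ functions are hypothetical, and only
the rsvd[15] member is taken from the patch under review.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical params struct; only rsvd[15] mirrors the patch. */
struct demo_runtime_params {
	uint32_t used_field;	/* stand-in for a field in use today */
	uint32_t rsvd[15];	/* reserved for future use */
};

/* Option A: a dedicated init function that zeroes every field. */
static int
demo_runtime_params_init(struct demo_runtime_params *p)
{
	if (p == NULL)
		return -EINVAL;
	memset(p, 0, sizeof(*p));
	return 0;
}

/*
 * Check a future implementation could rely on: reserved words that an
 * old application never touched must still read as zero.
 * Returns 1 if all reserved words are zero, 0 otherwise.
 */
static int
demo_rsvd_is_zero(const struct demo_runtime_params *p)
{
	size_t i;

	for (i = 0; i < sizeof(p->rsvd) / sizeof(p->rsvd[0]); i++) {
		if (p->rsvd[i] != 0)
			return 0;
	}
	return 1;
}
```

Option B is the plain initializer, "struct demo_runtime_params c = {0};".
Both leave the reserved words zeroed; the difference discussed in this
thread is only whether the requirement is expressed as an API or as a
documented convention.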

> >
> > > Let me share an example and you can tell where is the issue
> > >
> > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > 3)There is an application written based on 22.03 which using only 8B
> > > after calling rte_event_eth_rx_adapter_runtime_params_init()
> > > 4)Assume, in 22.07 another 8B added to structure.
> > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > application is calling
> > > rte_event_eth_rx_adapter_runtime_params_init()
> > > and 9 to 15B are zero, the implementation will not go bad.
> > >
> > > > The old application only tries to set/get previous valid fields
> > > > and the newly
> > > used fields may still contain junk value.
> > > > If the application wants to make use of any the newly used params,
> > > > the
> > > application changes are required anyway.
> > >
> > > Yes. If application wants to make use of newly added features. No
> > > need to change if new features are not needed for old application.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-02  9:28  0%         ` Andrew Rybchenko
@ 2023-02-02 14:43  0%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-02-02 14:43 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Andrew Rybchenko
  Cc: Slava Ovsiienko, Ori Kam, Aman Singh, Yuying Zhang, Ferruh Yigit,
	dev, Raslan Darawsheh

02/02/2023 10:28, Andrew Rybchenko:
> On 2/1/23 18:50, Jiawei(Jonny) Wang wrote:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> On 1/30/23 20:00, Jiawei Wang wrote:
> >>> Adds the new tx_phy_affinity field into the padding hole of
> >>> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> >>> Adds a suppress type for structure change in the ABI check file.
> >>>
> >>> This patch adds the testpmd command line:
> >>> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >>>
> >>> For example, there're two hardware ports 0 and 1 connected to a single
> >>> DPDK port (port id 0), and phy_affinity 1 stood for hardware port 0
> >>> and phy_affinity 2 stood for hardware port 1, used the below command
> >>> to config tx phy affinity for per Tx Queue:
> >>>           port config 0 txq 0 phy_affinity 1
> >>>           port config 0 txq 1 phy_affinity 1
> >>>           port config 0 txq 2 phy_affinity 2
> >>>           port config 0 txq 3 phy_affinity 2
> >>>
> >>> These commands config the TxQ index 0 and TxQ index 1 with phy
> >>> affinity 1, uses TxQ 0 or TxQ 1 send packets, these packets will be
> >>> sent from the hardware port 0, and similar with hardware port 1 if
> >>> sending packets with TxQ 2 or TxQ 3.
> >>
> >> Frankly speaking I dislike it. Why do we need to expose it on generic ethdev
> >> layer? IMHO dynamic mbuf field would be a better solution to control Tx
> >> routing to a specific PHY port.

The design of this patch is to map a queue of the front device
with an underlying port.
This design may be applicable to several situations,
including DPDK bonding PMD, or Linux bonding connected to a PMD.

The default is 0, meaning the queue is not mapped to anything (no change).
If the affinity is higher than 0, then the queue can be configured as desired.
Then if an application wants to send a packet to a specific underlying port,
it just has to send to the right queue.

Functionally, mapping the queue and setting the port in the mbuf
(your proposal) are the same.
The advantages of the queue mapping are:
	- faster to use a queue than filling mbuf field
	- optimization can be done at queue setup
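
As a rough sketch of the application side, assuming the queue layout
from the testpmd example in the patch (TxQ 0 and 1 with affinity 1,
TxQ 2 and 3 with affinity 2), choosing the underlying port reduces to
choosing a queue. The table and demo_pick_txq() are hypothetical helpers
for illustration, not part of the proposed API.

```c
#include <stdint.h>

#define DEMO_NB_TXQ 4

/* Hypothetical per-queue affinity table; 0 would mean "not mapped". */
static const uint8_t demo_txq_phy_affinity[DEMO_NB_TXQ] = { 1, 1, 2, 2 };

/*
 * Pick a Tx queue mapped to the wanted affinity, spreading flows over
 * the matching queues with a caller-provided hash.
 * Returns the queue index, or -1 if no queue has that affinity.
 */
static int
demo_pick_txq(uint8_t affinity, uint32_t flow_hash)
{
	uint16_t candidates[DEMO_NB_TXQ];
	uint16_t n = 0;
	uint16_t q;

	for (q = 0; q < DEMO_NB_TXQ; q++) {
		if (demo_txq_phy_affinity[q] == affinity)
			candidates[n++] = q;
	}
	if (n == 0)
		return -1;

	return candidates[flow_hash % n];
}
```

In a real application the candidate lists would be built once at queue
setup rather than on every call, which is where the queue mapping saves
per-packet work.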

[...]
> Why are these queues should be visible to DPDK application?
> Nobody denies you to create many HW queues behind one ethdev
> queue. Of course, there questions related to descriptor status
> API in this case, but IMHO it would be better than exposing
> these details to an application level.

Why not map the queues if the application requires these details?

> >> IMHO, we definitely need dev_info information about a number of physical
> >> ports behind.

Yes, dev_info would be needed.

> >> Advertising value greater than 0 should mean that PMD supports
> >> corresponding mbuf dynamic field to contol ongoing physical port on Tx (or
> >> should just reject packets on prepare which try to specify outgoing phy port
> >> otherwise). In the same way the information may be provided on Rx.
> > 
> > See above, I think phy affinity is Queue level not for each packet.
> > 
> >> I'm OK to have 0 as no phy affinity value and greater than zero as specified phy
> >> affinity. I.e. no dynamic flag is required.
> > 
> > Thanks for agreement.
> > 
> >> Also I think that order of patches should be different.
> >> We should start from a patch which provides dev_info and flow API matching
> >> and action should be in later patch.
> > 
> > OK.




^ permalink raw reply	[relevance 0%]

* [PATCH V8] ethdev: fix one address occupies two entries in MAC addrs
    2023-02-01 13:15  3% ` [PATCH V7] ethdev: fix one address occupies two entries " Huisong Li
@ 2023-02-02 12:36  3% ` Huisong Li
  2023-02-02 18:09  0%   ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Huisong Li @ 2023-02-02 12:36 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, andrew.rybchenko, liudongdong3, huangdaode,
	fengchengwen, lihuisong

The dev->data->mac_addrs[0] will be changed to a new MAC address when
applications modify the default MAC address by .mac_addr_set(). However,
if the new default one has been added as a non-default MAC address by
.mac_addr_add(), the .mac_addr_set() doesn't remove it from the mac_addrs
list. As a result, one MAC address occupies two entries in the list. Like:
add(MAC1)
add(MAC2)
add(MAC3)
add(MAC4)
set_default(MAC3)
default=MAC3, the rest of the list=MAC1, MAC2, MAC3, MAC4
Note: MAC3 occupies two entries.

In addition, some PMDs, such as i40e, ice, hns3 and so on, do remove the
old default MAC when setting the default MAC. If the user then does
set_default(MAC5), the mac_addrs list becomes default=MAC5, filters=(MAC1,
MAC2, MAC3, MAC4). At this point, the user can still see MAC3 in the list,
but packets with MAC3 aren't actually received by the PMD.

So we need to ensure that the new default address is removed from the rest
of the list if the address was already in the list.

Fixes: 854d8ad4ef68 ("ethdev: add default mac address modifier")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v8: fix some comments.
v7: add announcement in the release notes and document this behavior.
v6: fix commit log and some code comments.
v5:
 - merge the second patch into the first patch.
 - add error log when rollback failed.
v4:
  - fix broken in the patchwork
v3:
  - first explicitly remove the non-default MAC, then set default one.
  - document default and non-default MAC address
v2:
  - fixed commit log.
---
 doc/guides/rel_notes/release_23_03.rst |  6 +++++
 lib/ethdev/ethdev_driver.h             |  6 ++++-
 lib/ethdev/rte_ethdev.c                | 35 ++++++++++++++++++++++++--
 lib/ethdev/rte_ethdev.h                |  3 +++
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 84b112a8b1..1c9b9912c2 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -105,6 +105,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* ethdev: ensured all entries in MAC address list are uniques.
+  When setting a default MAC address with the function
+  ``rte_eth_dev_default_mac_addr_set``,
+  the address is now removed from the rest of the address list
+  in order to ensure it is only at index 0 of the list.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index dde3ec84ef..3994c61b86 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -117,7 +117,11 @@ struct rte_eth_dev_data {
 
 	uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures */
 
-	/** Device Ethernet link address. @see rte_eth_dev_release_port() */
+	/**
+	 * Device Ethernet link addresses.
+	 * All entries are unique.
+	 * The first entry (index zero) is the default address.
+	 */
 	struct rte_ether_addr *mac_addrs;
 	/** Bitmap associating MAC addresses to pools */
 	uint64_t mac_pool_sel[RTE_ETH_NUM_RECEIVE_MAC_ADDR];
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 86ca303ab5..de25183619 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -4498,7 +4498,10 @@ rte_eth_dev_mac_addr_remove(uint16_t port_id, struct rte_ether_addr *addr)
 int
 rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
 {
+	uint64_t mac_pool_sel_bk = 0;
 	struct rte_eth_dev *dev;
+	uint32_t pool;
+	int index;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -4517,16 +4520,44 @@ rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
 	if (*dev->dev_ops->mac_addr_set == NULL)
 		return -ENOTSUP;
 
+	/* Keep address unique in dev->data->mac_addrs[]. */
+	index = eth_dev_get_mac_addr_index(port_id, addr);
+	if (index > 0) {
+		/* Remove address in dev data structure */
+		mac_pool_sel_bk = dev->data->mac_pool_sel[index];
+		ret = rte_eth_dev_mac_addr_remove(port_id, addr);
+		if (ret < 0) {
+			RTE_ETHDEV_LOG(ERR, "Cannot remove the port %u address from the rest of list.\n",
+				       port_id);
+			return ret;
+		}
+	}
 	ret = (*dev->dev_ops->mac_addr_set)(dev, addr);
 	if (ret < 0)
-		return ret;
+		goto out;
 
 	/* Update default address in NIC data structure */
 	rte_ether_addr_copy(addr, &dev->data->mac_addrs[0]);
 
 	return 0;
-}
 
+out:
+	if (index > 0) {
+		pool = 0;
+		do {
+			if (mac_pool_sel_bk & UINT64_C(1)) {
+				if (rte_eth_dev_mac_addr_add(port_id, addr,
+							     pool) != 0)
+					RTE_ETHDEV_LOG(ERR, "failed to restore MAC pool id(%u) in port %u.\n",
+						       pool, port_id);
+			}
+			mac_pool_sel_bk >>= 1;
+			pool++;
+		} while (mac_pool_sel_bk != 0);
+	}
+
+	return ret;
+}
 
 /*
  * Returns index into MAC address array of addr. Use 00:00:00:00:00:00 to find
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index d22de196db..2456153457 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4356,6 +4356,9 @@ int rte_eth_dev_mac_addr_remove(uint16_t port_id,
 
 /**
  * Set the default MAC address.
+ * It replaces the address at index 0 of the MAC address list.
+ * If the address was already in the MAC address list,
+ * it is removed from the rest of the list.
  *
  * @param port_id
  *   The port identifier of the Ethernet device.
-- 
2.22.0


^ permalink raw reply	[relevance 3%]

* [PATCH v6 1/3] ethdev: add IPv6 routing extension header definition
  @ 2023-02-02 10:00  3%   ` Rongwei Liu
  0 siblings, 0 replies; 200+ results
From: Rongwei Liu @ 2023-02-02 10:00 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Andrew Rybchenko, Aman Singh, Yuying Zhang,
	Ferruh Yigit, Olivier Matz

Add the IPv6 routing extension header definition, with no
TLV support for now.

At the rte_flow layer, new items are defined for matching the
type/nexthdr/segments_left fields.

Add command line support for IPv6 routing extension header
matching: type/nexthdr/segments_left.
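
For readers unfamiliar with the header, the sketch below shows how the
length field is usually interpreted. It assumes the RFC 8200 rule that
the header length is counted in 8-octet units, not including the first
8 octets, and it assumes no TLVs follow the address list. The demo_
struct is a hypothetical simplification of the structure added by this
patch (the flags union is collapsed into one 32-bit word).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified copy of the fixed part of the header. */
struct demo_ipv6_routing_ext {
	uint8_t next_hdr;	/* protocol of the next header */
	uint8_t hdr_len;	/* length in 8-octet units, minus the first 8 */
	uint8_t type;		/* routing type */
	uint8_t segments_left;	/* remaining segments to visit */
	uint32_t flags;		/* type-specific data */
};

/* Total size in bytes of the extension header, fixed part included. */
static size_t
demo_routing_ext_size(const struct demo_ipv6_routing_ext *h)
{
	return ((size_t)h->hdr_len + 1) * 8;
}

/*
 * Number of 128-bit addresses carried after the fixed part, assuming no
 * TLVs: each address takes two 8-octet units.
 */
static unsigned int
demo_routing_ext_nb_addrs(const struct demo_ipv6_routing_ext *h)
{
	return h->hdr_len / 2;
}
```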

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
 doc/guides/prog_guide/rte_flow.rst     |  9 +++++
 doc/guides/rel_notes/release_23_03.rst |  9 +++++
 lib/ethdev/rte_flow.c                  |  1 +
 lib/ethdev/rte_flow.h                  | 19 +++++++++++
 lib/net/rte_ip.h                       | 20 +++++++++++
 6 files changed, 104 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 88108498e0..7a8516829c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -298,6 +298,10 @@ enum index {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_ICMP,
 	ITEM_ICMP_TYPE,
 	ITEM_ICMP_CODE,
@@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
 	ITEM_ARP_ETH_IPV4,
 	ITEM_IPV6_EXT,
 	ITEM_IPV6_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
 	ITEM_ICMP6,
 	ITEM_ICMP6_ND_NS,
 	ITEM_ICMP6_ND_NA,
@@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_NEXT,
+	ZERO,
+};
+
+static const enum index item_ipv6_routing_ext[] = {
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_NEXT,
 	ZERO,
 };
@@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
 		.args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
 					   has_frag_ext, 1)),
 	},
+	[ITEM_IPV6_ROUTING_EXT] = {
+		.name = "ipv6_routing_ext",
+		.help = "match IPv6 routing extension header",
+		.priv = PRIV_ITEM(IPV6_ROUTING_EXT,
+				  sizeof(struct rte_flow_item_ipv6_routing_ext)),
+		.next = NEXT(item_ipv6_routing_ext),
+		.call = parse_vc,
+	},
+	[ITEM_IPV6_ROUTING_EXT_TYPE] = {
+		.name = "ext_type",
+		.help = "match IPv6 routing extension header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.type)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
+		.name = "ext_next_hdr",
+		.help = "match IPv6 routing extension header next header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.next_hdr)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
+		.name = "ext_seg_left",
+		.help = "match IPv6 routing extension header segment left",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.segments_left)),
+	},
 	[ITEM_ICMP] = {
 		.name = "icmp",
 		.help = "match ICMP header",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 3e6242803d..602fab29d3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
 
 - ``color``: Metering color marker.
 
+Item: ``IPV6_ROUTING_EXT``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Matches IPv6 routing extension header.
+
+- ``next_hdr``: Next layer header type.
+- ``type``: IPv6 routing extension header type.
+- ``segments_left``: How many IPv6 destination addresses carries on.
+
 Actions
 ~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index c15f6fbb9f..1337da73b8 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -69,6 +69,11 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added rte_flow support for matching IPv6 routing extension header fields.**
+
+  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
+  header.
+
 
 Removed Items
 -------------
@@ -98,6 +103,10 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* net: added a new structure:
+
+    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 7d0c24366c..4da581146e 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -157,6 +157,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
 	MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
 	MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
+	MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..9b9018cba2 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -624,6 +624,13 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_meter_color.
 	 */
 	RTE_FLOW_ITEM_TYPE_METER_COLOR,
+
+	/**
+	 * Matches the presence of IPv6 routing extension header.
+	 *
+	 * @see struct rte_flow_item_ipv6_routing_ext.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
 };
 
 /**
@@ -873,6 +880,18 @@ struct rte_flow_item_ipv6 {
 	uint32_t reserved:23;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
+ *
+ * Matches an IPv6 routing extension header.
+ */
+struct rte_flow_item_ipv6_routing_ext {
+	struct rte_ipv6_routing_ext hdr;
+};
+
 /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
 #ifndef __cplusplus
 static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 9c8e8206f0..a310e9d498 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -539,6 +539,26 @@ struct rte_ipv6_hdr {
 	uint8_t  dst_addr[16];	/**< IP address of destination host(s). */
 } __rte_packed;
 
+/**
+ * IPv6 Routing Extension Header
+ */
+struct rte_ipv6_routing_ext {
+	uint8_t next_hdr;			/**< Protocol, next header. */
+	uint8_t hdr_len;			/**< Header length. */
+	uint8_t type;				/**< Extension header type. */
+	uint8_t segments_left;			/**< Valid segments number. */
+	__extension__
+	union {
+		rte_be32_t flags;		/**< Packet control data per type. */
+		struct {
+			uint8_t last_entry;	/**< The last_entry field of SRH */
+			uint8_t flag;		/**< Packet flag. */
+			rte_be16_t tag;		/**< Packet tag. */
+		};
+	};
+	/* Next are 128-bit IPv6 address fields to describe segments. */
+} __rte_packed;
+
 /* IPv6 vtc_flow: IPv / TC / flow_label */
 #define RTE_IPV6_HDR_FL_SHIFT 0
 #define RTE_IPV6_HDR_TC_SHIFT 20
-- 
2.27.0


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-01 15:50  0%       ` Jiawei(Jonny) Wang
@ 2023-02-02  9:28  0%         ` Andrew Rybchenko
  2023-02-02 14:43  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2023-02-02  9:28 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh

On 2/1/23 18:50, Jiawei(Jonny) Wang wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Subject: Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue
>> API
>>
>> On 1/30/23 20:00, Jiawei Wang wrote:
>>> Adds the new tx_phy_affinity field into the padding hole of
>>> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
>>> Adds a suppress type for structure change in the ABI check file.
>>>
>>> This patch adds the testpmd command line:
>>> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
>>>
>>> For example, there're two hardware ports 0 and 1 connected to a single
>>> DPDK port (port id 0), and phy_affinity 1 stood for hardware port 0
>>> and phy_affinity 2 stood for hardware port 1, used the below command
>>> to config tx phy affinity for per Tx Queue:
>>>           port config 0 txq 0 phy_affinity 1
>>>           port config 0 txq 1 phy_affinity 1
>>>           port config 0 txq 2 phy_affinity 2
>>>           port config 0 txq 3 phy_affinity 2
>>>
>>> These commands config the TxQ index 0 and TxQ index 1 with phy
>>> affinity 1, uses TxQ 0 or TxQ 1 send packets, these packets will be
>>> sent from the hardware port 0, and similar with hardware port 1 if
>>> sending packets with TxQ 2 or TxQ 3.
>>
>> Frankly speaking I dislike it. Why do we need to expose it on generic ethdev
>> layer? IMHO dynamic mbuf field would be a better solution to control Tx
>> routing to a specific PHY port.
>>
> 
> OK, the phy affinity is not part of packet information(like timestamp).

Why? port_id is packet information. Why is phy_subport_id not
packet information?

> And second, the phy affinity is Queue layer, that is, the phy affinity value
> should keep the same behavior per Queue.
> After the TxQ was created, the packets should be sent the same physical port
> If using the same TxQ index.

Why should these queues be visible to a DPDK application?
Nothing prevents you from creating many HW queues behind one ethdev
queue. Of course, there are questions related to the descriptor status
API in this case, but IMHO it would be better than exposing
these details at the application level.

> 
>> IMHO, we definitely need dev_info information about a number of physical
>> ports behind. Advertising value greater than 0 should mean that PMD supports
>> corresponding mbuf dynamic field to contol ongoing physical port on Tx (or
>> should just reject packets on prepare which try to specify outgoing phy port
>> otherwise). In the same way the information may be provided on Rx.
>>
> 
> See above, I think phy affinity is Queue level not for each packet.
> 
>> I'm OK to have 0 as no phy affinity value and greater than zero as specified phy
>> affinity. I.e. no dynamic flag is required.
>>
> 
> Thanks for agreement.
> 
>> Also I think that order of patches should be different.
>> We should start from a patch which provides dev_info and flow API matching
>> and action should be in later patch.
>>
> 
> OK.
>   
>>>
>>> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
>>
>> [snip]
> 


^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-01 21:41  3%       ` Tyler Retzlaff
@ 2023-02-02  8:43  4%         ` Morten Brørup
  2023-02-02 19:00  4%           ` Tyler Retzlaff
  2023-02-07 23:34  0%         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-02  8:43 UTC (permalink / raw)
  To: Tyler Retzlaff, Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 1 February 2023 22.41
> 
> On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> >
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Tuesday, January 31, 2023 4:42 PM
> > >
> > > Honnappa, please could you give your view on the future of atomics
> in DPDK?
> > Thanks Thomas, apologies it has taken me a while to get to this
> discussion.
> >
> > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> (stdatomics as is called here) already serve the purpose. These APIs
> are well understood and documented.
> 
> i agree that whatever atomics APIs we advocate for should align with
> the
> standard C atomics for the reasons you state including implied
> semantics.
> 
> >
> > For environments where stdatomics are not supported, we could have a
> stdatomic.h in DPDK implementing the same APIs (we have to support only
> _explicit APIs). This allows the code to use stdatomics APIs and when
> we move to minimum supported standard C11, we just need to get rid of
> the file in DPDK repo.

Perhaps we can use something already existing, such as this:
https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h

> 
> my concern with this is that if we provide a stdatomic.h or introduce
> names
> from stdatomic.h it's a violation of the C standard.
> 
> references:
>  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
>  * GNU libc manual
>    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> Names.html
> 
> in effect the header, the names and in some instances namespaces
> introduced
> are reserved by the implementation. there are several reasons in the
> GNU libc
> manual that explain the justification for these reservations and if
> if we think about ODR and ABI compatibility we can conceive of others.

If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.

If we are not moving to C11 soon, I would consider these disadvantages more seriously.

> 
> i'll also remark that the inter-mingling of names from the POSIX
> standard implicitly exposed as a part of the EAL public API has been
> problematic for portability.

This is a very important remark, which should be considered carefully! Tyler has firsthand experience with DPDK portability. If he thinks porting to Windows is going to be a headache if we expose the stdatomic.h API, we must listen! So, what is your gut feeling here, Tyler?

> 
> let's discuss this from here. if there's still overwhelming desire to
> go
> this route then we'll just do our best.
> 
> ty

I have a preference for exposing the stdatomic.h API. Tyler listed the disadvantages above. (I also have a preference for moving to C11 soon.)

Exposing a 1:1 similar API with RTE prefixes would also be acceptable for me. The disadvantage is that the names are different than the C11 names, which might lead to some confusion. And from an ABI stability perspective, such an important API should not be marked experimental. This means that years will pass before we can get rid of it again, due to ABI stability policies.
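
To make the "1:1 API with a prefix" option concrete, here is a minimal sketch of
such a wrapper layer over the C11 _explicit functions. The demo_ prefix and every
macro and function name below are hypothetical, chosen for illustration only; they
are not existing DPDK APIs. The sketch assumes a toolchain that ships C11
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical prefixed names that map 1:1 onto C11 stdatomic. */
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_memory_order_acquire memory_order_acquire
#define demo_memory_order_release memory_order_release

#define demo_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#define demo_atomic_store_explicit(ptr, val, mo) \
	atomic_store_explicit(ptr, val, mo)
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)

/* A shared counter, zero-initialized because it has static storage. */
static _Atomic uint32_t demo_counter;

/* Reset the counter to zero with release ordering. */
static void demo_counter_reset(void)
{
	demo_atomic_store_explicit(&demo_counter, 0, demo_memory_order_release);
}

/* Increment the counter; returns the value held before the increment. */
static uint32_t demo_counter_inc(void)
{
	return demo_atomic_fetch_add_explicit(&demo_counter, 1,
					      demo_memory_order_relaxed);
}

/* Read the counter with acquire ordering. */
static uint32_t demo_counter_read(void)
{
	return demo_atomic_load_explicit(&demo_counter,
					 demo_memory_order_acquire);
}
```

Because the wrappers keep the argument order and the memory-order names of the
standard API, dropping the prefix later would be a mechanical rename. On a platform
without <stdatomic.h>, only the macro bodies would need another implementation.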

-Morten


^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eal: introduce atomics abstraction
  @ 2023-02-01 21:41  3%       ` Tyler Retzlaff
  2023-02-02  8:43  4%         ` Morten Brørup
  2023-02-07 23:34  0%         ` Honnappa Nagarahalli
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-01 21:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, mb, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, January 31, 2023 4:42 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: dev@dpdk.org; bruce.richardson@intel.com; mb@smartsharesystems.com;
> > Tyler Retzlaff <roretzla@linux.microsoft.com>; david.marchand@redhat.com;
> > jerinj@marvell.com; konstantin.ananyev@huawei.com; ferruh.yigit@amd.com
> > Subject: Re: [PATCH] eal: introduce atomics abstraction
> > 
> > Honnappa, please could you give your view on the future of atomics in DPDK?
> Thanks Thomas, apologies it has taken me a while to get to this discussion.
> 
> IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h (stdatomics as is called here) already serve the purpose. These APIs are well understood and documented.

i agree that whatever atomics APIs we advocate for should align with the
standard C atomics for the reasons you state including implied semantics.

> 
> For environments where stdatomics are not supported, we could have a stdatomic.h in DPDK implementing the same APIs (we have to support only _explicit APIs). This allows the code to use stdatomics APIs and when we move to minimum supported standard C11, we just need to get rid of the file in DPDK repo.

my concern with this is that if we provide a stdatomic.h or introduce names
from stdatomic.h it's a violation of the C standard.

references:
 * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
 * GNU libc manual
   https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

in effect the header, the names and in some instances namespaces introduced
are reserved by the implementation. there are several reasons in the GNU libc
manual that explain the justification for these reservations, and if
we think about ODR and ABI compatibility we can conceive of others.

i'll also remark that the inter-mingling of names from the POSIX
standard implicitly exposed as a part of the EAL public API has been
problematic for portability.

let's discuss this from here. if there's still overwhelming desire to go
this route then we'll just do our best.

ty

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-01  9:05  0%     ` Andrew Rybchenko
@ 2023-02-01 15:50  0%       ` Jiawei(Jonny) Wang
  2023-02-02  9:28  0%         ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-01 15:50 UTC (permalink / raw)
  To: Andrew Rybchenko, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh


Hi,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> On 1/30/23 20:00, Jiawei Wang wrote:
> > For the multiple hardware ports connect to a single DPDK port
> > (mhpsdp), the previous patch introduces the new rte flow item to match
> > the phy affinity of the received packets.
> >
> > This patch adds the tx_phy_affinity setting in Tx queue API, the
> > affinity
> 
> "This patch adds" -> "Add ..."
> 
OK,  will change to 'Add the tx_phy_affinity...."

> > value reflects packets be sent to which hardware port.
> > Value 0 is no affinity and traffic will be routed between different
> > physical ports,
> 
> Who will it be routed?
> 

Assume there are two slave physical ports bonded together and DPDK is attached to the bond master.
The packets can be sent from the first or the second physical port; which one depends on the PMD
driver and the low-level 'routing' selection.

> > if 0 is disabled then try to match on phy_affinity 0 will result in an
> > error.
> 
> Why are you talking about matching here?
> 

In the previous patch we mentioned that the same phy affinity can be used to handle the packets on the same
hardware port, so if 0 means no affinity, then trying to match on it should report an error.

> >
> > Adds the new tx_phy_affinity field into the padding hole of
> > rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> > Adds a suppress type for structure change in the ABI check file.
> >
> > This patch adds the testpmd command line:
> > testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >
> > For example, there're two hardware ports 0 and 1 connected to a single
> > DPDK port (port id 0), and phy_affinity 1 stood for hardware port 0
> > and phy_affinity 2 stood for hardware port 1, used the below command
> > to config tx phy affinity for per Tx Queue:
> >          port config 0 txq 0 phy_affinity 1
> >          port config 0 txq 1 phy_affinity 1
> >          port config 0 txq 2 phy_affinity 2
> >          port config 0 txq 3 phy_affinity 2
> >
> > These commands config the TxQ index 0 and TxQ index 1 with phy
> > affinity 1, uses TxQ 0 or TxQ 1 send packets, these packets will be
> > sent from the hardware port 0, and similar with hardware port 1 if
> > sending packets with TxQ 2 or TxQ 3.
> 
> Frankly speaking I dislike it. Why do we need to expose it on generic ethdev
> layer? IMHO dynamic mbuf field would be a better solution to control Tx
> routing to a specific PHY port.
> 

OK. First, the phy affinity is not part of the packet information (like a timestamp).
Second, the phy affinity belongs to the queue layer; that is, the phy affinity value
should behave the same way for the whole queue.
After the TxQ is created, packets sent through the same TxQ index should go out on
the same physical port.
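
The per-queue model described above can be sketched in plain C. This is an
illustration only, not the patch code: the demo_ names are hypothetical, and the
hash-based fallback for affinity 0 is an assumption standing in for whatever
selection the PMD or hardware actually performs.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NB_TXQ 4
#define DEMO_NB_HW_PORTS 2

/* Affinity per Tx queue: 0 = no affinity, N > 0 = hardware port N - 1. */
static uint8_t demo_txq_affinity[DEMO_NB_TXQ];

/* Store the affinity of one queue; returns -1 on an invalid argument. */
static int demo_txq_set_affinity(uint16_t queue_id, uint8_t affinity)
{
	if (queue_id >= DEMO_NB_TXQ || affinity > DEMO_NB_HW_PORTS)
		return -1;
	demo_txq_affinity[queue_id] = affinity;
	return 0;
}

/*
 * Pick the hardware port for a packet sent on queue_id.
 * With an affinity set, every packet of the queue uses the same port.
 * With affinity 0, this sketch falls back to a hash-based choice
 * (assumed policy, the real selection is driver specific).
 */
static int demo_txq_select_hw_port(uint16_t queue_id, uint32_t pkt_hash)
{
	uint8_t affinity;

	if (queue_id >= DEMO_NB_TXQ)
		return -1;
	affinity = demo_txq_affinity[queue_id];
	if (affinity == 0)
		return (int)(pkt_hash % DEMO_NB_HW_PORTS);
	return affinity - 1;
}
```

The point of the sketch is that the port choice is a property of the queue, fixed
at configuration time, rather than something carried in each packet.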

> IMHO, we definitely need dev_info information about a number of physical
> ports behind. Advertising value greater than 0 should mean that PMD supports
> corresponding mbuf dynamic field to contol ongoing physical port on Tx (or
> should just reject packets on prepare which try to specify outgoing phy port
> otherwise). In the same way the information may be provided on Rx.
> 

See above; I think phy affinity is per queue, not per packet.

> I'm OK to have 0 as no phy affinity value and greater than zero as specified phy
> affinity. I.e. no dynamic flag is required.
> 

Thanks for agreement.

> Also I think that order of patches should be different.
> We should start from a patch which provides dev_info and flow API matching
> and action should be in later patch.
>

OK.
 
> >
> > Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> 
> [snip]


^ permalink raw reply	[relevance 0%]

* RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro conflict
  2023-02-01 13:29  0%       ` Michael Baum
@ 2023-02-01 14:02  0%         ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2023-02-01 14:02 UTC (permalink / raw)
  To: Michael Baum, dev
  Cc: Matan Azrad, Ashish Gupta, Fan Zhang, Kai Ji,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	fiona.trahe, stable

> > Hi,
> > > > >
> > > > > After this change, I'm not sure about the purpose of
> > > > > "RTE_COMP_ALGO_LIST_END".
> > > > > There is no any other use of it in DPDK code, and it isn't
> > > > > represent the number of algorithms supported by the API since the
> > > > > "RTE_COMP_ALGO_UNSPECIFIED" is part of the enum.
> > > > >
> > > > > Due to the compress API is experimental I think the
> > > > > "RTE_COMP_ALGO_LIST_END" can be removed.
> > > > >
> > > > +1 to remove the list end enums. This will also help in avoiding ABI
> > > > +breakage
> > > > When we make this lib as stable.
> > >
> > > Even RTE_COMP_HASH_ALGO_LIST_END can also be removed.
> > > It is not used anywhere.
> >
> > Can you send a patch to remove these list end enums along with this patch?
> 
> In the same patch? Or add one?
A separate patch would be better, as the current patch is about a conflict.
Removing the enums need not be backported, but this patch does need to be backported.
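
For readers wondering why a list-end enumerator is an ABI hazard, here is a small
self-contained sketch. All DEMO_ names are hypothetical stand-ins, not the
compressdev definitions: the two enums model a library header before and after one
algorithm is added, and the array models application code that was compiled
against the older header.

```c
#include <assert.h>
#include <stddef.h>

/* Library header, version 1. */
enum demo_algo_v1 {
	DEMO_V1_ALGO_UNSPECIFIED = 0,
	DEMO_V1_ALGO_NULL,
	DEMO_V1_ALGO_DEFLATE,
	DEMO_V1_ALGO_LIST_END /* evaluates to 3 */
};

/* Library header, version 2: one algorithm added before the end marker. */
enum demo_algo_v2 {
	DEMO_V2_ALGO_UNSPECIFIED = 0,
	DEMO_V2_ALGO_NULL,
	DEMO_V2_ALGO_DEFLATE,
	DEMO_V2_ALGO_LZ4,
	DEMO_V2_ALGO_LIST_END /* silently became 4 */
};

/* Application table sized at build time against the version 1 header. */
struct demo_app_stats {
	unsigned int per_algo[DEMO_V1_ALGO_LIST_END];
};

/* Returns 1 if algo indexes inside the application's table, else 0. */
static int demo_app_slot_valid(int algo)
{
	const size_t slots = sizeof(((struct demo_app_stats *)0)->per_algo) /
			     sizeof(unsigned int);

	return algo >= 0 && (size_t)algo < slots;
}
```

A value such as DEMO_V2_ALGO_LZ4 returned by the newer library falls outside the
table of the unrebuilt application, which is the kind of breakage that removing
the list-end enumerators avoids.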

^ permalink raw reply	[relevance 0%]

* RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro conflict
  2023-02-01 13:19  0%     ` Akhil Goyal
@ 2023-02-01 13:29  0%       ` Michael Baum
  2023-02-01 14:02  0%         ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Michael Baum @ 2023-02-01 13:29 UTC (permalink / raw)
  To: Akhil Goyal, dev
  Cc: Matan Azrad, Ashish Gupta, Fan Zhang, Kai Ji,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	fiona.trahe, stable



Hi,
> 
> 
> Hi,
> > > >
> > > > After this change, I'm not sure about the purpose of
> > > > "RTE_COMP_ALGO_LIST_END".
> > > > There is no any other use of it in DPDK code, and it isn't
> > > > represent the number of algorithms supported by the API since the
> > > > "RTE_COMP_ALGO_UNSPECIFIED" is part of the enum.
> > > >
> > > > Due to the compress API is experimental I think the
> > > > "RTE_COMP_ALGO_LIST_END" can be removed.
> > > >
> > > +1 to remove the list end enums. This will also help in avoiding ABI
> > > +breakage
> > > When we make this lib as stable.
> >
> > Even RTE_COMP_HASH_ALGO_LIST_END can also be removed.
> > It is not used anywhere.
> 
> Can you send a patch to remove these list end enums along with this patch?

In the same patch? Or add one?
> 
> -Akhil

^ permalink raw reply	[relevance 0%]

* RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro conflict
  2023-01-31  8:23  0%   ` Akhil Goyal
@ 2023-02-01 13:19  0%     ` Akhil Goyal
  2023-02-01 13:29  0%       ` Michael Baum
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2023-02-01 13:19 UTC (permalink / raw)
  To: Akhil Goyal, Michael Baum, dev
  Cc: Matan Azrad, Ashish Gupta, Fan Zhang, Kai Ji, Thomas Monjalon,
	fiona.trahe, stable

Hi,
> > >
> > > After this change, I'm not sure about the purpose of
> > > "RTE_COMP_ALGO_LIST_END".
> > > There is no any other use of it in DPDK code, and it isn't represent the
> > > number of algorithms supported by the API since the
> > > "RTE_COMP_ALGO_UNSPECIFIED" is part of the enum.
> > >
> > > Due to the compress API is experimental I think the
> > > "RTE_COMP_ALGO_LIST_END" can be removed.
> > >
> > +1 to remove the list end enums. This will also help in avoiding ABI breakage
> > When we make this lib as stable.
> 
> Even RTE_COMP_HASH_ALGO_LIST_END can also be removed.
> It is not used anywhere.

Can you send a patch to remove these list end enums along with this patch?

-Akhil

^ permalink raw reply	[relevance 0%]

* [PATCH V7] ethdev: fix one address occupies two entries in MAC addrs
  @ 2023-02-01 13:15  3% ` Huisong Li
  2023-02-02 12:36  3% ` [PATCH V8] " Huisong Li
  1 sibling, 0 replies; 200+ results
From: Huisong Li @ 2023-02-01 13:15 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, andrew.rybchenko, liudongdong3, huangdaode,
	fengchengwen, lihuisong

The dev->data->mac_addrs[0] will be changed to a new MAC address when
applications modify the default MAC address by .mac_addr_set(). However,
if the new default one has been added as a non-default MAC address by
.mac_addr_add(), the .mac_addr_set() doesn't remove it from the mac_addrs
list. As a result, one MAC address occupies two entries in the list. Like:
add(MAC1)
add(MAC2)
add(MAC3)
add(MAC4)
set_default(MAC3)
default=MAC3, the rest of list=MAC1, MAC2, MAC3, MAC4
Note: MAC3 occupies two entries.

In addition, some PMDs, such as i40e, ice and hns3, remove the old
default MAC when a new default MAC is set. If the user then calls
set_default(MAC5), the mac_addrs list becomes default=MAC5, filters=(MAC1,
MAC2, MAC3, MAC4). At this point the user can still see MAC3 in the list,
but packets with MAC3 aren't actually received by the PMD.

So ensure that the new default address is removed from the rest of
the list if the address was already in the list.
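
The intended list behaviour can be modelled with a small standalone sketch. It is
not the ethdev code: the demo_ types and helpers are hypothetical, an all-zero
address marks a free slot, and the pool bitmap handling and the rollback path of
the real patch are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_MACS 8

struct demo_mac {
	uint8_t bytes[6];
};

/* Index 0 holds the default address; an all-zero entry is a free slot. */
struct demo_mac_list {
	struct demo_mac addrs[DEMO_MAX_MACS];
};

static const struct demo_mac demo_zero_mac = { {0} };

static int demo_mac_equal(const struct demo_mac *a, const struct demo_mac *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

/* Returns the first index holding addr, or -1 if it is not in the list. */
static int demo_mac_find(const struct demo_mac_list *list,
			 const struct demo_mac *addr)
{
	int i;

	for (i = 0; i < DEMO_MAX_MACS; i++)
		if (demo_mac_equal(&list->addrs[i], addr))
			return i;
	return -1;
}

/* Counts how many entries hold addr. */
static int demo_mac_count(const struct demo_mac_list *list,
			  const struct demo_mac *addr)
{
	int i, n = 0;

	for (i = 0; i < DEMO_MAX_MACS; i++)
		if (demo_mac_equal(&list->addrs[i], addr))
			n++;
	return n;
}

/* Adds a non-default address; 0 on success or if present, -1 if full. */
static int demo_mac_add(struct demo_mac_list *list, const struct demo_mac *addr)
{
	int i;

	if (demo_mac_find(list, addr) >= 0)
		return 0;
	for (i = 1; i < DEMO_MAX_MACS; i++) {
		if (demo_mac_equal(&list->addrs[i], &demo_zero_mac)) {
			list->addrs[i] = *addr;
			return 0;
		}
	}
	return -1;
}

/* Sets the default address and keeps every entry of the list unique. */
static void demo_mac_set_default(struct demo_mac_list *list,
				 const struct demo_mac *addr)
{
	int index = demo_mac_find(list, addr);

	if (index > 0)
		list->addrs[index] = demo_zero_mac; /* drop the old copy */
	list->addrs[0] = *addr;
}
```

Replaying the add(MAC1), add(MAC2), add(MAC3), set_default(MAC3) sequence on this
model leaves MAC3 only at index 0.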

Fixes: 854d8ad4ef68 ("ethdev: add default mac address modifier")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
v7: add announcement in the release notes and document this behavior.
v6: fix commit log and some code comments.
v5:
 - merge the second patch into the first patch.
 - add error log when rollback failed.
v4:
  - fix broken in the patchwork
v3:
  - first explicitly remove the non-default MAC, then set default one.
  - document default and non-default MAC address
v2:
  - fixed commit log.

---
 doc/guides/rel_notes/release_23_03.rst |  6 +++++
 lib/ethdev/ethdev_driver.h             |  6 ++++-
 lib/ethdev/rte_ethdev.c                | 35 ++++++++++++++++++++++++--
 lib/ethdev/rte_ethdev.h                |  3 +++
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 84b112a8b1..f63ec2b399 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -101,10 +101,16 @@ API Changes
      Use fixed width quotes for ``function_names`` or ``struct_names``.
      Use the past tense.
 
+
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+   * ethdev: ensured all entries in MAC address list are unique.
+     The function ``rte_eth_dev_default_mac_addr_set`` replaces the address
+     at index 0 of the address list. If the address was already in the
+     address list, it is removed from the rest of the list.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index dde3ec84ef..14a1d9adad 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -117,7 +117,11 @@ struct rte_eth_dev_data {
 
 	uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures */
 
-	/** Device Ethernet link address. @see rte_eth_dev_release_port() */
+	/**
+	 * Device Ethernet link addresses.
+	 * All entries are unique. The first entry (index zero) is the
+	 * default address.
+	 */
 	struct rte_ether_addr *mac_addrs;
 	/** Bitmap associating MAC addresses to pools */
 	uint64_t mac_pool_sel[RTE_ETH_NUM_RECEIVE_MAC_ADDR];
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 86ca303ab5..de25183619 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -4498,7 +4498,10 @@ rte_eth_dev_mac_addr_remove(uint16_t port_id, struct rte_ether_addr *addr)
 int
 rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
 {
+	uint64_t mac_pool_sel_bk = 0;
 	struct rte_eth_dev *dev;
+	uint32_t pool;
+	int index;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -4517,16 +4520,44 @@ rte_eth_dev_default_mac_addr_set(uint16_t port_id, struct rte_ether_addr *addr)
 	if (*dev->dev_ops->mac_addr_set == NULL)
 		return -ENOTSUP;
 
+	/* Keep address unique in dev->data->mac_addrs[]. */
+	index = eth_dev_get_mac_addr_index(port_id, addr);
+	if (index > 0) {
+		/* Remove address in dev data structure */
+		mac_pool_sel_bk = dev->data->mac_pool_sel[index];
+		ret = rte_eth_dev_mac_addr_remove(port_id, addr);
+		if (ret < 0) {
+			RTE_ETHDEV_LOG(ERR, "Cannot remove the port %u address from the rest of list.\n",
+				       port_id);
+			return ret;
+		}
+	}
 	ret = (*dev->dev_ops->mac_addr_set)(dev, addr);
 	if (ret < 0)
-		return ret;
+		goto out;
 
 	/* Update default address in NIC data structure */
 	rte_ether_addr_copy(addr, &dev->data->mac_addrs[0]);
 
 	return 0;
-}
 
+out:
+	if (index > 0) {
+		pool = 0;
+		do {
+			if (mac_pool_sel_bk & UINT64_C(1)) {
+				if (rte_eth_dev_mac_addr_add(port_id, addr,
+							     pool) != 0)
+					RTE_ETHDEV_LOG(ERR, "failed to restore MAC pool id(%u) in port %u.\n",
+						       pool, port_id);
+			}
+			mac_pool_sel_bk >>= 1;
+			pool++;
+		} while (mac_pool_sel_bk != 0);
+	}
+
+	return ret;
+}
 
 /*
  * Returns index into MAC address array of addr. Use 00:00:00:00:00:00 to find
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index d22de196db..609328f1e3 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4356,6 +4356,9 @@ int rte_eth_dev_mac_addr_remove(uint16_t port_id,
 
 /**
  * Set the default MAC address.
+ * It replaces the address at index 0 of the MAC address list.
+ * If the address was already in the MAC address list, it is removed from
+ * the rest of the list.
  *
  * @param port_id
  *   The port identifier of the Ethernet device.
-- 
2.22.0


^ permalink raw reply	[relevance 3%]

* [PATCH v5 1/3] ethdev: add IPv6 routing extension header definition
  @ 2023-02-01 11:35  3%             ` Rongwei Liu
  0 siblings, 0 replies; 200+ results
From: Rongwei Liu @ 2023-02-01 11:35 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Aman Singh, Yuying Zhang, Ferruh Yigit,
	Andrew Rybchenko, Olivier Matz

Add the IPv6 routing extension header definition; TLVs are not
supported for now.

At rte_flow layer, there are new items defined for matching
type/nexthdr/segments_left field.

Add command line support for IPv6 routing extension header
matching: type/nexthdr/segment_list.
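
As an illustration of the fields being matched, the sketch below parses the fixed
part of a routing extension header from raw bytes. The demo_ struct and function
are hypothetical and only mirror the first four bytes of the layout. The length
rule (hdr_len counts 8-octet units beyond the first 8 octets) follows RFC 8200,
and routing type 4 is the Segment Routing Header of RFC 8754.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPV6_ROUTING_TYPE_SRH 4 /* Segment Routing Header */

/* Illustrative copy of the first four bytes of the routing header. */
struct demo_ipv6_routing_ext {
	uint8_t next_hdr;      /* protocol of the following header */
	uint8_t hdr_len;       /* 8-octet units, first 8 octets excluded */
	uint8_t type;          /* routing type */
	uint8_t segments_left; /* route segments still to be visited */
};

/*
 * Parse the header at buf. On success returns 0, fills *out and stores
 * the full header size in bytes in *total_len. Returns -1 if the buffer
 * is shorter than the header claims to be.
 */
static int demo_parse_routing_ext(const uint8_t *buf, size_t len,
				  struct demo_ipv6_routing_ext *out,
				  size_t *total_len)
{
	size_t need;

	if (len < 8)
		return -1;
	need = ((size_t)buf[1] + 1) * 8;
	if (len < need)
		return -1;

	out->next_hdr = buf[0];
	out->hdr_len = buf[1];
	out->type = buf[2];
	out->segments_left = buf[3];
	*total_len = need;
	return 0;
}
```

A flow rule on type, next header or segments left conceptually compares these
parsed values against the spec and mask supplied by the application.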

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
 doc/guides/prog_guide/rte_flow.rst     |  9 +++++
 doc/guides/rel_notes/release_23_03.rst |  9 +++++
 lib/ethdev/rte_flow.c                  |  1 +
 lib/ethdev/rte_flow.h                  | 19 +++++++++++
 lib/net/rte_ip.h                       | 20 +++++++++++
 6 files changed, 104 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 88108498e0..7a8516829c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -298,6 +298,10 @@ enum index {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_ICMP,
 	ITEM_ICMP_TYPE,
 	ITEM_ICMP_CODE,
@@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
 	ITEM_ARP_ETH_IPV4,
 	ITEM_IPV6_EXT,
 	ITEM_IPV6_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
 	ITEM_ICMP6,
 	ITEM_ICMP6_ND_NS,
 	ITEM_ICMP6_ND_NA,
@@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_NEXT,
+	ZERO,
+};
+
+static const enum index item_ipv6_routing_ext[] = {
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_NEXT,
 	ZERO,
 };
@@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
 		.args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
 					   has_frag_ext, 1)),
 	},
+	[ITEM_IPV6_ROUTING_EXT] = {
+		.name = "ipv6_routing_ext",
+		.help = "match IPv6 routing extension header",
+		.priv = PRIV_ITEM(IPV6_ROUTING_EXT,
+				  sizeof(struct rte_flow_item_ipv6_routing_ext)),
+		.next = NEXT(item_ipv6_routing_ext),
+		.call = parse_vc,
+	},
+	[ITEM_IPV6_ROUTING_EXT_TYPE] = {
+		.name = "ext_type",
+		.help = "match IPv6 routing extension header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.type)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
+		.name = "ext_next_hdr",
+		.help = "match IPv6 routing extension header next header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.next_hdr)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
+		.name = "ext_seg_left",
+		.help = "match IPv6 routing extension header segment left",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.segments_left)),
+	},
 	[ITEM_ICMP] = {
 		.name = "icmp",
 		.help = "match ICMP header",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 3e6242803d..602fab29d3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
 
 - ``color``: Metering color marker.
 
+Item: ``IPV6_ROUTING_EXT``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Matches IPv6 routing extension header.
+
+- ``next_hdr``: Next layer header type.
+- ``type``: IPv6 routing extension header type.
+- ``segments_left``: How many IPv6 destination addresses carries on.
+
 Actions
 ~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index c15f6fbb9f..1337da73b8 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -69,6 +69,11 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added rte_flow support for matching IPv6 routing extension header fields.**
+
+  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
+  header.
+
 
 Removed Items
 -------------
@@ -98,6 +103,10 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* net: added a new structure:
+
+    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 7d0c24366c..4da581146e 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -157,6 +157,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
 	MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
 	MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
+	MK_FLOW_ITEM(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext)),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..9b9018cba2 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -624,6 +624,13 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_meter_color.
 	 */
 	RTE_FLOW_ITEM_TYPE_METER_COLOR,
+
+	/**
+	 * Matches the presence of IPv6 routing extension header.
+	 *
+	 * @see struct rte_flow_item_ipv6_routing_ext.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
 };
 
 /**
@@ -873,6 +880,18 @@ struct rte_flow_item_ipv6 {
 	uint32_t reserved:23;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
+ *
+ * Matches an IPv6 routing extension header.
+ */
+struct rte_flow_item_ipv6_routing_ext {
+	struct rte_ipv6_routing_ext hdr;
+};
+
 /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
 #ifndef __cplusplus
 static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 9c8e8206f0..1f23e24af5 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -539,6 +539,26 @@ struct rte_ipv6_hdr {
 	uint8_t  dst_addr[16];	/**< IP address of destination host(s). */
 } __rte_packed;
 
+/**
+ * IPv6 Routing Extension Header
+ */
+struct rte_ipv6_routing_ext {
+	uint8_t next_hdr;			/**< Protocol, next header. */
+	uint8_t hdr_len;			/**< Header length. */
+	uint8_t type;				/**< Extension header type. */
+	uint8_t segments_left;			/**< Valid segments number. */
+	__extension__
+	union {
+		rte_be32_t flags;		/**< Packet control data per type. */
+		struct {
+			uint8_t last_entry;	/**< The last_entry field of SRH */
+			uint8_t flag;		/**< Packet flag. */
+			rte_be16_t tag;		/**< Packet tag. */
+		};
+	};
+	/** Following variable number of segments. */
+} __rte_packed;
+
 /* IPv6 vtc_flow: IPv / TC / flow_label */
 #define RTE_IPV6_HDR_FL_SHIFT 0
 #define RTE_IPV6_HDR_TC_SHIFT 20
-- 
2.27.0


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-01-31 17:26  3%     ` Thomas Monjalon
@ 2023-02-01  9:45  0%       ` Jiawei(Jonny) Wang
  0 siblings, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-01  9:45 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Slava Ovsiienko, Ori Kam, Aman Singh, Yuying Zhang, Ferruh Yigit,
	Andrew Rybchenko, dev, Raslan Darawsheh, david.marchand


> 30/01/2023 18:00, Jiawei Wang:
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -20,6 +20,11 @@
> >  [suppress_file]
> >          soname_regexp = ^librte_.*mlx.*glue\.
> >
> > +; Ignore fields inserted in middle padding of rte_eth_txconf
> > +[suppress_type]
> > +        name = rte_eth_txconf
> > +        has_data_member_inserted_between =
> > +{offset_after(tx_deferred_start), offset_of(offloads)}
> 
> You are adding the exception inside
> "Core suppression rules: DO NOT TOUCH".
> 
> Please move it at the end in the section "Temporary exceptions till next major
> ABI version"
> 

OK, will move.

> Also the rule does not work.
> It should be:
> 	has_data_member_inserted_between = {offset_of(tx_deferred_start),
> offset_of(offloads)}
> 

Thanks, Will change it and send with new version.
> 


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition
  2023-02-01  9:27  0%       ` Rongwei Liu
@ 2023-02-01  9:31  0%         ` Andrew Rybchenko
    0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2023-02-01  9:31 UTC (permalink / raw)
  To: Rongwei Liu, dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Raslan Darawsheh, Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz

On 2/1/23 12:27, Rongwei Liu wrote:
> HI Andrew:
> 
> BR
> Rongwei
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Wednesday, February 1, 2023 17:21
>> To: Rongwei Liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
>> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
>> <orika@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
>> <thomas@monjalon.net>
>> Cc: Raslan Darawsheh <rasland@nvidia.com>; Aman Singh
>> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
>> Ferruh Yigit <ferruh.yigit@amd.com>; Olivier Matz <olivier.matz@6wind.com>
>> Subject: Re: [PATCH v4 1/3] ethdev: add IPv6 routing extension header
>> definition
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 1/31/23 12:36, Rongwei Liu wrote:
>>> Add IPv6 routing extension header definition and no TLV support for
>>> now.
>>>
>>> At rte_flow layer, there are new items defined for matching
>>> type/nexthdr/segments_left field.
>>>
>>> Add command line support for IPv6 routing extension header
>>> matching: type/nexthdr/segment_list.
>>>
>>> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
>>> Acked-by: Ori Kam <orika@nvidia.com>
>>> ---
>>>    app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
>>>    doc/guides/prog_guide/rte_flow.rst     |  9 +++++
>>>    doc/guides/rel_notes/release_23_03.rst |  9 +++++
>>>    lib/ethdev/rte_flow.c                  | 19 +++++++++++
>>>    lib/ethdev/rte_flow.h                  | 21 ++++++++++++
>>>    lib/net/rte_ip.h                       | 19 +++++++++++
>>>    6 files changed, 123 insertions(+)
>>>
>>> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
>>> index 88108498e0..7a8516829c 100644
>>> --- a/app/test-pmd/cmdline_flow.c
>>> +++ b/app/test-pmd/cmdline_flow.c
>>> @@ -298,6 +298,10 @@ enum index {
>>>        ITEM_IPV6_SRC,
>>>        ITEM_IPV6_DST,
>>>        ITEM_IPV6_HAS_FRAG_EXT,
>>> +     ITEM_IPV6_ROUTING_EXT,
>>> +     ITEM_IPV6_ROUTING_EXT_TYPE,
>>> +     ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
>>> +     ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
>>>        ITEM_ICMP,
>>>        ITEM_ICMP_TYPE,
>>>        ITEM_ICMP_CODE,
>>> @@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
>>>        ITEM_ARP_ETH_IPV4,
>>>        ITEM_IPV6_EXT,
>>>        ITEM_IPV6_FRAG_EXT,
>>> +     ITEM_IPV6_ROUTING_EXT,
>>>        ITEM_ICMP6,
>>>        ITEM_ICMP6_ND_NS,
>>>        ITEM_ICMP6_ND_NA,
>>> @@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
>>>        ITEM_IPV6_SRC,
>>>        ITEM_IPV6_DST,
>>>        ITEM_IPV6_HAS_FRAG_EXT,
>>> +     ITEM_IPV6_ROUTING_EXT,
>>> +     ITEM_NEXT,
>>> +     ZERO,
>>> +};
>>> +
>>> +static const enum index item_ipv6_routing_ext[] = {
>>> +     ITEM_IPV6_ROUTING_EXT_TYPE,
>>> +     ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
>>> +     ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
>>>        ITEM_NEXT,
>>>        ZERO,
>>>    };
>>> @@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
>>>                .args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
>>>                                           has_frag_ext, 1)),
>>>        },
>>> +     [ITEM_IPV6_ROUTING_EXT] = {
>>> +             .name = "ipv6_routing_ext",
>>> +             .help = "match IPv6 routing extension header",
>>> +             .priv = PRIV_ITEM(IPV6_ROUTING_EXT,
>>> +                               sizeof(struct rte_flow_item_ipv6_routing_ext)),
>>> +             .next = NEXT(item_ipv6_routing_ext),
>>> +             .call = parse_vc,
>>> +     },
>>> +     [ITEM_IPV6_ROUTING_EXT_TYPE] = {
>>> +             .name = "ext_type",
>>> +             .help = "match IPv6 routing extension header type",
>>> +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
>>> +                          item_param),
>>> +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
>>> +                                          hdr.type)),
>>> +     },
>>> +     [ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
>>> +             .name = "ext_next_hdr",
>>> +             .help = "match IPv6 routing extension header next header type",
>>> +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
>>> +                          item_param),
>>> +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
>>> +                                          hdr.next_hdr)),
>>> +     },
>>> +     [ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
>>> +             .name = "ext_seg_left",
>>> +             .help = "match IPv6 routing extension header segment left",
>>> +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
>>> +                          item_param),
>>> +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
>>> +                                          hdr.segments_left)),
>>> +     },
>>>        [ITEM_ICMP] = {
>>>                .name = "icmp",
>>>                .help = "match ICMP header",
>>> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
>>> index 3e6242803d..602fab29d3 100644
>>> --- a/doc/guides/prog_guide/rte_flow.rst
>>> +++ b/doc/guides/prog_guide/rte_flow.rst
>>> @@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
>>>
>>>    - ``color``: Metering color marker.
>>>
>>> +Item: ``IPV6_ROUTING_EXT``
>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>>> +Matches IPv6 routing extension header.
>>> +
>>> +- ``next_hdr``: Next layer header type.
>>> +- ``type``: IPv6 routing extension header type.
>>> +- ``segments_left``: How many IPv6 destination addresses carries on.
>>> +
>>>    Actions
>>>    ~~~~~~~
>>>
>>> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
>>> index c15f6fbb9f..1337da73b8 100644
>>> --- a/doc/guides/rel_notes/release_23_03.rst
>>> +++ b/doc/guides/rel_notes/release_23_03.rst
>>> @@ -69,6 +69,11 @@ New Features
>>>        ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
>>>        required for eth_rx, eth_tx, crypto and timer eventdev adapters.
>>>
>>> +* **Added rte_flow support for matching IPv6 routing extension header fields.**
>>> +
>>> +  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
>>> +  header.
>>> +
>>>
>>>    Removed Items
>>>    -------------
>>> @@ -98,6 +103,10 @@ API Changes
>>>       Also, make sure to start the actual text at the margin.
>>>       =======================================================
>>>
>>> +* net: added a new structure:
>>> +
>>> +    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
>>> +
>>>
>>>    ABI Changes
>>>    -----------
>>> diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
>>> index 7d0c24366c..833382c466 100644
>>> --- a/lib/ethdev/rte_flow.c
>>> +++ b/lib/ethdev/rte_flow.c
>>> @@ -76,6 +76,23 @@ rte_flow_item_flex_conv(void *buf, const void *data)
>>>        return src->length;
>>>    }
>>>
>>> +static size_t
>>> +rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data)
>>> +{
>>> +     struct rte_flow_item_ipv6_routing_ext *dst = buf;
>>> +     const struct rte_flow_item_ipv6_routing_ext *src = data;
>>> +     size_t len;
>>> +
>>> +     if (src->hdr.hdr_len)
>>
>> Please, compare vs 0
>>
>>> +             len = src->hdr.hdr_len << 3;
>>> +     else
>>> +             len = src->hdr.segments_left << 4;
>>> +     if (dst == NULL)
>>> +             return 0;
>>> +     memcpy(dst->segments, src->segments, len);
>>> +     return len;
>>> +}
>>> +
>>>    /** Generate flow_item[] entry. */
>>>    #define MK_FLOW_ITEM(t, s) \
>>>        [RTE_FLOW_ITEM_TYPE_ ## t] = { \
>>> @@ -157,6 +174,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>>>        MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
>>>        MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
>>>        MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
>>> +     MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext),
>>> +                     rte_flow_item_ipv6_routing_ext_conv),
>>>    };
>>>
>>>    /** Generate flow_action[] entry. */
>>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
>>> index b60987db4b..ff9270690c 100644
>>> --- a/lib/ethdev/rte_flow.h
>>> +++ b/lib/ethdev/rte_flow.h
>>> @@ -624,6 +624,13 @@ enum rte_flow_item_type {
>>>         * See struct rte_flow_item_meter_color.
>>>         */
>>>        RTE_FLOW_ITEM_TYPE_METER_COLOR,
>>> +
>>> +     /**
>>> +      * Matches the presence of IPv6 routing extension header.
>>> +      *
>>> +      * @see struct rte_flow_item_ipv6_routing_ext.
>>> +      */
>>> +     RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
>>>    };
>>>
>>>    /**
>>> @@ -873,6 +880,20 @@ struct rte_flow_item_ipv6 {
>>>        uint32_t reserved:23;
>>>    };
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>> + *
>>> + * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
>>> + *
>>> + * Matches an IPv6 routing extension header.
>>> + */
>>> +struct rte_flow_item_ipv6_routing_ext {
>>> +     struct rte_ipv6_routing_ext hdr;
>>> +     __extension__
>>> +     rte_be32_t segments[]; /**< Each hop IPv6 address. */
>>
>> Do we really need it? Are you going to support it?
>> Will testpmd and flow conf work correctly since it uses size of the structure?
>>
> In rte_flow.c and the testpmd raw_encap() function, it relies on the hdr_len or segments_left field, not on sizeof this structure.

We still have a number of sizeof() invocations for the
structure in the patch. I worry that related code will
not work fine if someone tries to specify segments.

>> IMHO we should just remove it right now if we're not going to support it.
>>
> In matching, we don't support segments field. I am ok to remove it.

Me too
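
To make the sizeof() concern concrete, here is a minimal sketch, assuming
hypothetical mock_* stand-ins for the structures in the patch (plain
fixed-width types replace rte_be32_t). It shows that sizeof() of a
structure ending in a flexible array member does not count any segments,
so code sized with sizeof() alone has no room for them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed 8-byte routing header. */
struct mock_routing_hdr {
	uint8_t next_hdr;
	uint8_t hdr_len;
	uint8_t type;
	uint8_t segments_left;
	uint32_t flags;
};

/* Hypothetical stand-in for the flow item with a flexible array member. */
struct mock_item_routing_ext {
	struct mock_routing_hdr hdr;
	uint32_t segments[]; /* not counted by sizeof() */
};

/* Bytes needed when nb_segments 16-byte IPv6 addresses follow the header. */
static inline size_t
mock_item_total_size(unsigned int nb_segments)
{
	return sizeof(struct mock_item_routing_ext) + (size_t)nb_segments * 16;
}
```

Any buffer allocated or copied with sizeof(struct mock_item_routing_ext)
covers the fixed header only; the caller has to add the segment bytes
explicitly, as mock_item_total_size() does.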


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition
  2023-02-01  9:21  0%     ` Andrew Rybchenko
@ 2023-02-01  9:27  0%       ` Rongwei Liu
  2023-02-01  9:31  0%         ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Rongwei Liu @ 2023-02-01  9:27 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Raslan Darawsheh, Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz

HI Andrew:

BR
Rongwei

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, February 1, 2023 17:21
> To: Rongwei Liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Ferruh Yigit <ferruh.yigit@amd.com>; Olivier Matz <olivier.matz@6wind.com>
> Subject: Re: [PATCH v4 1/3] ethdev: add IPv6 routing extension header
> definition
> 
> On 1/31/23 12:36, Rongwei Liu wrote:
> > Add IPv6 routing extension header definition and no TLV support for
> > now.
> >
> > At rte_flow layer, there are new items defined for matching
> > type/nexthdr/segments_left field.
> >
> > Add command line support for IPv6 routing extension header
> > matching: type/nexthdr/segment_list.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > Acked-by: Ori Kam <orika@nvidia.com>
> > ---
> >   app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
> >   doc/guides/prog_guide/rte_flow.rst     |  9 +++++
> >   doc/guides/rel_notes/release_23_03.rst |  9 +++++
> >   lib/ethdev/rte_flow.c                  | 19 +++++++++++
> >   lib/ethdev/rte_flow.h                  | 21 ++++++++++++
> >   lib/net/rte_ip.h                       | 19 +++++++++++
> >   6 files changed, 123 insertions(+)
> >
> > diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> > index 88108498e0..7a8516829c 100644
> > --- a/app/test-pmd/cmdline_flow.c
> > +++ b/app/test-pmd/cmdline_flow.c
> > @@ -298,6 +298,10 @@ enum index {
> >       ITEM_IPV6_SRC,
> >       ITEM_IPV6_DST,
> >       ITEM_IPV6_HAS_FRAG_EXT,
> > +     ITEM_IPV6_ROUTING_EXT,
> > +     ITEM_IPV6_ROUTING_EXT_TYPE,
> > +     ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
> > +     ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
> >       ITEM_ICMP,
> >       ITEM_ICMP_TYPE,
> >       ITEM_ICMP_CODE,
> > @@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
> >       ITEM_ARP_ETH_IPV4,
> >       ITEM_IPV6_EXT,
> >       ITEM_IPV6_FRAG_EXT,
> > +     ITEM_IPV6_ROUTING_EXT,
> >       ITEM_ICMP6,
> >       ITEM_ICMP6_ND_NS,
> >       ITEM_ICMP6_ND_NA,
> > @@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
> >       ITEM_IPV6_SRC,
> >       ITEM_IPV6_DST,
> >       ITEM_IPV6_HAS_FRAG_EXT,
> > +     ITEM_IPV6_ROUTING_EXT,
> > +     ITEM_NEXT,
> > +     ZERO,
> > +};
> > +
> > +static const enum index item_ipv6_routing_ext[] = {
> > +     ITEM_IPV6_ROUTING_EXT_TYPE,
> > +     ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
> > +     ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
> >       ITEM_NEXT,
> >       ZERO,
> >   };
> > @@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
> >               .args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
> >                                          has_frag_ext, 1)),
> >       },
> > +     [ITEM_IPV6_ROUTING_EXT] = {
> > +             .name = "ipv6_routing_ext",
> > +             .help = "match IPv6 routing extension header",
> > +             .priv = PRIV_ITEM(IPV6_ROUTING_EXT,
> > +                               sizeof(struct rte_flow_item_ipv6_routing_ext)),
> > +             .next = NEXT(item_ipv6_routing_ext),
> > +             .call = parse_vc,
> > +     },
> > +     [ITEM_IPV6_ROUTING_EXT_TYPE] = {
> > +             .name = "ext_type",
> > +             .help = "match IPv6 routing extension header type",
> > +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> > +                          item_param),
> > +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> > +                                          hdr.type)),
> > +     },
> > +     [ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
> > +             .name = "ext_next_hdr",
> > +             .help = "match IPv6 routing extension header next header type",
> > +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> > +                          item_param),
> > +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> > +                                          hdr.next_hdr)),
> > +     },
> > +     [ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
> > +             .name = "ext_seg_left",
> > +             .help = "match IPv6 routing extension header segment left",
> > +             .next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> > +                          item_param),
> > +             .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> > +                                          hdr.segments_left)),
> > +     },
> >       [ITEM_ICMP] = {
> >               .name = "icmp",
> >               .help = "match ICMP header",
> > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > index 3e6242803d..602fab29d3 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
> >
> >   - ``color``: Metering color marker.
> >
> > +Item: ``IPV6_ROUTING_EXT``
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Matches IPv6 routing extension header.
> > +
> > +- ``next_hdr``: Next layer header type.
> > +- ``type``: IPv6 routing extension header type.
> > +- ``segments_left``: How many IPv6 destination addresses carries on.
> > +
> >   Actions
> >   ~~~~~~~
> >
> > diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> > index c15f6fbb9f..1337da73b8 100644
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > @@ -69,6 +69,11 @@ New Features
> >       ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
> >       required for eth_rx, eth_tx, crypto and timer eventdev adapters.
> >
> > +* **Added rte_flow support for matching IPv6 routing extension header fields.**
> > +
> > +  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
> > +  header.
> > +
> >
> >   Removed Items
> >   -------------
> > @@ -98,6 +103,10 @@ API Changes
> >      Also, make sure to start the actual text at the margin.
> >      =======================================================
> >
> > +* net: added a new structure:
> > +
> > +    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
> > +
> >
> >   ABI Changes
> >   -----------
> > diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> > index 7d0c24366c..833382c466 100644
> > --- a/lib/ethdev/rte_flow.c
> > +++ b/lib/ethdev/rte_flow.c
> > @@ -76,6 +76,23 @@ rte_flow_item_flex_conv(void *buf, const void *data)
> >       return src->length;
> >   }
> >
> > +static size_t
> > +rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data)
> > +{
> > +     struct rte_flow_item_ipv6_routing_ext *dst = buf;
> > +     const struct rte_flow_item_ipv6_routing_ext *src = data;
> > +     size_t len;
> > +
> > +     if (src->hdr.hdr_len)
> 
> Please, compare vs 0
> 
> > +             len = src->hdr.hdr_len << 3;
> > +     else
> > +             len = src->hdr.segments_left << 4;
> > +     if (dst == NULL)
> > +             return 0;
> > +     memcpy(dst->segments, src->segments, len);
> > +     return len;
> > +}
> > +
> >   /** Generate flow_item[] entry. */
> >   #define MK_FLOW_ITEM(t, s) \
> >       [RTE_FLOW_ITEM_TYPE_ ## t] = { \
> > @@ -157,6 +174,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
> >       MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
> >       MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
> >       MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
> > +     MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext),
> > +                     rte_flow_item_ipv6_routing_ext_conv),
> >   };
> >
> >   /** Generate flow_action[] entry. */
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> > index b60987db4b..ff9270690c 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -624,6 +624,13 @@ enum rte_flow_item_type {
> >        * See struct rte_flow_item_meter_color.
> >        */
> >       RTE_FLOW_ITEM_TYPE_METER_COLOR,
> > +
> > +     /**
> > +      * Matches the presence of IPv6 routing extension header.
> > +      *
> > +      * @see struct rte_flow_item_ipv6_routing_ext.
> > +      */
> > +     RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
> >   };
> >
> >   /**
> > @@ -873,6 +880,20 @@ struct rte_flow_item_ipv6 {
> >       uint32_t reserved:23;
> >   };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
> > + *
> > + * Matches an IPv6 routing extension header.
> > + */
> > +struct rte_flow_item_ipv6_routing_ext {
> > +     struct rte_ipv6_routing_ext hdr;
> > +     __extension__
> > +     rte_be32_t segments[]; /**< Each hop IPv6 address. */
> 
> Do we really need it? Are you going to support it?
> Will testpmd and flow conf work correctly since it uses size of the structure?
> 
In rte_flow.c and the testpmd raw_encap() function, it relies on the hdr_len or segments_left field, not on sizeof this structure.
> IMHO we should just remove it right now if we're not going to support it.
> 
In matching, we don't support segments field. I am ok to remove it. 
> > +};
> > +
> >   /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
> >   #ifndef __cplusplus
> >   static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> > index 9c8e8206f0..778fb5ef83 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -539,6 +539,25 @@ struct rte_ipv6_hdr {
> >       uint8_t  dst_addr[16];  /**< IP address of destination host(s). */
> >   } __rte_packed;
> >
> > +/**
> > + * IPv6 Routing Extension Header
> > + */
> > +struct rte_ipv6_routing_ext {
> > +     uint8_t next_hdr;                       /**< Protocol, next header. */
> > +     uint8_t hdr_len;                        /**< Header length. */
> > +     uint8_t type;                           /**< Extension header type. */
> > +     uint8_t segments_left;                  /**< Valid segments number. */
> > +     __extension__
> > +     union {
> > +             rte_be32_t flags;
> 
> flags should be documented as well.
Sure.
> 
> > +             struct {
> > +                     uint8_t last_entry;     /**< The last_entry field of SRH */
> > +                     uint8_t flag;           /**< Packet flag. */
> > +                     rte_be16_t tag;         /**< Packet tag. */
> > +             };
> > +     };
> 
> Maybe we should add a comment here that segments follow?
Sure.
> 
> > +} __rte_packed;
> > +
> >   /* IPv6 vtc_flow: IPv / TC / flow_label */
> >   #define RTE_IPV6_HDR_FL_SHIFT 0
> >   #define RTE_IPV6_HDR_TC_SHIFT 20
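
The length selection in the conversion callback, with the explicit
comparison against 0 requested in the review, could look roughly like the
sketch below. The helper name mock_routing_ext_segments_len is
hypothetical and only illustrates the arithmetic quoted above; it is not
code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes that follow the fixed header.
 * hdr_len is expressed in 8-octet units; when it is zero, fall back to
 * one 16-byte IPv6 address per remaining segment.
 */
static inline size_t
mock_routing_ext_segments_len(uint8_t hdr_len, uint8_t segments_left)
{
	if (hdr_len != 0)
		return (size_t)hdr_len << 3;
	return (size_t)segments_left << 4;
}
```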


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition
  2023-01-31  9:36  3%   ` [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition Rongwei Liu
@ 2023-02-01  9:21  0%     ` Andrew Rybchenko
  2023-02-01  9:27  0%       ` Rongwei Liu
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2023-02-01  9:21 UTC (permalink / raw)
  To: Rongwei Liu, dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz

On 1/31/23 12:36, Rongwei Liu wrote:
> Add IPv6 routing extension header definition and no
> TLV support for now.
> 
> At rte_flow layer, there are new items defined for matching
> type/nexthdr/segments_left field.
> 
> Add command line support for IPv6 routing extension header
> matching: type/nexthdr/segment_list.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> ---
>   app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
>   doc/guides/prog_guide/rte_flow.rst     |  9 +++++
>   doc/guides/rel_notes/release_23_03.rst |  9 +++++
>   lib/ethdev/rte_flow.c                  | 19 +++++++++++
>   lib/ethdev/rte_flow.h                  | 21 ++++++++++++
>   lib/net/rte_ip.h                       | 19 +++++++++++
>   6 files changed, 123 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> index 88108498e0..7a8516829c 100644
> --- a/app/test-pmd/cmdline_flow.c
> +++ b/app/test-pmd/cmdline_flow.c
> @@ -298,6 +298,10 @@ enum index {
>   	ITEM_IPV6_SRC,
>   	ITEM_IPV6_DST,
>   	ITEM_IPV6_HAS_FRAG_EXT,
> +	ITEM_IPV6_ROUTING_EXT,
> +	ITEM_IPV6_ROUTING_EXT_TYPE,
> +	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
> +	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
>   	ITEM_ICMP,
>   	ITEM_ICMP_TYPE,
>   	ITEM_ICMP_CODE,
> @@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
>   	ITEM_ARP_ETH_IPV4,
>   	ITEM_IPV6_EXT,
>   	ITEM_IPV6_FRAG_EXT,
> +	ITEM_IPV6_ROUTING_EXT,
>   	ITEM_ICMP6,
>   	ITEM_ICMP6_ND_NS,
>   	ITEM_ICMP6_ND_NA,
> @@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
>   	ITEM_IPV6_SRC,
>   	ITEM_IPV6_DST,
>   	ITEM_IPV6_HAS_FRAG_EXT,
> +	ITEM_IPV6_ROUTING_EXT,
> +	ITEM_NEXT,
> +	ZERO,
> +};
> +
> +static const enum index item_ipv6_routing_ext[] = {
> +	ITEM_IPV6_ROUTING_EXT_TYPE,
> +	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
> +	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
>   	ITEM_NEXT,
>   	ZERO,
>   };
> @@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
>   		.args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
>   					   has_frag_ext, 1)),
>   	},
> +	[ITEM_IPV6_ROUTING_EXT] = {
> +		.name = "ipv6_routing_ext",
> +		.help = "match IPv6 routing extension header",
> +		.priv = PRIV_ITEM(IPV6_ROUTING_EXT,
> +				  sizeof(struct rte_flow_item_ipv6_routing_ext)),
> +		.next = NEXT(item_ipv6_routing_ext),
> +		.call = parse_vc,
> +	},
> +	[ITEM_IPV6_ROUTING_EXT_TYPE] = {
> +		.name = "ext_type",
> +		.help = "match IPv6 routing extension header type",
> +		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> +			     item_param),
> +		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> +					     hdr.type)),
> +	},
> +	[ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
> +		.name = "ext_next_hdr",
> +		.help = "match IPv6 routing extension header next header type",
> +		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> +			     item_param),
> +		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> +					     hdr.next_hdr)),
> +	},
> +	[ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
> +		.name = "ext_seg_left",
> +		.help = "match IPv6 routing extension header segment left",
> +		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
> +			     item_param),
> +		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
> +					     hdr.segments_left)),
> +	},
>   	[ITEM_ICMP] = {
>   		.name = "icmp",
>   		.help = "match ICMP header",
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 3e6242803d..602fab29d3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
>   
>   - ``color``: Metering color marker.
>   
> +Item: ``IPV6_ROUTING_EXT``
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Matches IPv6 routing extension header.
> +
> +- ``next_hdr``: Next layer header type.
> +- ``type``: IPv6 routing extension header type.
> +- ``segments_left``: How many IPv6 destination addresses carries on.
> +
>   Actions
>   ~~~~~~~
>   
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index c15f6fbb9f..1337da73b8 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -69,6 +69,11 @@ New Features
>       ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
>       required for eth_rx, eth_tx, crypto and timer eventdev adapters.
>   
> +* **Added rte_flow support for matching IPv6 routing extension header fields.**
> +
> +  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
> +  header.
> +
>   
>   Removed Items
>   -------------
> @@ -98,6 +103,10 @@ API Changes
>      Also, make sure to start the actual text at the margin.
>      =======================================================
>   
> +* net: added a new structure:
> +
> +    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
> +
>   
>   ABI Changes
>   -----------
> diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> index 7d0c24366c..833382c466 100644
> --- a/lib/ethdev/rte_flow.c
> +++ b/lib/ethdev/rte_flow.c
> @@ -76,6 +76,23 @@ rte_flow_item_flex_conv(void *buf, const void *data)
>   	return src->length;
>   }
>   
> +static size_t
> +rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data)
> +{
> +	struct rte_flow_item_ipv6_routing_ext *dst = buf;
> +	const struct rte_flow_item_ipv6_routing_ext *src = data;
> +	size_t len;
> +
> +	if (src->hdr.hdr_len)

Please, compare vs 0

> +		len = src->hdr.hdr_len << 3;
> +	else
> +		len = src->hdr.segments_left << 4;
> +	if (dst == NULL)
> +		return 0;
> +	memcpy(dst->segments, src->segments, len);
> +	return len;
> +}
> +
>   /** Generate flow_item[] entry. */
>   #define MK_FLOW_ITEM(t, s) \
>   	[RTE_FLOW_ITEM_TYPE_ ## t] = { \
> @@ -157,6 +174,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>   	MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
>   	MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
>   	MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
> +	MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext),
> +			rte_flow_item_ipv6_routing_ext_conv),
>   };
>   
>   /** Generate flow_action[] entry. */
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..ff9270690c 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -624,6 +624,13 @@ enum rte_flow_item_type {
>   	 * See struct rte_flow_item_meter_color.
>   	 */
>   	RTE_FLOW_ITEM_TYPE_METER_COLOR,
> +
> +	/**
> +	 * Matches the presence of IPv6 routing extension header.
> +	 *
> +	 * @see struct rte_flow_item_ipv6_routing_ext.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
>   };
>   
>   /**
> @@ -873,6 +880,20 @@ struct rte_flow_item_ipv6 {
>   	uint32_t reserved:23;
>   };
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
> + *
> + * Matches an IPv6 routing extension header.
> + */
> +struct rte_flow_item_ipv6_routing_ext {
> +	struct rte_ipv6_routing_ext hdr;
> +	__extension__
> +	rte_be32_t segments[]; /**< Each hop IPv6 address. */

Do we really need it? Are you going to support it?
Will testpmd and flow conf work correctly since it uses size of
the structure?

IMHO we should just remove it right now if we're not going to
support it.

> +};
> +
>   /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
>   #ifndef __cplusplus
>   static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index 9c8e8206f0..778fb5ef83 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -539,6 +539,25 @@ struct rte_ipv6_hdr {
>   	uint8_t  dst_addr[16];	/**< IP address of destination host(s). */
>   } __rte_packed;
>   
> +/**
> + * IPv6 Routing Extension Header
> + */
> +struct rte_ipv6_routing_ext {
> +	uint8_t next_hdr;			/**< Protocol, next header. */
> +	uint8_t hdr_len;			/**< Header length. */
> +	uint8_t type;				/**< Extension header type. */
> +	uint8_t segments_left;			/**< Valid segments number. */
> +	__extension__
> +	union {
> +		rte_be32_t flags;

flags should be documented as well.

> +		struct {
> +			uint8_t last_entry;	/**< The last_entry field of SRH */
> +			uint8_t flag;		/**< Packet flag. */
> +			rte_be16_t tag;		/**< Packet tag. */
> +		};
> +	};

Maybe we should add a comment here that segments follow?

> +} __rte_packed;
> +
>   /* IPv6 vtc_flow: IPv / TC / flow_label */
>   #define RTE_IPV6_HDR_FL_SHIFT 0
>   #define RTE_IPV6_HDR_TC_SHIFT 20
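
As an illustration of the two documentation requests above, a sketch of
the header with the union commented and a note that segments follow. It
assumes a hypothetical mock_ipv6_routing_ext stand-in, plain fixed-width
types instead of rte_be16_t/rte_be32_t, and the GCC/Clang packed
attribute in place of __rte_packed; the comment wording is a suggestion
only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure proposed in the patch. */
struct mock_ipv6_routing_ext {
	uint8_t next_hdr;      /* Protocol, next header. */
	uint8_t hdr_len;       /* Header length. */
	uint8_t type;          /* Extension header type. */
	uint8_t segments_left; /* Valid segments number. */
	union {
		/* The three fields below, viewed as one 32-bit word. */
		uint32_t flags;
		struct {
			uint8_t last_entry; /* The last_entry field of SRH. */
			uint8_t flag;       /* Packet flag. */
			uint16_t tag;       /* Packet tag. */
		};
	};
	/* The segment list (IPv6 addresses) follows these 8 bytes. */
} __attribute__((packed));

/* The fixed part must stay exactly 8 bytes for the segments to line up. */
static_assert(sizeof(struct mock_ipv6_routing_ext) == 8,
	      "fixed routing header part must be 8 bytes");
```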


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-01-30 17:00  2%   ` [PATCH v2 2/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
  2023-01-31 17:26  3%     ` Thomas Monjalon
@ 2023-02-01  9:05  0%     ` Andrew Rybchenko
  2023-02-01 15:50  0%       ` Jiawei(Jonny) Wang
  1 sibling, 1 reply; 200+ results
From: Andrew Rybchenko @ 2023-02-01  9:05 UTC (permalink / raw)
  To: Jiawei Wang, viacheslavo, orika, thomas, Aman Singh,
	Yuying Zhang, Ferruh Yigit
  Cc: dev, rasland

On 1/30/23 20:00, Jiawei Wang wrote:
> For the multiple hardware ports connect to a single DPDK port (mhpsdp),
> the previous patch introduces the new rte flow item to match the
> phy affinity of the received packets.
> 
> This patch adds the tx_phy_affinity setting in Tx queue API, the affinity

"This patch adds" -> "Add ..."

> value reflects packets be sent to which hardware port.
> Value 0 is no affinity and traffic will be routed between different
> physical ports,

How will it be routed?

> if 0 is disabled then try to match on phy_affinity 0
> will result in an error.

Why are you talking about matching here?

> 
> Adds the new tx_phy_affinity field into the padding hole of rte_eth_txconf
> structure, the size of rte_eth_txconf keeps the same. Adds a suppress
> type for structure change in the ABI check file.
> 
> This patch adds the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, there're two hardware ports 0 and 1 connected to
> a single DPDK port (port id 0), and phy_affinity 1 stood for
> hardware port 0 and phy_affinity 2 stood for hardware port 1,
> used the below command to config tx phy affinity for per Tx Queue:
>          port config 0 txq 0 phy_affinity 1
>          port config 0 txq 1 phy_affinity 1
>          port config 0 txq 2 phy_affinity 2
>          port config 0 txq 3 phy_affinity 2
> 
> These commands config the TxQ index 0 and TxQ index 1 with phy affinity 1,
> uses TxQ 0 or TxQ 1 send packets, these packets will be sent from the
> hardware port 0, and similar with hardware port 1 if sending packets
> with TxQ 2 or TxQ 3.

Frankly speaking, I dislike it. Why do we need to expose it on
the generic ethdev layer? IMHO a dynamic mbuf field would be a
better solution to control Tx routing to a specific PHY port.

IMHO, we definitely need dev_info information about the number of
physical ports behind the DPDK port. Advertising a value greater
than 0 should mean that the PMD supports the corresponding mbuf
dynamic field to control the outgoing physical port on Tx (or
should otherwise just reject, on prepare, packets which try to
specify an outgoing phy port). In the same way the information
may be provided on Rx.

I'm OK to have 0 as no phy affinity value and greater than
zero as specified phy affinity. I.e. no dynamic flag is
required.

Also, I think that the order of patches should be different.
We should start with a patch which provides dev_info; flow API
matching and action should be in a later patch.

> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>

[snip]
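
For reference, the per-queue mapping described in the quoted commit
message could be sketched as below. This is only an illustration of
the numbering in that example (affinity 0 means no affinity, affinity
N means hardware port N - 1); the array and the helper
txq_to_hw_port() are hypothetical names and not part of ethdev.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_TXQ 4 /* assumption: four Tx queues, as in the quoted example */

/* Affinity per Tx queue, mirroring the four "port config" commands. */
static const uint8_t txq_phy_affinity[NB_TXQ] = { 1, 1, 2, 2 };

/* Return the hardware port index for a Tx queue, or -1 for "no affinity". */
static int
txq_to_hw_port(uint16_t txq)
{
	assert(txq < NB_TXQ);
	uint8_t affinity = txq_phy_affinity[txq];

	return affinity == 0 ? -1 : (int)affinity - 1;
}
```

With this table, TxQ 0 and TxQ 1 map to hardware port 0, and TxQ 2 and
TxQ 3 map to hardware port 1.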


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-01-30 17:00  2%   ` [PATCH v2 2/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
@ 2023-01-31 17:26  3%     ` Thomas Monjalon
  2023-02-01  9:45  0%       ` Jiawei(Jonny) Wang
  2023-02-01  9:05  0%     ` Andrew Rybchenko
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-01-31 17:26 UTC (permalink / raw)
  To: Jiawei Wang
  Cc: viacheslavo, orika, Aman Singh, Yuying Zhang, Ferruh Yigit,
	Andrew Rybchenko, dev, rasland, david.marchand

30/01/2023 18:00, Jiawei Wang:
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -20,6 +20,11 @@
>  [suppress_file]
>          soname_regexp = ^librte_.*mlx.*glue\.
>  
> +; Ignore fields inserted in middle padding of rte_eth_txconf
> +[suppress_type]
> +        name = rte_eth_txconf
> +        has_data_member_inserted_between = {offset_after(tx_deferred_start), offset_of(offloads)}

You are adding the exception inside
"Core suppression rules: DO NOT TOUCH".

Please move it to the end, in the section
"Temporary exceptions till next major ABI version"

Also the rule does not work.
It should be:
	has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
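
As background for why such an exception can be acceptable at all, here
is a minimal sketch of a member inserted into a padding hole. The two
structs are hypothetical and much smaller than the real
rte_eth_txconf; only the member names are borrowed from the patch. The
point is that the size and the offsets of the existing members do not
move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real rte_eth_txconf. */
struct txconf_before {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	/* the compiler pads here so that offloads is naturally aligned */
	uint64_t offloads;
};

struct txconf_after {
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	uint8_t tx_phy_affinity; /* new member, placed in the hole */
	uint64_t offloads;
};

/* Checked at compile time: nothing that already existed has moved. */
static_assert(sizeof(struct txconf_before) == sizeof(struct txconf_after),
	      "struct size changed");
static_assert(offsetof(struct txconf_before, offloads) ==
	      offsetof(struct txconf_after, offloads),
	      "offloads moved");

/* Bytes of padding available after tx_deferred_start in the old layout. */
static size_t
txconf_hole_size(void)
{
	return offsetof(struct txconf_before, offloads) -
	       (offsetof(struct txconf_before, tx_deferred_start) +
		sizeof(uint8_t));
}
```

The suppression rule still has to be written so that libabigail
recognises the insertion, which is what the corrected rule above is
about.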




^ permalink raw reply	[relevance 3%]

* [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition
  @ 2023-01-31  9:36  3%   ` Rongwei Liu
  2023-02-01  9:21  0%     ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Rongwei Liu @ 2023-01-31  9:36 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Aman Singh, Yuying Zhang, Ferruh Yigit,
	Andrew Rybchenko, Olivier Matz

Add IPv6 routing extension header definition and no
TLV support for now.

At rte_flow layer, there are new items defined for matching
type/nexthdr/segments_left field.

Add command line support for IPv6 routing extension header
matching: type/nexthdr/segment_list.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
 doc/guides/prog_guide/rte_flow.rst     |  9 +++++
 doc/guides/rel_notes/release_23_03.rst |  9 +++++
 lib/ethdev/rte_flow.c                  | 19 +++++++++++
 lib/ethdev/rte_flow.h                  | 21 ++++++++++++
 lib/net/rte_ip.h                       | 19 +++++++++++
 6 files changed, 123 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 88108498e0..7a8516829c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -298,6 +298,10 @@ enum index {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_ICMP,
 	ITEM_ICMP_TYPE,
 	ITEM_ICMP_CODE,
@@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
 	ITEM_ARP_ETH_IPV4,
 	ITEM_IPV6_EXT,
 	ITEM_IPV6_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
 	ITEM_ICMP6,
 	ITEM_ICMP6_ND_NS,
 	ITEM_ICMP6_ND_NA,
@@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_NEXT,
+	ZERO,
+};
+
+static const enum index item_ipv6_routing_ext[] = {
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_NEXT,
 	ZERO,
 };
@@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
 		.args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
 					   has_frag_ext, 1)),
 	},
+	[ITEM_IPV6_ROUTING_EXT] = {
+		.name = "ipv6_routing_ext",
+		.help = "match IPv6 routing extension header",
+		.priv = PRIV_ITEM(IPV6_ROUTING_EXT,
+				  sizeof(struct rte_flow_item_ipv6_routing_ext)),
+		.next = NEXT(item_ipv6_routing_ext),
+		.call = parse_vc,
+	},
+	[ITEM_IPV6_ROUTING_EXT_TYPE] = {
+		.name = "ext_type",
+		.help = "match IPv6 routing extension header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.type)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
+		.name = "ext_next_hdr",
+		.help = "match IPv6 routing extension header next header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.next_hdr)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
+		.name = "ext_seg_left",
+		.help = "match IPv6 routing extension header segment left",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.segments_left)),
+	},
 	[ITEM_ICMP] = {
 		.name = "icmp",
 		.help = "match ICMP header",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 3e6242803d..602fab29d3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
 
 - ``color``: Metering color marker.
 
+Item: ``IPV6_ROUTING_EXT``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Matches IPv6 routing extension header.
+
+- ``next_hdr``: Next layer header type.
+- ``type``: IPv6 routing extension header type.
+- ``segments_left``: How many IPv6 destination addresses carries on.
+
 Actions
 ~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index c15f6fbb9f..1337da73b8 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -69,6 +69,11 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added rte_flow support for matching IPv6 routing extension header fields.**
+
+  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
+  header.
+
 
 Removed Items
 -------------
@@ -98,6 +103,10 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* net: added a new structure:
+
+    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 7d0c24366c..833382c466 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -76,6 +76,23 @@ rte_flow_item_flex_conv(void *buf, const void *data)
 	return src->length;
 }
 
+static size_t
+rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data)
+{
+	struct rte_flow_item_ipv6_routing_ext *dst = buf;
+	const struct rte_flow_item_ipv6_routing_ext *src = data;
+	size_t len;
+
+	if (src->hdr.hdr_len)
+		len = src->hdr.hdr_len << 3;
+	else
+		len = src->hdr.segments_left << 4;
+	if (dst == NULL)
+		return 0;
+	memcpy(dst->segments, src->segments, len);
+	return len;
+}
+
 /** Generate flow_item[] entry. */
 #define MK_FLOW_ITEM(t, s) \
 	[RTE_FLOW_ITEM_TYPE_ ## t] = { \
@@ -157,6 +174,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
 	MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
 	MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
+	MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext),
+			rte_flow_item_ipv6_routing_ext_conv),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..ff9270690c 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -624,6 +624,13 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_meter_color.
 	 */
 	RTE_FLOW_ITEM_TYPE_METER_COLOR,
+
+	/**
+	 * Matches the presence of IPv6 routing extension header.
+	 *
+	 * @see struct rte_flow_item_ipv6_routing_ext.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
 };
 
 /**
@@ -873,6 +880,20 @@ struct rte_flow_item_ipv6 {
 	uint32_t reserved:23;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
+ *
+ * Matches an IPv6 routing extension header.
+ */
+struct rte_flow_item_ipv6_routing_ext {
+	struct rte_ipv6_routing_ext hdr;
+	__extension__
+	rte_be32_t segments[]; /**< Each hop IPv6 address. */
+};
+
 /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
 #ifndef __cplusplus
 static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 9c8e8206f0..778fb5ef83 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -539,6 +539,25 @@ struct rte_ipv6_hdr {
 	uint8_t  dst_addr[16];	/**< IP address of destination host(s). */
 } __rte_packed;
 
+/**
+ * IPv6 Routing Extension Header
+ */
+struct rte_ipv6_routing_ext {
+	uint8_t next_hdr;			/**< Protocol, next header. */
+	uint8_t hdr_len;			/**< Header length. */
+	uint8_t type;				/**< Extension header type. */
+	uint8_t segments_left;			/**< Valid segments number. */
+	__extension__
+	union {
+		rte_be32_t flags;
+		struct {
+			uint8_t last_entry;	/**< The last_entry field of SRH */
+			uint8_t flag;		/**< Packet flag. */
+			rte_be16_t tag;		/**< Packet tag. */
+		};
+	};
+} __rte_packed;
+
 /* IPv6 vtc_flow: IPv / TC / flow_label */
 #define RTE_IPV6_HDR_FL_SHIFT 0
 #define RTE_IPV6_HDR_TC_SHIFT 20
-- 
2.27.0
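
Note on the length computed in rte_flow_item_ipv6_routing_ext_conv():
the sketch below restates the same arithmetic as a standalone helper.
The helper name is hypothetical. It assumes, as the patch does, that a
non-zero hdr_len is counted in 8-octet units (the first 8 octets of
the header are not included, per the IPv6 specification) and that each
segment is one 16-byte IPv6 address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of the segment list following the fixed part of the header. */
static size_t
routing_ext_segments_len(uint8_t hdr_len, uint8_t segments_left)
{
	if (hdr_len != 0)
		return (size_t)hdr_len << 3;  /* 8-octet units */
	return (size_t)segments_left << 4;    /* 16 bytes per IPv6 address */
}
```

For example, hdr_len = 4 and segments_left = 2 with hdr_len = 0 both
give 32 bytes, i.e. two IPv6 addresses.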


^ permalink raw reply	[relevance 3%]

* RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro conflict
  2023-01-30 19:30  3% ` [EXT] " Akhil Goyal
@ 2023-01-31  8:23  0%   ` Akhil Goyal
  2023-02-01 13:19  0%     ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2023-01-31  8:23 UTC (permalink / raw)
  To: Michael Baum, dev
  Cc: Matan Azrad, Ashish Gupta, Fan Zhang, Kai Ji, Thomas Monjalon,
	fiona.trahe, stable

> Subject: RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro
> conflict
> 
> > The "rte_compressdev_info_get()" function retrieves the contextual
> > information of a device.
> > The output structure "dev_info" contains a list of devices supported
> > capabilities for each supported algorithm.
> >
> > In this function description, it says the element after the last valid
> > element has op field set to "RTE_COMP_ALGO_LIST_END".
> > On the other hand, when this function used by
> > "rte_compressdev_capability_get()" function, it uses
> > "RTE_COMP_ALGO_UNSPECIFIED" as end of list as same as the
> > "RTE_COMP_END_OF_CAPABILITIES_LIST()".
> >
> > The mlx5 and qat PMDs use "RTE_COMP_ALGO_LIST_END" as the end of
> > capabilities list. When "rte_compressdev_capability_get()" function is
> > called with unsupported algorithm, it might read memory out of bound.
> >
> > This patch change the "rte_compressdev_info_get()" function description
> > to say using "RTE_COMP_ALGO_UNSPECIFIED" as the end of capabilities
> > list.
> > In addition, it moves both mlx5 and qat PMDs to use
> > "RTE_COMP_ALGO_UNSPECIFIED" through
> > "RTE_COMP_END_OF_CAPABILITIES_LIST()" macro.
> >
> > Fixes: 5d432f364078 ("compressdev: add device capabilities")
> > Fixes: 2d148597ce76 ("compress/qat: add gen-specific implementation")
> > Fixes: 384bac8d6555 ("compress/mlx5: add supported capabilities")
> > Cc: fiona.trahe@intel.com
> > Cc: roy.fan.zhang@intel.com
> > Cc: matan@nvidia.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Michael Baum <michaelba@nvidia.com>
> >
> > ---
> >
> > After this change, I'm not sure about the purpose of
> > "RTE_COMP_ALGO_LIST_END".
> > There is no any other use of it in DPDK code, and it isn't represent the
> > number of algorithms supported by the API since the
> > "RTE_COMP_ALGO_UNSPECIFIED" is part of the enum.
> >
> > Due to the compress API is experimental I think the
> > "RTE_COMP_ALGO_LIST_END" can be removed.
> >
> +1 to remove the list end enums. This will also help in avoiding ABI breakage
> When we make this lib as stable.

RTE_COMP_HASH_ALGO_LIST_END can be removed as well.
It is not used anywhere.
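
To illustrate the out-of-bounds read described in the quoted commit
message, here is a minimal sketch of a sentinel-terminated capability
lookup. The enum, struct and function names are hypothetical
stand-ins, not the compressdev API; the only thing taken from the
thread is that the reader stops at the "unspecified" value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compressdev algorithm enum. */
enum algo {
	ALGO_UNSPECIFIED = 0, /* the terminator the reader looks for */
	ALGO_NULL,
	ALGO_DEFLATE,
	ALGO_LZS,
	ALGO_LIST_END,        /* a different value: not seen as a terminator */
};

struct capa {
	enum algo algo;
	unsigned int window_size;
};

/* Walk the list until the terminator; return the entry or NULL. */
static const struct capa *
capa_find(const struct capa *list, enum algo wanted)
{
	assert(list != NULL);
	for (; list->algo != ALGO_UNSPECIFIED; list++)
		if (list->algo == wanted)
			return list;
	return NULL;
}

/* A list terminated with the value the reader expects. */
static const struct capa good_list[] = {
	{ ALGO_DEFLATE, 15 },
	{ ALGO_UNSPECIFIED, 0 },
};
```

If a driver ended its list with ALGO_LIST_END instead, a lookup for an
algorithm that is not in the list would never meet the terminator and
would keep reading past the end of the array.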

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 2/3] graph: pcap capture for graph nodes
  2023-01-24 11:21  2%   ` [PATCH v4 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
@ 2023-01-31  8:06  0%     ` Jerin Jacob
  2023-02-03  8:15  0%       ` [EXT] " Amit Prakash Shukla
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-01-31  8:06 UTC (permalink / raw)
  To: Amit Prakash Shukla
  Cc: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov, dev

On Tue, Jan 24, 2023 at 4:52 PM Amit Prakash Shukla
<amitprakashs@marvell.com> wrote:
>
> Implementation adds support to capture packets at each node with
> packet metadata and node name.
>
> Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> ---
> v2:
>  - Fixed code style issue
>  - Fixed CI compilation issue on github-robot
>
> v3:
>  - Code review suggestion from Stephen
>  - Fixed potential memory leak
>
> v4:
>  - Code review suggestion from Jerin
>
>  app/test/test_graph_perf.c             |   2 +-
>  doc/guides/rel_notes/release_23_03.rst |   7 +
>  lib/graph/graph.c                      |  17 +-
>  lib/graph/graph_pcap.c                 | 217 +++++++++++++++++++++++++
>  lib/graph/graph_pcap_private.h         | 124 ++++++++++++++
>  lib/graph/graph_populate.c             |  12 +-
>  lib/graph/graph_private.h              |   5 +
>  lib/graph/meson.build                  |   3 +-
>  lib/graph/rte_graph.h                  |   5 +
>  lib/graph/rte_graph_worker.h           |   9 +
>  10 files changed, 397 insertions(+), 4 deletions(-)
>  create mode 100644 lib/graph/graph_pcap.c
>  create mode 100644 lib/graph/graph_pcap_private.h
>
> diff --git a/app/test/test_graph_perf.c b/app/test/test_graph_perf.c
> index 1d065438a6..c5b463f700 100644
> --- a/app/test/test_graph_perf.c
> +++ b/app/test/test_graph_perf.c
> @@ -324,7 +324,7 @@ graph_init(const char *gname, uint8_t nb_srcs, uint8_t nb_sinks,
>         char nname[RTE_NODE_NAMESIZE / 2];
>         struct test_node_data *node_data;
>         char *ename[nodes_per_stage];
> -       struct rte_graph_param gconf;
> +       struct rte_graph_param gconf = {0};

If this is a fix, move it to a separate patch outside this series.


>         const struct rte_memzone *mz;
>         uint8_t total_percent = 0;
>         rte_node_t *src_nodes;
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index 8c360b89e4..9ba392fb58 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -69,6 +69,10 @@ New Features
>      ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
>      required for eth_rx, eth_tx, crypto and timer eventdev adapters.
>
> +* **Added pcap trace support in graph library.**
> +
> +  * Added support to capture packets at each graph node with packet metadata and
> +    node name.
>
>  Removed Items
>  -------------
> @@ -101,6 +105,9 @@ API Changes
>  * Experimental function ``rte_pcapng_copy`` was updated to support comment
>    section in enhanced packet block in pcapng library.
>
> +* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
> +  ``struct graph`` were updated to support pcap trace in graph library.
> +
>  ABI Changes
>  -----------
>
> diff --git a/lib/graph/graph.c b/lib/graph/graph.c
> index 3a617cc369..a839a2803b 100644
> --- a/lib/graph/graph.c
> +++ b/lib/graph/graph.c
> @@ -15,6 +15,7 @@
>  #include <rte_string_fns.h>
>
>  #include "graph_private.h"
> +#include "graph_pcap_private.h"
>
>  static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
>  static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
> @@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
>                 node_db = node_from_name(name);
>                 if (node_db == NULL)
>                         SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
> -               node->process = node_db->process;
> +
> +               if (graph->pcap_enable) {
> +                       node->process = graph_pcap_dispatch;
> +                       node->original_process = node_db->process;
> +               } else
> +                       node->process = node_db->process;
>         }
>
>         return graph;
> @@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
>         if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
>                 return graph;
>
> +       if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
> +               graph_pcap_exit(graph);
> +
>         return graph_mem_fixup_node_ctx(graph);
>  }
>
> @@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
>         if (graph_has_isolated_node(graph))
>                 goto graph_cleanup;
>
> +       /* Initialize pcap config. */
> +       graph_pcap_enable(prm->pcap_enable);
> +
>         /* Initialize graph object */
>         graph->socket = prm->socket_id;
>         graph->src_node_count = src_node_count;
>         graph->node_count = graph_nodes_count(graph);
>         graph->id = graph_id;
> +       graph->num_pkt_to_capture = prm->num_pkt_to_capture;
> +       if (prm->pcap_filename)
> +               rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
>
>         /* Allocate the Graph fast path memory and populate the data */
>         if (graph_fp_mem_create(graph))
> diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
> new file mode 100644
> index 0000000000..7bd13ed61e
> --- /dev/null
> +++ b/lib/graph/graph_pcap.c
> @@ -0,0 +1,217 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Marvell International Ltd.
> + */
> +
> +#include <errno.h>
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <pwd.h>

Sort in alphabetical order.


> +
> +#include <rte_mbuf.h>
> +#include <rte_pcapng.h>
> +
> +#include "rte_graph_worker.h"
> +
> +#include "graph_pcap_private.h"
> +
> +#define GRAPH_PCAP_BUF_SZ      128
> +#define GRAPH_PCAP_NUM_PACKETS 1024
> +#define GRAPH_PCAP_PKT_POOL    "graph_pcap_pkt_pool"
> +#define GRAPH_PCAP_FILE_NAME   "dpdk_graph_pcap_capture_XXXXXX.pcapng"
> +
> +/* For multi-process, packets are captured in separate files. */
> +static rte_pcapng_t *pcapng_fd;
> +static bool pcap_enable;
> +struct rte_mempool *pkt_mp;
> +
> +void
> +graph_pcap_enable(bool val)
> +{
> +       pcap_enable = val;
> +}
> +
> +int
> +graph_pcap_is_enable(void)
> +{
> +       return pcap_enable;
> +}
> +
> +void
> +graph_pcap_exit(struct rte_graph *graph)
> +{
> +       if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +               if (pkt_mp)
> +                       rte_mempool_free(pkt_mp);
> +
> +       if (pcapng_fd) {
> +               rte_pcapng_close(pcapng_fd);
> +               pcapng_fd = NULL;
> +       }
> +
> +       /* Disable pcap. */
> +       graph->pcap_enable = 0;
> +       graph_pcap_enable(0);
> +}
> +
> +static int
> +graph_pcap_default_path_get(char **dir_path)
> +{
> +       struct passwd *pwd;
> +       char *home_dir;
> +
> +       /* First check for shell environment variable */
> +       home_dir = getenv("HOME");
> +       if (home_dir == NULL) {
> +               graph_warn("Home env not preset.");
> +               /* Fallback to password file entry */
> +               pwd = getpwuid(getuid());
> +               if (pwd == NULL)
> +                       return -EINVAL;
> +
> +               home_dir = pwd->pw_dir;
> +       }
> +
> +       /* Append default pcap file to directory */
> +       if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +int
> +graph_pcap_file_open(const char *filename)
> +{
> +       int fd;
> +       char file_name[RTE_GRAPH_PCAP_FILE_SZ];
> +       char *pcap_dir;
> +
> +       if (pcapng_fd)
> +               goto done;
> +
> +       if (!filename || filename[0] == '\0') {
> +               if (graph_pcap_default_path_get(&pcap_dir) < 0)
> +                       return -1;
> +               snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
> +               free(pcap_dir);
> +       } else {
> +               snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
> +                        filename);
> +       }
> +
> +       fd = mkstemps(file_name, strlen(".pcapng"));
> +       if (fd < 0) {
> +               graph_err("mkstemps() failure");
> +               return -1;
> +       }
> +
> +       graph_info("pcap filename: %s", file_name);
> +
> +       /* Open a capture file */
> +       pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
> +                                     NULL);
> +       if (pcapng_fd == NULL) {
> +               graph_err("Graph rte_pcapng_fdopen failed.");
> +               close(fd);
> +               return -1;
> +       }
> +
> +done:
> +       return 0;
> +}
> +
> +int
> +graph_pcap_mp_init(void)
> +{
> +       pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
> +       if (pkt_mp)
> +               goto done;
> +
> +       /* Make a pool for cloned packets */
> +       pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
> +                       IOV_MAX + RTE_GRAPH_BURST_SIZE, 0, 0,
> +                       rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
> +                       SOCKET_ID_ANY, "ring_mp_mc");
> +       if (pkt_mp == NULL) {
> +               graph_err("Cannot create mempool for graph pcap capture.");
> +               return -1;
> +       }
> +
> +done:
> +       return 0;
> +}
> +
> +int
> +graph_pcap_init(struct graph *graph)
> +{
> +       struct rte_graph *graph_data = graph->graph;
> +
> +       if (graph_pcap_file_open(graph->pcap_filename) < 0)
> +               goto error;
> +
> +       if (graph_pcap_mp_init() < 0)
> +               goto error;
> +
> +       /* User configured number of packets to capture. */
> +       if (graph->num_pkt_to_capture)
> +               graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
> +       else
> +               graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
> +
> +       /* All good. Now populate data for secondary process. */

No need for a new line here.

> +
> +       rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
> +       graph_data->pcap_enable = 1;
> +
> +       return 0;
> +
> +error:
> +       graph_pcap_exit(graph_data);
> +       graph_pcap_enable(0);
> +       graph_err("Graph pcap initialization failed. Disabling pcap trace.");
> +       return -1;
> +}
> +
> +uint16_t
> +graph_pcap_dispatch(struct rte_graph *graph,
> +                             struct rte_node *node, void **objs,
> +                             uint16_t nb_objs)
> +{
> +       struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
> +       char buffer[GRAPH_PCAP_BUF_SZ];
> +       uint64_t i, num_packets;
> +       struct rte_mbuf *mbuf;
> +       ssize_t len;
> +
> +       if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
> +               goto done;
> +
> +       num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
> +       /* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
> +       if (num_packets > nb_objs)
> +               num_packets = nb_objs;
> +
> +       snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
> +
> +       for (i = 0; i < num_packets; i++) {
> +               struct rte_mbuf *mc;
> +               mbuf = (struct rte_mbuf *)objs[i];
> +
> +               mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
> +                                    rte_get_tsc_cycles(), 0, buffer);
> +               if (mc == NULL)
> +                       break;
> +
> +               mbuf_clones[i] = mc;
> +       }
> +
> +       /* write it to capture file */
> +       len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
> +       rte_pktmbuf_free_bulk(mbuf_clones, i);
> +       if (len <= 0)
> +               goto done;
> +
> +       graph->nb_pkt_captured += i;
> +
> +done:
> +       return node->original_process(graph, node, objs, nb_objs);
> +}
> diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
> new file mode 100644
> index 0000000000..198add67e2
> --- /dev/null
> +++ b/lib/graph/graph_pcap_private.h
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Marvell International Ltd.
> + */
> +
> +#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
> +#define _RTE_GRAPH_PCAP_PRIVATE_H_
> +
> +#include <stdint.h>
> +#include <sys/types.h>
> +
> +#include "graph_private.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

No need for this in internal header files.


> +
> +/**
> + * @internal
> + *
> + * Pcap trace enable/disable function.
> + *
> + * The function is called to enable/disable graph pcap trace functionality.
> + *
> + * @param val
> + *   Value to be set to enable/disable graph pcap trace.
> + */
> +void graph_pcap_enable(bool val);
> +
> +/**
> + * @internal
> + *
> + * Check graph pcap trace is enable/disable.
> + *
> + * The function is called to check if the graph pcap trace is enabled/disabled.
> + *
> + * @return
> + *   - 1: Enable
> + *   - 0: Disable
> + */
> +int graph_pcap_is_enable(void);
> +
> +/**
> + * @internal
> + *
> + * Initialise graph pcap trace functionality.
> + *
> + * The function invoked to allocate mempool.
> + *
> + * @return
> + *   0 on success and -1 on failure.
> + */
> +int graph_pcap_mp_init(void);
> +
> +/**
> + * @internal
> + *
> + * Initialise graph pcap trace functionality.
> + *
> + * The function invoked to open pcap file.
> + *
> + * @param filename
> + *   Pcap filename.
> + *
> + * @return
> + *   0 on success and -1 on failure.
> + */
> +int graph_pcap_file_open(const char *filename);
> +
> +/**
> + * @internal
> + *
> + * Initialise graph pcap trace functionality.
> + *
> + * The function invoked when the graph pcap trace is enabled. This function
> + * open's pcap file and allocates mempool. Information needed for secondary
> + * process is populated.
> + *
> + * @param graph
> + *   Pointer to graph structure.
> + *
> + * @return
> + *   0 on success and -1 on failure.
> + */
> +int graph_pcap_init(struct graph *graph);
> +
> +/**
> + * @internal
> + *
> + * Exit graph pcap trace functionality.
> + *
> + * The function is called to exit graph pcap trace and close open fd's and
> + * free up memory. Pcap trace is also disabled.
> + *
> + * @param graph
> + *   Pointer to graph structure.
> + */
> +void graph_pcap_exit(struct rte_graph *graph);
> +
> +/**
> + * @internal
> + *
> + * Capture mbuf metadata and node metadata to a pcap file.
> + *
> + * When graph pcap trace enabled, this function is invoked prior to each node
> + * and mbuf, node metadata is parsed and captured in a pcap file.
> + *
> + * @param graph
> + *   Pointer to the graph object.
> + * @param node
> + *   Pointer to the node object.
> + * @param objs
> + *   Pointer to an array of objects to be processed.
> + * @param nb_objs
> + *   Number of objects in the array.
> + */
> +uint16_t graph_pcap_dispatch(struct rte_graph *graph,
> +                                  struct rte_node *node, void **objs,
> +                                  uint16_t nb_objs);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
> diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
> index 102fd6c29b..2c0844ce92 100644
> --- a/lib/graph/graph_populate.c
> +++ b/lib/graph/graph_populate.c
> @@ -9,6 +9,7 @@
>  #include <rte_memzone.h>
>
>  #include "graph_private.h"
> +#include "graph_pcap_private.h"
>
>  static size_t
>  graph_fp_mem_calc_size(struct graph *graph)
> @@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
>                 memset(node, 0, sizeof(*node));
>                 node->fence = RTE_GRAPH_FENCE;
>                 node->off = off;
> -               node->process = graph_node->node->process;
> +               if (graph_pcap_is_enable()) {
> +                       node->process = graph_pcap_dispatch;
> +                       node->original_process = graph_node->node->process;
> +               } else
> +                       node->process = graph_node->node->process;
>                 memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
>                 pid = graph_node->node->parent_id;
>                 if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
> @@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
>         int rc;
>
>         graph_header_popluate(graph);
> +       if (graph_pcap_is_enable())
> +               graph_pcap_init(graph);
>         graph_nodes_populate(graph);
>         rc = graph_node_nexts_populate(graph);
>         rc |= graph_src_nodes_populate(graph);
> @@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
>  int
>  graph_fp_mem_destroy(struct graph *graph)
>  {
> +       if (graph_pcap_is_enable())
> +               graph_pcap_exit(graph->graph);
> +
>         graph_nodes_mem_destroy(graph->graph);
>         return rte_memzone_free(graph->mz);
>  }
> diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
> index f9a85c8926..7d1b30b8ac 100644
> --- a/lib/graph/graph_private.h
> +++ b/lib/graph/graph_private.h
> @@ -22,6 +22,7 @@ extern int rte_graph_logtype;
>                         __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
>
>  #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
> +#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
>  #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
>  #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
>
> @@ -100,6 +101,10 @@ struct graph {
>         /**< Memory size of the graph. */
>         int socket;
>         /**< Socket identifier where memory is allocated. */
> +       uint64_t num_pkt_to_capture;
> +       /**< Number of packets to be captured per core. */
> +       char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
> +       /**< pcap file name/path. */
>         STAILQ_HEAD(gnode_list, graph_node) node_list;
>         /**< Nodes in a graph. */
>  };
> diff --git a/lib/graph/meson.build b/lib/graph/meson.build
> index c7327549e8..3526d1b5d4 100644
> --- a/lib/graph/meson.build
> +++ b/lib/graph/meson.build
> @@ -14,7 +14,8 @@ sources = files(
>          'graph_debug.c',
>          'graph_stats.c',
>          'graph_populate.c',
> +        'graph_pcap.c',
>  )
>  headers = files('rte_graph.h', 'rte_graph_worker.h')
>
> -deps += ['eal']
> +deps += ['eal', 'pcapng']
> diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
> index b32c4bc217..c9a77297fc 100644
> --- a/lib/graph/rte_graph.h
> +++ b/lib/graph/rte_graph.h
> @@ -35,6 +35,7 @@ extern "C" {
>
>  #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
>  #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
> +#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
>  #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
>  #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
>  #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
> @@ -164,6 +165,10 @@ struct rte_graph_param {
>         uint16_t nb_node_patterns;  /**< Number of node patterns. */
>         const char **node_patterns;
>         /**< Array of node patterns based on shell pattern. */
> +
> +       bool pcap_enable; /**< Pcap enable. */
> +       uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
> +       char *pcap_filename; /**< Filename in which packets to be captured.*/
>  };
>
>  /**
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index fc6fee48c8..438595b15c 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -44,6 +44,12 @@ struct rte_graph {
>         rte_graph_t id; /**< Graph identifier. */
>         int socket;     /**< Socket ID where memory is allocated. */
>         char name[RTE_GRAPH_NAMESIZE];  /**< Name of the graph. */
> +       bool pcap_enable;               /**< Pcap trace enabled. */
> +       /** Number of packets captured per core. */
> +       uint64_t nb_pkt_captured;
> +       /** Number of packets to capture per core. */
> +       uint64_t nb_pkt_to_capture;
> +       char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
>         uint64_t fence;                 /**< Fence. */
>  } __rte_cache_aligned;
>
> @@ -64,6 +70,9 @@ struct rte_node {
>         char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
>         char name[RTE_NODE_NAMESIZE];   /**< Name of the node. */
>
> +       /** Original process function when pcap is enabled. */
> +       rte_node_process_t original_process;
> +
>         /* Fast path area  */
>  #define RTE_NODE_CTX_SZ 16
>         uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
> --
> 2.25.1
>

^ permalink raw reply	[relevance 0%]

* [PATCH V5 2/5] ethdev: fix skip valid port in probing callback
  2023-01-31  3:33  3% ` [PATCH V5 0/5] app/testpmd: support multiple " Huisong Li
@ 2023-01-31  3:33  2%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2023-01-31  3:33 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, andrew.rybchenko, liudongdong3, huangdaode,
	fengchengwen, lihuisong

An application's event callback may use the macro RTE_ETH_FOREACH_DEV to
iterate over all enabled ports (for example, to verify the validity of a
port id) when it receives a probing event. A port is considered valid if
its ethdev state is not RTE_ETH_DEV_UNUSED.

However, this state is set to RTE_ETH_DEV_ATTACHED only after the probing
event is pushed, so the probing callback skips this port. The assignment
cannot simply be moved in front of the probing notification; see
commit be8cd210379a ("ethdev: fix port probing notification").

So this patch adds a new state, RTE_ETH_DEV_ALLOCATED. The ethdev state is
set to RTE_ETH_DEV_ALLOCATED before the probing event is pushed and to
RTE_ETH_DEV_ATTACHED once the port is definitely probed. A port is valid if
its device state is 'ALLOCATED' or 'ATTACHED'.

In addition, the new state has to be placed after 'REMOVED' to avoid an ABI
break. Fortunately, this ethdev state is internal and applications cannot
access it directly. So this patch wraps the check in an internal API,
rte_eth_dev_is_used, for ethdev and PMDs to call, which removes the need
to compare against the state enum values directly.
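
The reasoning above can be sketched in isolation. This is only an
illustration: the demo_ enum and helper below are local stand-ins, not the
ethdev definitions. They show that appending the new state after 'REMOVED'
leaves the existing numeric values unchanged, and how a helper hides the
comparison from callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for enum rte_eth_dev_state; the demo_ names are hypothetical. */
enum demo_eth_dev_state {
	DEMO_ETH_DEV_UNUSED = 0,
	DEMO_ETH_DEV_ATTACHED,
	DEMO_ETH_DEV_REMOVED,
	DEMO_ETH_DEV_ALLOCATED, /* appended last so older values do not shift */
};

/* Values that existed before the new state keep their numbers. */
static_assert(DEMO_ETH_DEV_ATTACHED == 1, "ATTACHED must stay 1");
static_assert(DEMO_ETH_DEV_REMOVED == 2, "REMOVED must stay 2");

/* Same logic as the helper in this patch, applied to the stand-in enum. */
static bool
demo_eth_dev_is_used(uint16_t dev_state)
{
	return dev_state == DEMO_ETH_DEV_ALLOCATED ||
		dev_state == DEMO_ETH_DEV_ATTACHED;
}
```

Note that a helper written this way reports a port in the 'REMOVED' state
as not used, whereas a plain "state != UNUSED" comparison treats it as
valid.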

Fixes: be8cd210379a ("ethdev: fix port probing notification")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 drivers/net/bnxt/bnxt_ethdev.c |  3 ++-
 drivers/net/mlx5/mlx5.c        |  2 +-
 lib/ethdev/ethdev_driver.c     | 13 ++++++++++---
 lib/ethdev/ethdev_driver.h     | 12 ++++++++++++
 lib/ethdev/ethdev_pci.h        |  2 +-
 lib/ethdev/rte_class_eth.c     |  2 +-
 lib/ethdev/rte_ethdev.c        |  4 ++--
 lib/ethdev/rte_ethdev.h        |  4 +++-
 lib/ethdev/version.map         |  1 +
 9 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index b3de490d36..be5c93010e 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -6003,7 +6003,8 @@ bnxt_dev_uninit(struct rte_eth_dev *eth_dev)
 
 	PMD_DRV_LOG(DEBUG, "Calling Device uninit\n");
 
-	if (eth_dev->state != RTE_ETH_DEV_UNUSED)
+
+	if (rte_eth_dev_is_used(eth_dev->state))
 		bnxt_dev_close_op(eth_dev);
 
 	return 0;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e55be8720e..7f4dafaa22 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -3014,7 +3014,7 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_device *odev)
 	while (port_id < RTE_MAX_ETHPORTS) {
 		struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
-		if (dev->state != RTE_ETH_DEV_UNUSED &&
+		if (rte_eth_dev_is_used(dev->state) &&
 		    dev->device &&
 		    (dev->device == odev ||
 		     (dev->device->driver &&
diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index 0be1e8ca04..29e9417bea 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -50,8 +50,8 @@ eth_dev_find_free_port(void)
 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
 		/* Using shared name field to find a free port. */
 		if (eth_dev_shared_data->data[i].name[0] == '\0') {
-			RTE_ASSERT(rte_eth_devices[i].state ==
-				   RTE_ETH_DEV_UNUSED);
+			RTE_ASSERT(!rte_eth_dev_is_used(
+					rte_eth_devices[i].state));
 			return i;
 		}
 	}
@@ -208,11 +208,18 @@ rte_eth_dev_probing_finish(struct rte_eth_dev *dev)
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
 
+	dev->state = RTE_ETH_DEV_ALLOCATED;
 	rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_NEW, NULL);
 
 	dev->state = RTE_ETH_DEV_ATTACHED;
 }
 
+bool rte_eth_dev_is_used(uint16_t dev_state)
+{
+	return dev_state == RTE_ETH_DEV_ALLOCATED ||
+		dev_state == RTE_ETH_DEV_ATTACHED;
+}
+
 int
 rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 {
@@ -221,7 +228,7 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 
 	eth_dev_shared_data_prepare();
 
-	if (eth_dev->state != RTE_ETH_DEV_UNUSED)
+	if (rte_eth_dev_is_used(eth_dev->state))
 		rte_eth_dev_callback_process(eth_dev,
 				RTE_ETH_EVENT_DESTROY, NULL);
 
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 6a550cfc83..dde3ec84ef 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1542,6 +1542,18 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 __rte_internal
 void rte_eth_dev_probing_finish(struct rte_eth_dev *dev);
 
+/**
+ * Check if a Ethernet device state is used or not
+ *
+ * @param dev_state
+ *   The state of the Ethernet device
+ * @return
+ *   - true if the state of the Ethernet device is allocated or attached
+ *   - false if this state is neither allocated nor attached
+ */
+__rte_internal
+bool rte_eth_dev_is_used(uint16_t dev_state);
+
 /**
  * Create memzone for HW rings.
  * malloc can't be used as the physical address is needed.
diff --git a/lib/ethdev/ethdev_pci.h b/lib/ethdev/ethdev_pci.h
index 94b8fba5d7..23270ccd73 100644
--- a/lib/ethdev/ethdev_pci.h
+++ b/lib/ethdev/ethdev_pci.h
@@ -164,7 +164,7 @@ rte_eth_dev_pci_generic_remove(struct rte_pci_device *pci_dev,
 	 * eth device has been released.
 	 */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
-	    eth_dev->state == RTE_ETH_DEV_UNUSED)
+	    !rte_eth_dev_is_used(eth_dev->state))
 		return 0;
 
 	if (dev_uninit) {
diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
index 838b3a8f9f..504bfd99c9 100644
--- a/lib/ethdev/rte_class_eth.c
+++ b/lib/ethdev/rte_class_eth.c
@@ -118,7 +118,7 @@ eth_dev_match(const struct rte_eth_dev *edev,
 	const struct rte_kvargs *kvlist = arg->kvlist;
 	unsigned int pair;
 
-	if (edev->state == RTE_ETH_DEV_UNUSED)
+	if (!rte_eth_dev_is_used(edev->state))
 		return -1;
 	if (arg->device != NULL && arg->device != edev->device)
 		return -1;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5d5e18db1e..86ca303ab5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -325,7 +325,7 @@ uint16_t
 rte_eth_find_next(uint16_t port_id)
 {
 	while (port_id < RTE_MAX_ETHPORTS &&
-			rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED)
+	       !rte_eth_dev_is_used(rte_eth_devices[port_id].state))
 		port_id++;
 
 	if (port_id >= RTE_MAX_ETHPORTS)
@@ -372,7 +372,7 @@ int
 rte_eth_dev_is_valid_port(uint16_t port_id)
 {
 	if (port_id >= RTE_MAX_ETHPORTS ||
-	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
+	    !rte_eth_dev_is_used(rte_eth_devices[port_id].state))
 		return 0;
 	else
 		return 1;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..d22de196db 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -2000,10 +2000,12 @@ typedef uint16_t (*rte_tx_callback_fn)(uint16_t port_id, uint16_t queue,
 enum rte_eth_dev_state {
 	/** Device is unused before being probed. */
 	RTE_ETH_DEV_UNUSED = 0,
-	/** Device is attached when allocated in probing. */
+	/** Device is attached when definitely probed. */
 	RTE_ETH_DEV_ATTACHED,
 	/** Device is in removed state when plug-out is detected. */
 	RTE_ETH_DEV_REMOVED,
+	/** Device is allocated and is set before reporting new event. */
+	RTE_ETH_DEV_ALLOCATED,
 };
 
 struct rte_eth_dev_sriov {
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17201fbe0f..094c2a952e 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -327,4 +327,5 @@ INTERNAL {
 	rte_eth_representor_id_get;
 	rte_eth_switch_domain_alloc;
 	rte_eth_switch_domain_free;
+	rte_eth_dev_is_used;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 2%]

* [PATCH V5 0/5] app/testpmd: support multiple process attach and detach port
       [not found]     <20220825024425.10534-1-lihuisong@huawei.com>
  @ 2023-01-31  3:33  3% ` Huisong Li
  2023-01-31  3:33  2%   ` [PATCH V5 2/5] ethdev: fix skip valid port in probing callback Huisong Li
  1 sibling, 1 reply; 200+ results
From: Huisong Li @ 2023-01-31  3:33 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, andrew.rybchenko, liudongdong3, huangdaode,
	fengchengwen, lihuisong

This patchset fixes some bugs and supports attaching and detaching ports
in primary and secondary processes.

---
 -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid ABI break.
 -v4: fix a misspelling.
 -v3:
   #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
      for other bus type.
   #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
      the problem in patch 2/5.
 -v2: resend due to CI unexplained failure.

Huisong Li (5):
  drivers/bus: restore driver assignment at front of probing
  ethdev: fix skip valid port in probing callback
  app/testpmd: check the validity of the port
  app/testpmd: add attach and detach port for multiple process
  app/testpmd: stop forwarding in new or destroy event

 app/test-pmd/testpmd.c                   | 47 +++++++++++++++---------
 app/test-pmd/testpmd.h                   |  1 -
 drivers/bus/auxiliary/auxiliary_common.c |  9 ++++-
 drivers/bus/dpaa/dpaa_bus.c              |  9 ++++-
 drivers/bus/fslmc/fslmc_bus.c            |  8 +++-
 drivers/bus/ifpga/ifpga_bus.c            | 12 ++++--
 drivers/bus/pci/pci_common.c             |  9 ++++-
 drivers/bus/vdev/vdev.c                  | 10 ++++-
 drivers/bus/vmbus/vmbus_common.c         |  9 ++++-
 drivers/net/bnxt/bnxt_ethdev.c           |  3 +-
 drivers/net/bonding/bonding_testpmd.c    |  1 -
 drivers/net/mlx5/mlx5.c                  |  2 +-
 lib/ethdev/ethdev_driver.c               | 13 +++++--
 lib/ethdev/ethdev_driver.h               | 12 ++++++
 lib/ethdev/ethdev_pci.h                  |  2 +-
 lib/ethdev/rte_class_eth.c               |  2 +-
 lib/ethdev/rte_ethdev.c                  |  4 +-
 lib/ethdev/rte_ethdev.h                  |  4 +-
 lib/ethdev/version.map                   |  1 +
 19 files changed, 114 insertions(+), 44 deletions(-)

-- 
2.22.0


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v10 1/2] cmdline: handle EOF in cmdline_poll
  @ 2023-01-30 22:12  3%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-01-30 22:12 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Olivier Matz

On 1/30/2023 8:09 PM, Stephen Hemminger wrote:
> If end of file is reached on input, then cmdline_read_char()
> will return 0. The problem is that cmdline_poll() was not checking
> for this and would continue and not return the status.
> 
> Fixes: 9251cd97a6be ("cmdline: add internal wrappers for character input")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/cmdline/cmdline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/cmdline/cmdline.c b/lib/cmdline/cmdline.c
> index e1009ba4c413..de41406d61e0 100644
> --- a/lib/cmdline/cmdline.c
> +++ b/lib/cmdline/cmdline.c
> @@ -194,7 +194,7 @@ cmdline_poll(struct cmdline *cl)
>  	else if (status > 0) {
>  		c = -1;
>  		read_status = cmdline_read_char(cl, &c);
> -		if (read_status < 0)
> +		if (read_status <= 0)
>  			return read_status;

According to the API doc it would be wrong to return '0', which implies
'RDLINE_INIT'.

But the function may return any negative value on error, so what about
treating EOF as an error case:

if (read_status < 0)
	return read_status;
else if (read_status == 0)
	return -EIO;

With this, 'cmdline_poll()' can be used in testpmd as it was in v9 of
this patch:
while (f_quit == 0 && cl_quit == 0) {
	if (cmdline_poll(cl) < 0)
		break;
}
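
A self-contained sketch of that suggestion, for illustration only.
demo_read_char() and demo_poll_step() are hypothetical stand-ins, not the
cmdline library functions, and the return value 1 for a consumed character
is a placeholder for whatever status the real poll function reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the character read: returns 1 when a
 * character was read and 0 at end of input.
 */
static int
demo_read_char(const char **input, char *c)
{
	assert(input != NULL && *input != NULL && c != NULL);
	if (**input == '\0')
		return 0; /* end of input */
	*c = *(*input)++;
	return 1;
}

/*
 * Hypothetical stand-in for the read step of the poll function:
 * propagate errors, and report end of input as -EIO so that 0 is
 * never returned for EOF.
 */
static int
demo_poll_step(const char **input)
{
	char c = -1;
	int read_status = demo_read_char(input, &c);

	if (read_status < 0)
		return read_status;
	else if (read_status == 0)
		return -EIO;

	return 1; /* placeholder: one character consumed */
}
```

With this shape, a caller that loops while the result is not negative
stops at end of input instead of spinning.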



But I guess this is still an ABI break, because the API behavior changes.




^ permalink raw reply	[relevance 3%]

* RE: [EXT] [PATCH] compressdev: fix end of comp PMD list macro conflict
  @ 2023-01-30 19:30  3% ` Akhil Goyal
  2023-01-31  8:23  0%   ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2023-01-30 19:30 UTC (permalink / raw)
  To: Michael Baum, dev
  Cc: Matan Azrad, Ashish Gupta, Fan Zhang, Kai Ji, Thomas Monjalon,
	fiona.trahe, roy.fan.zhang, stable

> The "rte_compressdev_info_get()" function retrieves the contextual
> information of a device.
> The output structure "dev_info" contains a list of devices supported
> capabilities for each supported algorithm.
> 
> In this function description, it says the element after the last valid
> element has op field set to "RTE_COMP_ALGO_LIST_END".
> On the other hand, when this function used by
> "rte_compressdev_capability_get()" function, it uses
> "RTE_COMP_ALGO_UNSPECIFIED" as end of list as same as the
> "RTE_COMP_END_OF_CAPABILITIES_LIST()".
> 
> The mlx5 and qat PMDs use "RTE_COMP_ALGO_LIST_END" as the end of
> capabilities list. When "rte_compressdev_capability_get()" function is
> called with unsupported algorithm, it might read memory out of bound.
> 
> This patch change the "rte_compressdev_info_get()" function description
> to say using "RTE_COMP_ALGO_UNSPECIFIED" as the end of capabilities
> list.
> In addition, it moves both mlx5 and qat PMDs to use
> "RTE_COMP_ALGO_UNSPECIFIED" through
> "RTE_COMP_END_OF_CAPABILITIES_LIST()" macro.
> 
> Fixes: 5d432f364078 ("compressdev: add device capabilities")
> Fixes: 2d148597ce76 ("compress/qat: add gen-specific implementation")
> Fixes: 384bac8d6555 ("compress/mlx5: add supported capabilities")
> Cc: fiona.trahe@intel.com
> Cc: roy.fan.zhang@intel.com
> Cc: matan@nvidia.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Michael Baum <michaelba@nvidia.com>
> 
> ---
> 
> After this change, I'm not sure about the purpose of
> "RTE_COMP_ALGO_LIST_END".
> There is no any other use of it in DPDK code, and it isn't represent the
> number of algorithms supported by the API since the
> "RTE_COMP_ALGO_UNSPECIFIED" is part of the enum.
> 
> Due to the compress API is experimental I think the
> "RTE_COMP_ALGO_LIST_END" can be removed.
> 
+1 to remove the list end enums. This will also help avoid ABI breakage
when we make this lib stable.
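
To make the sentinel issue concrete, here is a minimal sketch of a
capability lookup that stops at an "unspecified" terminator. The demo_
enum, structure, and function are hypothetical stand-ins and do not
reproduce the compressdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the algorithm enum. */
enum demo_algo {
	DEMO_ALGO_UNSPECIFIED = 0, /* used as the end-of-list marker */
	DEMO_ALGO_NULL,
	DEMO_ALGO_DEFLATE,
	DEMO_ALGO_LIST_END,
};

/* Hypothetical stand-in for one capability entry. */
struct demo_capability {
	enum demo_algo algo;
	uint64_t feature_flags;
};

/*
 * Walk the table until the UNSPECIFIED terminator. A table that ends
 * with LIST_END instead would never match the stop condition for an
 * unsupported algorithm, and the loop would run past the array.
 */
static const struct demo_capability *
demo_capability_get(const struct demo_capability *caps, enum demo_algo algo)
{
	assert(caps != NULL);
	for (; caps->algo != DEMO_ALGO_UNSPECIFIED; caps++) {
		if (caps->algo == algo)
			return caps;
	}
	return NULL;
}
```

The lookup is only safe when the producer of the table and the consumer
agree on the same terminator value.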

^ permalink raw reply	[relevance 3%]

* [PATCH v2 2/2] ethdev: introduce the PHY affinity field in Tx queue API
  @ 2023-01-30 17:00  2%   ` Jiawei Wang
  2023-01-31 17:26  3%     ` Thomas Monjalon
  2023-02-01  9:05  0%     ` Andrew Rybchenko
  0 siblings, 2 replies; 200+ results
From: Jiawei Wang @ 2023-01-30 17:00 UTC (permalink / raw)
  To: viacheslavo, orika, thomas, Aman Singh, Yuying Zhang,
	Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

When multiple hardware ports are connected to a single DPDK port (mhpsdp),
the previous patch introduces a new rte flow item to match the
phy affinity of received packets.

This patch adds the tx_phy_affinity setting to the Tx queue API. The affinity
value indicates which hardware port packets are sent to.
Value 0 means no affinity, and traffic is routed between the different
physical ports. If 0 is disabled, trying to match on phy_affinity 0
results in an error.

Add the new tx_phy_affinity field in the padding hole of the rte_eth_txconf
structure, so the size of rte_eth_txconf stays the same. Add a suppression
rule for this structure change in the ABI check file.
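
The padding argument can be checked with a small stand-alone sketch. The
two structures below are simplified stand-ins, not the actual definition
of rte_eth_txconf (the thresh array in particular is invented for the
illustration), and the exact amount of padding depends on the compiler
and target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the layout before the change. */
struct demo_txconf_old {
	uint8_t thresh[3];
	uint16_t tx_rs_thresh;
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	/* the compiler pads here to align the next member */
	uint64_t offloads;
};

/* Same layout with a one-byte field placed in the former padding. */
struct demo_txconf_new {
	uint8_t thresh[3];
	uint16_t tx_rs_thresh;
	uint16_t tx_free_thresh;
	uint8_t tx_deferred_start;
	uint8_t tx_phy_affinity;
	uint64_t offloads;
};

/* Neither the total size nor the offset of later members may move. */
static_assert(sizeof(struct demo_txconf_old) ==
	      sizeof(struct demo_txconf_new), "size must not change");
static_assert(offsetof(struct demo_txconf_old, offloads) ==
	      offsetof(struct demo_txconf_new, offloads),
	      "offloads must not move");
```

Compile-time checks like these only cover the layout; the ABI check tool
still reports the inserted member, which is why a suppression rule is
needed.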

This patch adds the testpmd command line:
testpmd> port config (port_id) txq (queue_id) phy_affinity (value)

For example, if two hardware ports 0 and 1 are connected to
a single DPDK port (port id 0), with phy_affinity 1 standing for
hardware port 0 and phy_affinity 2 standing for hardware port 1,
the commands below configure the Tx phy affinity per Tx queue:
        port config 0 txq 0 phy_affinity 1
        port config 0 txq 1 phy_affinity 1
        port config 0 txq 2 phy_affinity 2
        port config 0 txq 3 phy_affinity 2

These commands configure TxQ index 0 and TxQ index 1 with phy affinity 1,
so packets sent on TxQ 0 or TxQ 1 leave through hardware port 0.
Likewise, packets sent on TxQ 2 or TxQ 3 leave through
hardware port 1.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 84 +++++++++++++++++++++
 devtools/libabigail.abignore                |  5 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 13 ++++
 lib/ethdev/rte_ethdev.h                     |  7 ++
 4 files changed, 109 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b32dc8bfd4..768f35cb02 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -764,6 +764,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
 			"    Cleanup txq mbufs for a specific Tx queue\n\n"
+
+			"port config (port_id) txq (queue_id) phy_affinity (value)\n"
+			"    Set the physical affinity value "
+			"on a specific Tx queue\n\n"
 		);
 	}
 
@@ -12621,6 +12625,85 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
 	}
 };
 
+/* *** configure port txq phy_affinity value *** */
+struct cmd_config_tx_phy_affinity {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	portid_t portid;
+	cmdline_fixed_string_t txq;
+	uint16_t qid;
+	cmdline_fixed_string_t phy_affinity;
+	uint16_t value;
+};
+
+static void
+cmd_config_tx_phy_affinity_parsed(void *parsed_result,
+				  __rte_unused struct cmdline *cl,
+				  __rte_unused void *data)
+{
+	struct cmd_config_tx_phy_affinity *res = parsed_result;
+	struct rte_port *port;
+
+	if (port_id_is_invalid(res->portid, ENABLED_WARN))
+		return;
+
+	if (res->portid == (portid_t)RTE_PORT_ALL) {
+		printf("Invalid port id\n");
+		return;
+	}
+
+	port = &ports[res->portid];
+
+	if (strcmp(res->txq, "txq")) {
+		printf("Unknown parameter\n");
+		return;
+	}
+	if (tx_queue_id_is_invalid(res->qid))
+		return;
+
+	port->txq[res->qid].conf.tx_phy_affinity = res->value;
+
+	cmd_reconfig_device_queue(res->portid, 0, 1);
+}
+
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 port, "port");
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 config, "config");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 portid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 txq, "txq");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      qid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
+				 phy_affinity, "phy_affinity");
+cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
+			      value, RTE_UINT16);
+
+static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
+	.f = cmd_config_tx_phy_affinity_parsed,
+	.data = (void *)0,
+	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
+	.tokens = {
+		(void *)&cmd_config_tx_phy_affinity_port,
+		(void *)&cmd_config_tx_phy_affinity_config,
+		(void *)&cmd_config_tx_phy_affinity_portid,
+		(void *)&cmd_config_tx_phy_affinity_txq,
+		(void *)&cmd_config_tx_phy_affinity_qid,
+		(void *)&cmd_config_tx_phy_affinity_hwport,
+		(void *)&cmd_config_tx_phy_affinity_value,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -12851,6 +12934,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_show_capability,
 	(cmdline_parse_inst_t *)&cmd_set_flex_is_pattern,
 	(cmdline_parse_inst_t *)&cmd_set_flex_spec_pattern,
+	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
 	NULL,
 };
 
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..cbbde4ef05 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -20,6 +20,11 @@
 [suppress_file]
         soname_regexp = ^librte_.*mlx.*glue\.
 
+; Ignore fields inserted in middle padding of rte_eth_txconf
+[suppress_type]
+        name = rte_eth_txconf
+        has_data_member_inserted_between = {offset_after(tx_deferred_start), offset_of(offloads)}
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Experimental APIs exceptions ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 1853030e93..e9f20607a2 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
 
 This command should be run when the port is stopped, or else it will fail.
 
+config per queue Tx physical affinity
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Configure a per queue physical affinity value only on a specific Tx queue::
+
+   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
+
+* ``phy_affinity``: reflects packet can be sent to which hardware port.
+                    uses it on multiple hardware ports connect to
+                    a single DPDK port (mhpsdp).
+
+This command should be run when the port is stopped, or else it will fail.
+
 Config VXLAN Encap outer layers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..b30467c192 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1138,6 +1138,13 @@ struct rte_eth_txconf {
 				      less free descriptors than this value. */
 
 	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	/**
+	 * Physical affinity to be set.
+	 * Value 0 is no affinity and traffic could be routed between different
+	 * physical ports, if 0 is disabled then try to match on phy_affinity 0 will
+	 * result in an error.
+	 */
+	uint8_t tx_phy_affinity;
 	/**
 	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
 	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
-- 
2.18.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-30  9:56  0%                 ` Naga Harish K, S V
@ 2023-01-30 14:43  0%                   ` Jerin Jacob
  2023-02-02 16:12  0%                     ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-01-30 14:43 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Saturday, January 28, 2023 4:24 PM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> >
> > > > > > >
> > > > > > > > > +        */
> > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > >
> > > > > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init() to
> > > > make
> > > > > > > > sure rsvd is zero.
> > > > > > > >
> > > > > > >
> > > > > > > The reserved fields are not used by the adapter or application.
> > > > > > > Not sure Is it necessary to Introduce a new API to clear reserved
> > fields.
> > > > > >
> > > > > > When the adapter starts using new fields (when we add new fields
> > > > > > in future), an old application which is not using
> > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have junk
> > > > > > values, and then the adapter implementation will behave badly.
> > > > > >
> > > > > >
> > > > >
> > > > > does it mean, the application doesn't re-compile for the new DPDK?
> > > >
> > > > Yes. No need recompile if ABI not breaking.
> > > >
> > > > > When some of the reserved fields are used in the future, the
> > > > > application
> > > > also may need to be recompiled along with DPDK right?
> > > > > As the application also may need to use the newly consumed
> > > > > reserved
> > > > fields?
> > > >
> > > > The problematic case is:
> > > >
> > > > Adapter implementation of 23.07 (assuming there is a change in the
> > > > params field) needs to work with an application of 23.03.
> > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > >
> > >
> > > As rte_event_eth_rx_adapter_runtime_params_init() initializes only
> > reserved fields to zero,  it may not solve the issue in this case.
> >
> > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all fields,
> > not just reserved field.
> > The application calling sequence  is
> >
> > struct my_config c;
> > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > c.interseted_filed_to_be_updated = val;
> >
> Can it be done like
>         struct my_config c = {0};
>         c.interseted_filed_to_be_updated = val;
> and update Doxygen comments to recommend above usage to reset all fields?
> This way,  rte_event_eth_rx_adapter_runtime_params_init() can be avoided.

Better to have a function for documentation clarity. A similar scheme
already exists in DPDK; see rte_eth_cman_config_init().


>
> > Let me share an example and you can tell where is the issue
> >
> > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > 3)There is an application written based on 22.03 which using only 8B after
> > calling rte_event_eth_rx_adapter_runtime_params_init()
> > 4)Assume, in 22.07 another 8B added to structure.
> > 5)Now, the application (3) needs to run on 22.07. Since the application is
> > calling rte_event_eth_rx_adapter_runtime_params_init()
> > and bytes 9 to 16 are zero, the implementation will not misbehave.
> >
> > > The old application only tries to set/get previous valid fields and the newly
> > used fields may still contain junk value.
> > > If the application wants to make use of any the newly used params, the
> > application changes are required anyway.
> >
> > Yes. If application wants to make use of newly added features. No need to
> > change if new features are not needed for old application.
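
The scheme discussed above can be sketched as follows. The structure and
function are hypothetical stand-ins sized to match the 64B example; they
are not the eventdev API, and value_in_use is an invented field name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter structure with reserved space for future use. */
struct demo_runtime_params {
	uint32_t value_in_use; /* the only field defined today */
	uint32_t rsvd[15];     /* reserved, must be zero */
};

static_assert(sizeof(struct demo_runtime_params) == 64,
	      "64B as in the example");

/*
 * Hypothetical init helper: zero every field, reserved ones included,
 * so that a later library version can give meaning to reserved space
 * and still see zeros from applications built against this layout.
 */
static int
demo_runtime_params_init(struct demo_runtime_params *params)
{
	if (params == NULL)
		return -1;
	memset(params, 0, sizeof(*params));
	return 0;
}
```

An application would call the helper first and then set only the fields
it knows about, leaving everything else at zero.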

^ permalink raw reply	[relevance 0%]

* RE: [EXT] Re: [PATCH v6 1/6] eal: trace: add trace point emit for blob
  2023-01-25 16:09  2%         ` Ferruh Yigit
@ 2023-01-30 13:35  0%           ` Ankur Dwivedi
  0 siblings, 0 replies; 200+ results
From: Ankur Dwivedi @ 2023-01-30 13:35 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: thomas, david.marchand, mdr, orika, chas3, humin29, linville,
	ciara.loftus, qi.z.zhang, mw, mk, shaibran, evgenys, igorch,
	chandu, Igor Russkikh, shepard.siegel, ed.czeck, john.miller,
	ajit.khaparde, somnath.kotur, Jerin Jacob Kollanukkaran,
	Maciej Czekaj [C],
	Shijith Thotton, Srisivasubramanian Srinivasan, Harman Kalra,
	rahul.lakkireddy, johndale, hyonkim, liudongdong3, yisen.zhuang,
	xuanziyang2, cloud.wangxiaoyun, zhouguoyang, simei.su,
	wenjun1.wu, qiming.yang, Yuying.Zhang, beilei.xing, xiao.w.wang,
	jingjing.wu, junfeng.guo, rosen.xu, Nithin Kumar Dabilpuram,
	Kiran Kumar Kokkilagadda, Sunil Kumar Kori,
	Satha Koteswara Rao Kottidi, Liron Himi, zr, Radha Chintakuntla,
	Veerasenareddy Burru, Sathesh B Edara, matan, viacheslavo,
	longli, spinler, chaoyong.he, niklas.soderlund, hemant.agrawal,
	sachin.saxena, g.singh, apeksha.gupta, sachin.saxena, aboyer,
	Rasesh Mody, Shahed Shaikh, Devendra Singh Rawat,
	andrew.rybchenko, jiawenwu, jianwang, jbehrens, maxime.coquelin,
	chenbo.xia, steven.webster, matt.peters, bruce.richardson,
	mtetsuyah, grive, jasvinder.singh, cristian.dumitrescu, jgrajcia,
	mb


>-----Original Message-----
>From: Ferruh Yigit <ferruh.yigit@amd.com>
>Sent: Wednesday, January 25, 2023 9:40 PM
>To: Ankur Dwivedi <adwivedi@marvell.com>; dev@dpdk.org
>Cc: thomas@monjalon.net; david.marchand@redhat.com; mdr@ashroe.eu;
>orika@nvidia.com; chas3@att.com; humin29@huawei.com;
>linville@tuxdriver.com; ciara.loftus@intel.com; qi.z.zhang@intel.com;
>mw@semihalf.com; mk@semihalf.com; shaibran@amazon.com;
>evgenys@amazon.com; igorch@amazon.com; chandu@amd.com; Igor
>Russkikh <irusskikh@marvell.com>; shepard.siegel@atomicrules.com;
>ed.czeck@atomicrules.com; john.miller@atomicrules.com;
>ajit.khaparde@broadcom.com; somnath.kotur@broadcom.com; Jerin Jacob
>Kollanukkaran <jerinj@marvell.com>; Maciej Czekaj [C]
><mczekaj@marvell.com>; Shijith Thotton <sthotton@marvell.com>;
>Srisivasubramanian Srinivasan <srinivasan@marvell.com>; Harman Kalra
><hkalra@marvell.com>; rahul.lakkireddy@chelsio.com; johndale@cisco.com;
>hyonkim@cisco.com; liudongdong3@huawei.com;
>yisen.zhuang@huawei.com; xuanziyang2@huawei.com;
>cloud.wangxiaoyun@huawei.com; zhouguoyang@huawei.com;
>simei.su@intel.com; wenjun1.wu@intel.com; qiming.yang@intel.com;
>Yuying.Zhang@intel.com; beilei.xing@intel.com; xiao.w.wang@intel.com;
>jingjing.wu@intel.com; junfeng.guo@intel.com; rosen.xu@intel.com; Nithin
>Kumar Dabilpuram <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda
><kirankumark@marvell.com>; Sunil Kumar Kori <skori@marvell.com>; Satha
>Koteswara Rao Kottidi <skoteshwar@marvell.com>; Liron Himi
><lironh@marvell.com>; zr@semihalf.com; Radha Chintakuntla
><radhac@marvell.com>; Veerasenareddy Burru <vburru@marvell.com>;
>Sathesh B Edara <sedara@marvell.com>; matan@nvidia.com;
>viacheslavo@nvidia.com; longli@microsoft.com; spinler@cesnet.cz;
>chaoyong.he@corigine.com; niklas.soderlund@corigine.com;
>hemant.agrawal@nxp.com; sachin.saxena@oss.nxp.com; g.singh@nxp.com;
>apeksha.gupta@nxp.com; sachin.saxena@nxp.com; aboyer@pensando.io;
>Rasesh Mody <rmody@marvell.com>; Shahed Shaikh
><shshaikh@marvell.com>; Devendra Singh Rawat
><dsinghrawat@marvell.com>; andrew.rybchenko@oktetlabs.ru;
>jiawenwu@trustnetic.com; jianwang@trustnetic.com;
>jbehrens@vmware.com; maxime.coquelin@redhat.com;
>chenbo.xia@intel.com; steven.webster@windriver.com;
>matt.peters@windriver.com; bruce.richardson@intel.com;
>mtetsuyah@gmail.com; grive@u256.net; jasvinder.singh@intel.com;
>cristian.dumitrescu@intel.com; jgrajcia@cisco.com;
>mb@smartsharesystems.com
>Subject: Re: [EXT] Re: [PATCH v6 1/6] eal: trace: add trace point emit for blob
>
>On 1/25/2023 3:02 PM, Ankur Dwivedi wrote:
>>
>>>
>>> ---------------------------------------------------------------------
>>> - On 1/20/2023 8:40 AM, Ankur Dwivedi wrote:
>>>> Adds a trace point emit function for capturing a blob. The blob
>>>> captures the length passed by the application followed by the array.
>>>>
>>>> The maximum blob bytes which can be captured is bounded by
>>>> RTE_TRACE_BLOB_LEN_MAX macro. The value for max blob length macro
>is
>>>> 64 bytes. If the length is less than 64 the remaining trailing bytes
>>>> are set to zero.
>>>>
>>>> This patch also adds test case for emit blob tracepoint function.
>>>>
>>>> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
>>>> ---
>>>>  app/test/test_trace.c                      | 11 ++++++++
>>>>  doc/guides/prog_guide/trace_lib.rst        | 12 ++++++++
>>>>  lib/eal/common/eal_common_trace_points.c   |  2 ++
>>>>  lib/eal/include/rte_eal_trace.h            |  6 ++++
>>>>  lib/eal/include/rte_trace_point.h          | 32 ++++++++++++++++++++++
>>>>  lib/eal/include/rte_trace_point_register.h |  9 ++++++
>>>>  lib/eal/version.map                        |  3 ++
>>>>  7 files changed, 75 insertions(+)
>>>>
>>>> diff --git a/app/test/test_trace.c b/app/test/test_trace.c index
>>>> 6bedf14024..ad4a394a29 100644
>>>> --- a/app/test/test_trace.c
>>>> +++ b/app/test/test_trace.c
>>>> @@ -4,6 +4,7 @@
>>>>
>>>>  #include <rte_eal_trace.h>
>>>>  #include <rte_lcore.h>
>>>> +#include <rte_random.h>
>>>>  #include <rte_trace.h>
>>>>
>>>>  #include "test.h"
>>>> @@ -177,7 +178,12 @@ test_fp_trace_points(void)  static int
>>>>  test_generic_trace_points(void)
>>>>  {
>>>> +	uint8_t arr[RTE_TRACE_BLOB_LEN_MAX];
>>>>  	int tmp;
>>>> +	int i;
>>>> +
>>>> +	for (i = 0; i < RTE_TRACE_BLOB_LEN_MAX; i++)
>>>> +		arr[i] = i;
>>>>
>>>>  	rte_eal_trace_generic_void();
>>>>  	rte_eal_trace_generic_u64(0x10000000000000);
>>>> @@ -195,6 +201,11 @@ test_generic_trace_points(void)
>>>>  	rte_eal_trace_generic_ptr(&tmp);
>>>>  	rte_eal_trace_generic_str("my string");
>>>>  	rte_eal_trace_generic_size_t(sizeof(void *));
>>>> +	rte_eal_trace_generic_blob(arr, 0);
>>>> +	rte_eal_trace_generic_blob(arr, 17);
>>>> +	rte_eal_trace_generic_blob(arr, RTE_TRACE_BLOB_LEN_MAX);
>>>> +	rte_eal_trace_generic_blob(arr, rte_rand() %
>>>> +					RTE_TRACE_BLOB_LEN_MAX);
>>>>  	RTE_EAL_TRACE_GENERIC_FUNC;
>>>>
>>>>  	return TEST_SUCCESS;
>>>> diff --git a/doc/guides/prog_guide/trace_lib.rst
>>>> b/doc/guides/prog_guide/trace_lib.rst
>>>> index 9a8f38073d..3e0ea5835c 100644
>>>> --- a/doc/guides/prog_guide/trace_lib.rst
>>>> +++ b/doc/guides/prog_guide/trace_lib.rst
>>>> @@ -352,3 +352,15 @@ event ID.
>>>>  The ``packet.header`` and ``packet.context`` will be written in the
>>>> slow path  at the time of trace memory creation. The
>>>> ``trace.header`` and trace payload  will be emitted when the tracepoint
>function is invoked.
>>>> +
>>>> +Limitations
>>>> +-----------
>>>> +
>>>> +- The ``rte_trace_point_emit_blob()`` function can capture a
>>>> +maximum blob of
>>>> +  length ``RTE_TRACE_BLOB_LEN_MAX`` bytes. The application can call
>>>> +  ``rte_trace_point_emit_blob()`` multiple times with length less
>>>> +than or equal to
>>>> +  ``RTE_TRACE_BLOB_LEN_MAX``, if it needs to capture more than
>>>> +``RTE_TRACE_BLOB_LEN_MAX``
>>>> +  bytes.
>>>> +- If the length passed to the ``rte_trace_point_emit_blob()`` is
>>>> +less than
>>>> +  ``RTE_TRACE_BLOB_LEN_MAX``, then the trailing
>>>> +``(RTE_TRACE_BLOB_LEN_MAX - len)``
>>>> +  bytes in the trace are set to zero.
>>>> diff --git a/lib/eal/common/eal_common_trace_points.c
>>>> b/lib/eal/common/eal_common_trace_points.c
>>>> index 0b0b254615..051f89809c 100644
>>>> --- a/lib/eal/common/eal_common_trace_points.c
>>>> +++ b/lib/eal/common/eal_common_trace_points.c
>>>> @@ -40,6 +40,8 @@
>>> RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_size_t,
>>>>  	lib.eal.generic.size_t)
>>>>  RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_func,
>>>>  	lib.eal.generic.func)
>>>> +RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_blob,
>>>> +	lib.eal.generic.blob)
>>>>
>>>>  RTE_TRACE_POINT_REGISTER(rte_eal_trace_alarm_set,
>>>>  	lib.eal.alarm.set)
>>>> diff --git a/lib/eal/include/rte_eal_trace.h
>>>> b/lib/eal/include/rte_eal_trace.h index 5ef4398230..e0b836eb2f
>>>> 100644
>>>> --- a/lib/eal/include/rte_eal_trace.h
>>>> +++ b/lib/eal/include/rte_eal_trace.h
>>>> @@ -143,6 +143,12 @@ RTE_TRACE_POINT(
>>>>  	rte_trace_point_emit_string(func);
>>>>  )
>>>>
>>>> +RTE_TRACE_POINT(
>>>> +	rte_eal_trace_generic_blob,
>>>> +	RTE_TRACE_POINT_ARGS(void *in, uint8_t len),
>>>> +	rte_trace_point_emit_blob(in, len);
>>>> +)
>>>> +
>>>>  #define RTE_EAL_TRACE_GENERIC_FUNC
>>>> rte_eal_trace_generic_func(__func__)
>>>>
>>>>  /* Interrupt */
>>>> diff --git a/lib/eal/include/rte_trace_point.h
>>>> b/lib/eal/include/rte_trace_point.h
>>>> index 0f8700974f..aca8344dbf 100644
>>>> --- a/lib/eal/include/rte_trace_point.h
>>>> +++ b/lib/eal/include/rte_trace_point.h
>>>> @@ -144,6 +144,16 @@ _tp _args \
>>>>  #define rte_trace_point_emit_ptr(val)
>>>>  /** Tracepoint function payload for string datatype */  #define
>>>> rte_trace_point_emit_string(val)
>>>> +/**
>>>> + * Tracepoint function to capture a blob.
>>>> + *
>>>> + * @param val
>>>> + *   Pointer to the array to be captured.
>>>> + * @param len
>>>> + *   Length to be captured. The maximum supported length is
>>>> + *   RTE_TRACE_BLOB_LEN_MAX bytes.
>>>> + */
>>>> +#define rte_trace_point_emit_blob(val, len)
>>>>
>>>
>>> This is just for doxygen right, why doxygen comments are not above
>>> the actual macros but there is a separate #if block for it?
>>
>> The actual macro is within a #ifndef __DOXYGEN__ block. I think that
>> is the reason for including Doxygen comments here.
>
>Thanks for confirming.
>
>Why comments are not as part of actual macro, but there is a separate '#ifdef
>__DOXYGEN__' block?

The actual definition of the rte_trace_point_emit_blob macro is inside an #ifdef ALLOW_EXPERIMENTAL_API block, so doxygen output will not be generated for rte_trace_point_emit_blob unless ALLOW_EXPERIMENTAL_API is defined in the doxygen config.

Putting the macro in an #ifdef __DOXYGEN__ block generates doxygen output for the macro even if ALLOW_EXPERIMENTAL_API is not defined.
>
>>>
>>>>  #endif /* __DOXYGEN__ */
>>>>
>>>> @@ -152,6 +162,9 @@ _tp _args \
>>>>  /** @internal Macro to define event header size. */  #define
>>>> __RTE_TRACE_EVENT_HEADER_SZ sizeof(uint64_t)
>>>>
>>>> +/** Macro to define maximum emit length of blob. */ #define
>>>> +RTE_TRACE_BLOB_LEN_MAX 64
>>>> +
>>>>  /**
>>>>   * Enable recording events of the given tracepoint in the trace buffer.
>>>>   *
>>>> @@ -374,12 +387,31 @@ do { \
>>>>  	mem = RTE_PTR_ADD(mem, __RTE_TRACE_EMIT_STRING_LEN_MAX);
>>> \  } while
>>>> (0)
>>>>
>>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>>> +	if (unlikely(in == NULL)) \
>>>> +		return; \
>>>> +	if (len > RTE_TRACE_BLOB_LEN_MAX) \
>>>> +		len = RTE_TRACE_BLOB_LEN_MAX; \
>>>> +	__rte_trace_point_emit(len, uint8_t); \
>>>> +	memcpy(mem, in, len); \
>>>> +	mem = RTE_PTR_ADD(mem, len); \
>>>> +	memset(mem, 0, RTE_TRACE_BLOB_LEN_MAX - len); \
>>>> +	mem = RTE_PTR_ADD(mem, RTE_TRACE_BLOB_LEN_MAX - len); \
>>>
>>>
>>> Is first memset later memcpy not done because of performance concerns?
>>
>> The memset sets to 0 the unused bytes (RTE_TRACE_BLOB_LEN_MAX - len).
>> So memset is done after memcpy.
>
>yep, I can see what is done.
>
>Question is, you can do more simply:
>memset(mem, 0, RTE_TRACE_BLOB_LEN_MAX);
>memcpy(mem, in, len);
>mem = RTE_PTR_ADD(mem, RTE_TRACE_BLOB_LEN_MAX - len);
>
>Why did you prefer the implementation you did, intentionally? If so what is
>the intention, performance concerns?
Yes, performance is a concern. If memset is done before memcpy, then
between 64 and 128 bytes are written, depending on the length value (0 to 64).
With memset after memcpy, exactly 64 bytes are always written.
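
The byte counts mentioned above can be illustrated with a minimal, standalone sketch. It does not use the DPDK trace macros: BLOB_LEN_MAX, blob_copy_then_pad() and blob_zero_then_copy() are hypothetical names chosen for illustration, and the "written" counters simply add up the sizes passed to memcpy/memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RTE_TRACE_BLOB_LEN_MAX. */
#define BLOB_LEN_MAX 64

/*
 * Variant A: copy the payload first, then zero only the unused tail.
 * Returns the total number of bytes stored into the slot.
 */
static size_t
blob_copy_then_pad(uint8_t *mem, const uint8_t *in, uint8_t len)
{
	size_t written = 0;

	assert(mem != NULL && in != NULL);
	if (len > BLOB_LEN_MAX)
		len = BLOB_LEN_MAX;
	memcpy(mem, in, len);
	written += len;
	memset(mem + len, 0, BLOB_LEN_MAX - len);
	written += BLOB_LEN_MAX - len;
	return written; /* always BLOB_LEN_MAX */
}

/*
 * Variant B: zero the whole slot first, then copy the payload on top.
 * Returns the total number of bytes stored into the slot.
 */
static size_t
blob_zero_then_copy(uint8_t *mem, const uint8_t *in, uint8_t len)
{
	size_t written = 0;

	assert(mem != NULL && in != NULL);
	if (len > BLOB_LEN_MAX)
		len = BLOB_LEN_MAX;
	memset(mem, 0, BLOB_LEN_MAX);
	written += BLOB_LEN_MAX;
	memcpy(mem, in, len);
	written += len;
	return written; /* BLOB_LEN_MAX + len */
}
```

Both variants leave the same 64-byte slot contents; only the number of stores differs. Whether that difference is measurable in a real trace fast path is not something this sketch can show.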
>
>btw, I want to remind that size of the 'len' can be max 64 bytes.
>
>>>
>>>> +} while (0)
>>>> +
>>>>  #else
>>>>
>>>>  #define __rte_trace_point_emit_header_generic(t) RTE_SET_USED(t)
>>>> #define __rte_trace_point_emit_header_fp(t) RTE_SET_USED(t)  #define
>>>> __rte_trace_point_emit(in, type) RTE_SET_USED(in)  #define
>>>> rte_trace_point_emit_string(in) RTE_SET_USED(in)
>>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>>> +	RTE_SET_USED(in); \
>>>> +	RTE_SET_USED(len); \
>>>> +} while (0)
>>>> +
>>>>
>>>>  #endif /* ALLOW_EXPERIMENTAL_API */  #endif /*
>>>> _RTE_TRACE_POINT_REGISTER_H_ */ diff --git
>>>> a/lib/eal/include/rte_trace_point_register.h
>>>> b/lib/eal/include/rte_trace_point_register.h
>>>> index a32f4d731b..7efbac8a72 100644
>>>> --- a/lib/eal/include/rte_trace_point_register.h
>>>> +++ b/lib/eal/include/rte_trace_point_register.h
>>>> @@ -47,6 +47,15 @@ do { \
>>>>  		RTE_STR(in)"[32]", "string_bounded_t"); \  } while (0)
>>>>
>>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>>> +	RTE_SET_USED(in); \
>>>> +	__rte_trace_point_emit(len, uint8_t); \
>>>> +	__rte_trace_point_emit_field(RTE_TRACE_BLOB_LEN_MAX, \
>>>> +		RTE_STR(in)"["RTE_STR(RTE_TRACE_BLOB_LEN_MAX)"]", \
>>>> +		RTE_STR(uint8_t)); \
>>>> +} while (0)
>>>> +
>>>
>>> Why this macro defined here again, it is also defined in 'rte_trace_point.h'
>>> already?
>>> Is it because of 'register_fn()' in '__rte_trace_point_register()'?
>>
>> Yes the register happens in this function.
>
>You are not really answering questions.
>
>There are three copy of '#define rte_trace_point_emit_blob(in, len)' one of
>them is for doxygen comment, please explain why there are two more copies
>of it?
>
rte_trace_point_emit_blob is used when ALLOW_EXPERIMENTAL_API is defined, and one definition is for that case. The other is basically a null definition for when ALLOW_EXPERIMENTAL_API is not defined.
>>>
>>>>  #ifdef __cplusplus
>>>>  }
>>>>  #endif
>>>> diff --git a/lib/eal/version.map b/lib/eal/version.map index
>>>> 7ad12a7dc9..67be24686a 100644
>>>> --- a/lib/eal/version.map
>>>> +++ b/lib/eal/version.map
>>>> @@ -440,6 +440,9 @@ EXPERIMENTAL {
>>>>  	rte_thread_detach;
>>>>  	rte_thread_equal;
>>>>  	rte_thread_join;
>>>> +
>>>> +	# added in 23.03
>>>> +	__rte_eal_trace_generic_blob;
>>>
>>> This is not a function but a trace object.
>>> I guess it was agreed that trace object not need to be exported, and
>>> trace can be found by name?
>>
>> Yes the export in version.map can be removed. Will remove it in next patch
>series.
>
>ack.
>
>Will there be a separate patch to remove existing symbols? Although I am not
>sure if it will be ABI break.
I will send a separate patch to remove existing tracepoint symbols.


^ permalink raw reply	[relevance 0%]

* Re: [PATCH V8 0/8] telemetry: fix data truncation and conversion error and add hex integer API
  @ 2023-01-30 10:39  0%     ` lihuisong (C)
  0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2023-01-30 10:39 UTC (permalink / raw)
  To: dev, Ferruh Yigit, Andrew Rybchenko
  Cc: bruce.richardson, mb, huangdaode, liudongdong3, fengchengwen

Kindly ping.

On 2023/1/16 20:06, lihuisong (C) wrote:
> Hi Ferruh and Andrew,
>
> This patch series optimizes some code and fixes bugs.
> Can you take a look at this patch series?
> If there are no other questions, can it be merged?
>
> Best,
> Huisong
>
> On 2022/12/19 15:06, Huisong Li wrote:
>> Some lib telemetry interfaces add the 'u32' and 'u64' data by the
>> rte_tel_data_add_dict/array_int API. This may cause data conversion
>> error or data truncation. This patch series uses 'u64' functions to
>> do this.
>>
>> In addition, this patch series introduces two APIs to store unsigned
>> integer values as hexadecimal encoded strings in telemetry library.
>>
>> ---
>>   -v8: fix the coding style in patch 7/8
>>   -v7: replace sprintf with snprintf in patch 6/8
>>   -v6: fix code alignment to keep in line with codes in the file
>>   -v5:
>>      - drop a refactor patch.
>>      - no limit the bit width for xxx_uint_hex API.
>>   -v4:
>>      - remove 'u32' value type.
>>      - add padding zero for hexadecimal value
>>   -v3: fix a misspelling mistake in commit log.
>>   -v2:
>>      - fix ABI break warning.
>>      - introduce two APIs to store u32 and u64 values as hexadecimal
>>        encoded strings.
>>
>> Huisong Li (8):
>>    telemetry: move to header to controllable range
>>    ethdev: fix possible data truncation and conversion error
>>    mempool: fix possible data truncation and conversion error
>>    cryptodev: fix possible data conversion error
>>    mem: possible data truncation and conversion error
>>    telemetry: support adding integer value as hexadecimal
>>    test: add test cases for adding hex integer value API
>>    ethdev: display capability values in hexadecimal format
>>
>>   app/test/test_telemetry_data.c     | 150 +++++++++++++++++++++++++++++
>>   lib/cryptodev/rte_cryptodev.c      |   2 +-
>>   lib/eal/common/eal_common_memory.c |  10 +-
>>   lib/ethdev/rte_ethdev.c            |  19 ++--
>>   lib/mempool/rte_mempool.c          |  24 ++---
>>   lib/telemetry/rte_telemetry.h      |  52 +++++++++-
>>   lib/telemetry/telemetry_data.c     |  73 ++++++++++++++
>>   lib/telemetry/version.map          |   9 ++
>>   8 files changed, 309 insertions(+), 30 deletions(-)
>>
>
> .
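
The changelog above mentions zero padding for hexadecimal values and the move from sprintf to snprintf. A minimal sketch of that idea follows. It is not the telemetry library's implementation: u64_to_hex_str() and its bitwidth parameter are hypothetical, with a bitwidth of 0 assumed here to mean "no padding".

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format val as a "0x"-prefixed hexadecimal string,
 * zero padded to bitwidth / 4 digits (rounded up). A bitwidth of 0 means
 * no padding. Returns 0 on success, -1 on bad input or truncation.
 */
static int
u64_to_hex_str(char *buf, size_t buflen, uint64_t val, unsigned int bitwidth)
{
	int digits;
	int ret;

	assert(buf != NULL);
	if (bitwidth > 64)
		return -1;
	digits = (int)((bitwidth + 3) / 4);
	/* snprintf bounds the write, so truncation can be detected. */
	ret = snprintf(buf, buflen, "0x%0*" PRIx64, digits, val);
	if (ret < 0 || (size_t)ret >= buflen)
		return -1;
	return 0;
}
```

Checking the snprintf return value against the buffer size is what makes truncation visible to the caller, which plain sprintf cannot offer.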

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-28 10:53  0%               ` Jerin Jacob
  2023-01-28 17:21  3%                 ` Stephen Hemminger
@ 2023-01-30  9:56  0%                 ` Naga Harish K, S V
  2023-01-30 14:43  0%                   ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-01-30  9:56 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Saturday, January 28, 2023 4:24 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> 
> > > > > >
> > > > > > > > +        */
> > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > +       /**< Reserved fields for future use */
> > > > > > >
> > > > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init() to
> > > > > > > make sure rsvd is zero.
> > > > > > >
> > > > > >
> > > > > > The reserved fields are not used by the adapter or application.
> > > > > > Not sure if it is necessary to introduce a new API to clear
> > > > > > reserved fields.
> > > > >
> > > > > When the adapter starts using new fields (when we add new fields in
> > > > > the future), an old application which is not using
> > > > > rte_event_eth_rx_adapter_runtime_params_init() may have junk
> > > > > values and then the adapter implementation will misbehave.
> > > > >
> > > > >
> > > >
> > > > does it mean, the application doesn't re-compile for the new DPDK?
> > >
> > > Yes. No need recompile if ABI not breaking.
> > >
> > > > When some of the reserved fields are used in the future, the
> > > > application also may need to be recompiled along with DPDK, right?
> > > > As the application also may need to use the newly consumed
> > > > reserved fields?
> > >
> > > The problematic case is:
> > >
> > > An adapter implementation from 23.07 (assuming the params structure
> > > has changed) needs to work with an application built against 23.03.
> > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > >
> >
> > As rte_event_eth_rx_adapter_runtime_params_init() initializes only
> > reserved fields to zero, it may not solve the issue in this case.
> 
> rte_event_eth_rx_adapter_runtime_params_init() needs to zero all fields,
> not just reserved field.
> The application calling sequence  is
> 
> struct my_config c;
> rte_event_eth_rx_adapter_runtime_params_init(&c);
> c.interseted_filed_to_be_updated = val;
> 
Can it be done like this instead:
	struct my_config c = {0};
	c.interseted_filed_to_be_updated = val;
with the Doxygen comments updated to recommend the above usage to reset all fields?
This way, rte_event_eth_rx_adapter_runtime_params_init() can be avoided.

> Let me share an example and you can tell where is the issue
> 
> 1)Assume parameter structure is 64B and for 22.03 8B are used.
> 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> 3)There is an application written based on 22.03 which using only 8B after
> calling rte_event_eth_rx_adapter_runtime_params_init()
> 4)Assume, in 22.07 another 8B added to structure.
> 5)Now, the application (3) needs to run on 22.07. Since the application is
> calling rte_event_eth_rx_adapter_runtime_params_init()
> and 9 to 15B are zero, the implementation will not go bad.
> 
> > The old application only tries to set/get previous valid fields and the
> > newly used fields may still contain junk values.
> > If the application wants to make use of any of the newly used params, the
> > application changes are required anyway.
> 
> Yes. If application wants to make use of newly added features. No need to
> change if new features are not needed for old application.

^ permalink raw reply	[relevance 0%]

* [PATCH v3 1/8] ethdev: add IPv6 routing extension header definition
  @ 2023-01-30  3:59  3%         ` Rongwei Liu
  0 siblings, 0 replies; 200+ results
From: Rongwei Liu @ 2023-01-30  3:59 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Aman Singh, Yuying Zhang,
	Ferruh Yigit, Andrew Rybchenko, Olivier Matz
  Cc: dev, rasland

Add IPv6 routing extension header definition and no
TLV support for now.

At rte_flow layer, there are new items defined for matching
type/nexthdr/segments_left field.

Add command line support for IPv6 routing extension header
matching: type/nexthdr/segment_list.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 app/test-pmd/cmdline_flow.c            | 46 ++++++++++++++++++++++++++
 doc/guides/prog_guide/rte_flow.rst     |  9 +++++
 doc/guides/rel_notes/release_23_03.rst |  9 +++++
 lib/ethdev/rte_flow.c                  | 19 +++++++++++
 lib/ethdev/rte_flow.h                  | 19 +++++++++++
 lib/net/rte_ip.h                       | 21 ++++++++++++
 6 files changed, 123 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 88108498e0..7a8516829c 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -298,6 +298,10 @@ enum index {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_ICMP,
 	ITEM_ICMP_TYPE,
 	ITEM_ICMP_CODE,
@@ -1326,6 +1330,7 @@ static const enum index next_item[] = {
 	ITEM_ARP_ETH_IPV4,
 	ITEM_IPV6_EXT,
 	ITEM_IPV6_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
 	ITEM_ICMP6,
 	ITEM_ICMP6_ND_NS,
 	ITEM_ICMP6_ND_NA,
@@ -1435,6 +1440,15 @@ static const enum index item_ipv6[] = {
 	ITEM_IPV6_SRC,
 	ITEM_IPV6_DST,
 	ITEM_IPV6_HAS_FRAG_EXT,
+	ITEM_IPV6_ROUTING_EXT,
+	ITEM_NEXT,
+	ZERO,
+};
+
+static const enum index item_ipv6_routing_ext[] = {
+	ITEM_IPV6_ROUTING_EXT_TYPE,
+	ITEM_IPV6_ROUTING_EXT_NEXT_HDR,
+	ITEM_IPV6_ROUTING_EXT_SEG_LEFT,
 	ITEM_NEXT,
 	ZERO,
 };
@@ -3844,6 +3858,38 @@ static const struct token token_list[] = {
 		.args = ARGS(ARGS_ENTRY_BF(struct rte_flow_item_ipv6,
 					   has_frag_ext, 1)),
 	},
+	[ITEM_IPV6_ROUTING_EXT] = {
+		.name = "ipv6_routing_ext",
+		.help = "match IPv6 routing extension header",
+		.priv = PRIV_ITEM(IPV6_ROUTING_EXT,
+				  sizeof(struct rte_flow_item_ipv6_routing_ext)),
+		.next = NEXT(item_ipv6_routing_ext),
+		.call = parse_vc,
+	},
+	[ITEM_IPV6_ROUTING_EXT_TYPE] = {
+		.name = "ext_type",
+		.help = "match IPv6 routing extension header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.type)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_NEXT_HDR] = {
+		.name = "ext_next_hdr",
+		.help = "match IPv6 routing extension header next header type",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.next_hdr)),
+	},
+	[ITEM_IPV6_ROUTING_EXT_SEG_LEFT] = {
+		.name = "ext_seg_left",
+		.help = "match IPv6 routing extension header segment left",
+		.next = NEXT(item_ipv6_routing_ext, NEXT_ENTRY(COMMON_UNSIGNED),
+			     item_param),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ipv6_routing_ext,
+					     hdr.segments_left)),
+	},
 	[ITEM_ICMP] = {
 		.name = "icmp",
 		.help = "match ICMP header",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 3e6242803d..602fab29d3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
 
 - ``color``: Metering color marker.
 
+Item: ``IPV6_ROUTING_EXT``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Matches IPv6 routing extension header.
+
+- ``next_hdr``: Next layer header type.
+- ``type``: IPv6 routing extension header type.
+- ``segments_left``: How many IPv6 destination addresses carries on.
+
 Actions
 ~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index b8c5b68d6c..8f482301f7 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added rte_flow support for matching IPv6 routing extension header fields.**
+
+  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing extension
+  header.
+
 
 Removed Items
 -------------
@@ -84,6 +89,10 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* net: added a new structure:
+
+    - IPv6 routing extension header ``rte_ipv6_routing_ext``.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 7d0c24366c..5c423db160 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -76,6 +76,23 @@ rte_flow_item_flex_conv(void *buf, const void *data)
 	return src->length;
 }
 
+static size_t
+rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data)
+{
+	struct rte_flow_item_ipv6_routing_ext *dst = buf;
+	const struct rte_flow_item_ipv6_routing_ext *src = data;
+	size_t len;
+
+	if (src->hdr.hdr_len)
+		len = src->hdr.hdr_len << 3;
+	else
+		len = src->hdr.segments_left << 4;
+	if (dst == NULL)
+		return 0;
+	rte_memcpy((void *)((uintptr_t)(dst->hdr.segments)), src->hdr.segments, len);
+	return len;
+}
+
 /** Generate flow_item[] entry. */
 #define MK_FLOW_ITEM(t, s) \
 	[RTE_FLOW_ITEM_TYPE_ ## t] = { \
@@ -157,6 +174,8 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
 	MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
 	MK_FLOW_ITEM(METER_COLOR, sizeof(struct rte_flow_item_meter_color)),
+	MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct rte_flow_item_ipv6_routing_ext),
+			rte_flow_item_ipv6_routing_ext_conv),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..9b9018cba2 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -624,6 +624,13 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_meter_color.
 	 */
 	RTE_FLOW_ITEM_TYPE_METER_COLOR,
+
+	/**
+	 * Matches the presence of IPv6 routing extension header.
+	 *
+	 * @see struct rte_flow_item_ipv6_routing_ext.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
 };
 
 /**
@@ -873,6 +880,18 @@ struct rte_flow_item_ipv6 {
 	uint32_t reserved:23;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
+ *
+ * Matches an IPv6 routing extension header.
+ */
+struct rte_flow_item_ipv6_routing_ext {
+	struct rte_ipv6_routing_ext hdr;
+};
+
 /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
 #ifndef __cplusplus
 static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 9c8e8206f0..56c151372a 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -539,6 +539,27 @@ struct rte_ipv6_hdr {
 	uint8_t  dst_addr[16];	/**< IP address of destination host(s). */
 } __rte_packed;
 
+/**
+ * IPv6 Routing Extension Header
+ */
+struct rte_ipv6_routing_ext {
+	uint8_t next_hdr;			/**< Protocol, next header. */
+	uint8_t hdr_len;			/**< Header length. */
+	uint8_t type;				/**< Extension header type. */
+	uint8_t segments_left;			/**< Valid segments number. */
+	__extension__
+	union {
+		rte_be32_t flags;
+		struct {
+			uint8_t last_entry;	/**< The last_entry field of SRH */
+			uint8_t flag;		/**< Packet flag. */
+			rte_be16_t tag;		/**< Packet tag. */
+		};
+	};
+	__extension__
+	rte_be32_t segments[0];			/**< Each hop IPv6 address. */
+} __rte_packed;
+
 /* IPv6 vtc_flow: IPv / TC / flow_label */
 #define RTE_IPV6_HDR_FL_SHIFT 0
 #define RTE_IPV6_HDR_TC_SHIFT 20
-- 
2.27.0


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v2 1/8] ethdev: add IPv6 routing extension header definition
  @ 2023-01-30  3:46  0%       ` Rongwei Liu
    1 sibling, 0 replies; 200+ results
From: Rongwei Liu @ 2023-01-30  3:46 UTC (permalink / raw)
  To: Andrew Rybchenko, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Aman Singh, Yuying Zhang, Ferruh Yigit, Olivier Matz
  Cc: dev, Raslan Darawsheh

Hi Andrew,

BR
Rongwei

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Friday, January 20, 2023 17:21
> To: Rongwei Liu <rongweil@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Aman
> Singh <aman.deep.singh@intel.com>; Yuying Zhang
> <yuying.zhang@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>; Olivier
> Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v2 1/8] ethdev: add IPv6 routing extension header
> definition
> 
> External email: Use caution opening links or attachments
> 
> 
> On 1/19/23 06:11, Rongwei Liu wrote:
> > Add IPv6 routing extension header definition and no TLV support for
> > now.
> >
> > At rte_flow layer, there are new items defined for matching
> > type/nexthdr/segments_left field.
> >
> > Add command line support for IPv6 routing extension header
> > matching: type/nexthdr/segment_list.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > Acked-by: Ori Kam <orika@nvidia.com>
> 
> [snip]
> 
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index 3e6242803d..ae99036be0 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -1544,6 +1544,15 @@ Matches Color Marker set by a Meter.
> >
> >   - ``color``: Metering color marker.
> >
> > +Item: ``IPV6_ROUTING_EXT``
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Matches ipv6 routing extension header.
> 
> ipv6 -> IPv6
Sure.
> 
> > +
> > +- ``next_hdr``: Next layer header type.
> > +- ``type``: IPv6 routing extension header type.
> > +- ``segments_left``: How many IPv6 destination addresses carries on
> 
> Why are only 3 fields mentioned above?
> 
This is the first phase: matching only the first uint32 of the IPv6 routing extension header.
There is no need to match hdr_len since TLV is ignored.
> > +
> >   Actions
> >   ~~~~~~~
> >
> > diff --git a/doc/guides/rel_notes/release_23_03.rst
> > b/doc/guides/rel_notes/release_23_03.rst
> > index b8c5b68d6c..2a794d598e 100644
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > @@ -55,6 +55,11 @@ New Features
> >        Also, make sure to start the actual text at the margin.
> >        =======================================================
> >
> > +* **Added rte_flow support for matching IPv6 routing extension header
> > +fields.**
> > +
> > +  Added ``ipv6_routing_ext`` items in rte_flow to match IPv6 routing
> > + extension  header
> 
> Missing full stop above.
> 
Sure
> > +
> >
> >   Removed Items
> >   -------------
> > @@ -84,6 +89,11 @@ API Changes
> >      Also, make sure to start the actual text at the margin.
> >      =======================================================
> >
> > +* ethdev: added a new structure:
> > +
> > +    - IPv6 routing extension header ``rte_flow_item_ipv6_routing_ext`` and
> > +      ``rte_ipv6_routing_ext``
> > +
> 
> If I'm not mistaken, additions should not be here. It is not an API change.
> 
I checked the existing release notes: "ihl" and "version" of the IPv4 header are added here, but with a "net:" prefix.
Do you think it's good to follow that?
> >
> >   ABI Changes
> >   -----------
> > diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c index
> > 7d0c24366c..4074b475c8 100644
> > --- a/lib/ethdev/rte_flow.c
> > +++ b/lib/ethdev/rte_flow.c
> > @@ -76,6 +76,20 @@ rte_flow_item_flex_conv(void *buf, const void *data)
> >       return src->length;
> >   }
> >
> > +static size_t
> > +rte_flow_item_ipv6_routing_ext_conv(void *buf, const void *data) {
> > +     struct rte_flow_item_ipv6_routing_ext *dst = buf;
> > +     const struct rte_flow_item_ipv6_routing_ext *src = data;
> > +     size_t len;
> > +
> > +     len = src->hdr.hdr_len ? src->hdr.hdr_len << 3 :
> > + src->hdr.segments_left << 4;
> 
> Compare hdr_len vs 0 explicitly.
> Also I'd add parenthesis around ternary operator values to make it simpler to
> understand.
Sure.
> 
> > +     if (buf)
> 
> Please, compare vs NULL explicitly. May be 'dst' would be better here?
> 
> > +             rte_memcpy((void *)((uintptr_t)(dst->hdr.segments)),
> > +                        src->hdr.segments, len);
> > +     return len;
> > +}
> > +
Sure.
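
For reference, the length logic in the hunk quoted above, with the explicit comparisons requested in the review, could look roughly like the following standalone sketch. The struct and function names are hypothetical simplifications, not the ethdev code. The sketch assumes, as the quoted hunk does, that hdr_len counts 8-byte units and that each remaining segment is a 16-byte IPv6 address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the first four header bytes. */
struct routing_ext_fixed {
	uint8_t next_hdr;
	uint8_t hdr_len;       /* in units of 8 bytes */
	uint8_t type;
	uint8_t segments_left; /* each segment is a 16-byte address */
};

/* Size in bytes of the segment area, preferring hdr_len when it is set. */
static size_t
routing_ext_segments_len(const struct routing_ext_fixed *hdr)
{
	assert(hdr != NULL);
	if (hdr->hdr_len != 0)
		return ((size_t)hdr->hdr_len << 3);
	return ((size_t)hdr->segments_left << 4);
}

/*
 * Copy the segment area when dst is not NULL. The required size is
 * returned in both cases, so a caller can query it with dst == NULL.
 */
static size_t
routing_ext_segments_copy(void *dst, const struct routing_ext_fixed *hdr,
			  const void *segments)
{
	size_t len = routing_ext_segments_len(hdr);

	if (dst != NULL)
		memcpy(dst, segments, len);
	return len;
}
```

Returning the length even when dst is NULL follows the shape of the quoted v2 hunk. Whether that is the right contract for the conversion callback is for the patch author and reviewers to settle.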
> >   /** Generate flow_item[] entry. */
> >   #define MK_FLOW_ITEM(t, s) \
> >       [RTE_FLOW_ITEM_TYPE_ ## t] = { \ @@ -157,6 +171,8 @@ static
> > const struct rte_flow_desc_data rte_flow_desc_item[] = {
> >       MK_FLOW_ITEM(L2TPV2, sizeof(struct rte_flow_item_l2tpv2)),
> >       MK_FLOW_ITEM(PPP, sizeof(struct rte_flow_item_ppp)),
> >       MK_FLOW_ITEM(METER_COLOR, sizeof(struct
> > rte_flow_item_meter_color)),
> > +     MK_FLOW_ITEM_FN(IPV6_ROUTING_EXT, sizeof(struct
> rte_flow_item_ipv6_routing_ext),
> > +                     rte_flow_item_ipv6_routing_ext_conv),
> >   };
> >
> >   /** Generate flow_action[] entry. */ diff --git
> > a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> > b60987db4b..0120d3e7d2 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -624,6 +624,13 @@ enum rte_flow_item_type {
> >        * See struct rte_flow_item_meter_color.
> >        */
> >       RTE_FLOW_ITEM_TYPE_METER_COLOR,
> > +
> > +     /**
> > +      * Matches the presence of IPv6 routing extension header.
> > +      *
> > +      * See struct rte_flow_item_ipv6_routing_ext.
> 
> @see
> 
Sure. It looks like there are many existing wrong usages of "See struct" in this file.
> > +      */
> > +     RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT,
> >   };
> >
> >   /**
> > @@ -873,6 +880,18 @@ struct rte_flow_item_ipv6 {
> >       uint32_t reserved:23;
> >   };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT.
> > + *
> > + * Matches an IPv6 routing extension header.
> > + */
> > +struct rte_flow_item_ipv6_routing_ext {
> > +     struct rte_ipv6_routing_ext hdr; };
> > +
> 
> What about default mask?
I tried to add the default mask declaration in this file but got an "unused variable" warning.
I moved it to "cmdline_flow.c" since it's only used in the testpmd encap logic.
> 
> >   /** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
> >   #ifndef __cplusplus
> >   static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
> > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h index
> > 9c8e8206f0..158a2f83ce 100644
> > --- a/lib/net/rte_ip.h
> > +++ b/lib/net/rte_ip.h
> > @@ -539,6 +539,27 @@ struct rte_ipv6_hdr {
> >       uint8_t  dst_addr[16];  /**< IP address of destination host(s). */
> >   } __rte_packed;
> >
> > +/**
> > + * IPv6 Routing Extension Header
> > + */
> > +struct rte_ipv6_routing_ext {
> > +     uint8_t next_hdr;                       /**< Protocol, next header. */
> > +     uint8_t hdr_len;                        /**< Header length. */
> > +     uint8_t type;                           /**< Extension header type. */
> > +     uint8_t segments_left;                  /**< Valid segments number. */
> > +     __extension__
> > +     union {
> > +             uint32_t flags;
> 
> rte_be32_t ?
Sure.
> 
> > +             struct {
> > +                     uint8_t last_entry;     /**< The last_entry field of SRH */
> > +                     uint8_t flag;           /**< Packet flag. */
> > +                     uint16_t tag;           /**< Packet tag. */
> 
> rte_be16_t
Sure.
> 
> > +             };
> > +     };
> > +     __extension__
> > +     uint32_t segments[0];                   /**< Each hop IPv6 address. */
> 
> rte_be32_t
Sure.
> 
> > +} __rte_packed;
> > +
> >   /* IPv6 vtc_flow: IPv / TC / flow_label */
> >   #define RTE_IPV6_HDR_FL_SHIFT 0
> >   #define RTE_IPV6_HDR_TC_SHIFT 20


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-28 10:53  0%               ` Jerin Jacob
@ 2023-01-28 17:21  3%                 ` Stephen Hemminger
  2023-01-30  9:56  0%                 ` Naga Harish K, S V
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-01-28 17:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Naga Harish K, S V, jerinj, Carrillo, Erik G, Gujjar,
	Abhinandan S, dev, Jayatheerthan, Jay

On Sat, 28 Jan 2023 16:23:45 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> > >
> > > Yes. No need recompile if ABI not breaking.
> > >  
> > > > When some of the reserved fields are used in the future, the application  
> > > also may need to be recompiled along with DPDK right?  
> > > > As the application also may need to use the newly consumed reserved  
> > > fields?
> > >
> > > The problematic case is:
> > >
> > > Adapter implementation of 23.07(Assuming there is change params) field
> > > needs to work with application of 23.03.
> > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > >  
> >

First off, reserved fields are a problematic design choice IMHO (see YAGNI).

Second, a reserved field cannot be used in the future unless the
original code enforced that all reserved fields are zero.
The same is true for holes in structs, which sometimes get reused.

You can't use a reserved field without breaking ABI unless the previous
code enforced that the field must be zero.
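
A minimal sketch of the enforcement described above. The names
(example_params, example_params_set) are hypothetical and are not DPDK
APIs; the point is only that the setter rejects non-zero reserved words,
so a later release can give them meaning without misreading old callers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter struct with reserved space; not a DPDK type. */
struct example_params {
	uint32_t max_nb_rx;
	uint32_t rsvd[15];	/* must be zero in this version */
};

/* Library state updated by the setter (illustrative only). */
static uint32_t example_max_nb_rx;

/*
 * Reject any caller that passes non-zero reserved words. Every accepted
 * caller is then known to have zeroed them, which is the precondition
 * for reusing a reserved word later without breaking ABI.
 */
static int
example_params_set(const struct example_params *p)
{
	size_t i;

	if (p == NULL)
		return -EINVAL;
	for (i = 0; i < sizeof(p->rsvd) / sizeof(p->rsvd[0]); i++) {
		if (p->rsvd[i] != 0)
			return -EINVAL;
	}
	example_max_nb_rx = p->max_nb_rx;
	return 0;
}
```

Without such a check in the first release, old binaries may already be
passing junk in the reserved words, and there is no way to tell them apart
from new binaries that set the field on purpose.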

^ permalink raw reply	[relevance 3%]

* [PATCH 2/2] ethdev: introduce the mhpsdp hwport field in Tx queue API
  @ 2023-01-28 13:08  2% ` Jiawei Wang
  0 siblings, 0 replies; 200+ results
From: Jiawei Wang @ 2023-01-28 13:08 UTC (permalink / raw)
  To: viacheslavo, orika, thomas, Aman Singh, Yuying Zhang,
	Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

When multiple hardware ports are connected to a single DPDK port (mhpsdp),
the previous patch introduces a new rte flow item to match the
hardware port of the received packets.

This patch adds the tx_mhpsdp_hwport setting to the Tx queue API. The hwport
value indicates which hardware port the packets are sent to.
A value of 0 means no port is assigned and traffic will be routed between
the different hardware ports; if 0 is disabled, then trying to match on
MHPSDP_HW_PORT with 0 will result in an error.

The new tx_mhpsdp_hwport field is added in the padding hole of the
rte_eth_txconf structure, so the size of rte_eth_txconf stays the same.
A suppress_type rule for this structure change is added to the ABI check file.
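
A minimal sketch of why filling a padding hole keeps the layout stable.
The two structs below are hypothetical and only loosely modelled on the
description above (they are not the real rte_eth_txconf); the sketch
assumes a typical ABI where uint64_t is aligned to at least 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout; not the real rte_eth_txconf. */
struct txconf_before {
	uint16_t free_thresh;
	uint8_t deferred_start;
	/* compiler-inserted padding sits here, before the 64-bit member */
	uint64_t offloads;
};

/* Same layout with a new 8-bit member placed in the former padding. */
struct txconf_after {
	uint16_t free_thresh;
	uint8_t deferred_start;
	uint8_t hwport;		/* new member, hypothetical name */
	uint64_t offloads;
};

/* Returns 1 when the new member left the size and later offsets untouched. */
static int
layout_is_preserved(void)
{
	return sizeof(struct txconf_before) == sizeof(struct txconf_after) &&
	       offsetof(struct txconf_before, offloads) ==
	       offsetof(struct txconf_after, offloads);
}
```

The layout is preserved, but old binaries never initialised the byte that
now holds the new member, which is why the ABI checker still reports the
change and a suppression rule is needed.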

This patch adds the testpmd command line:
testpmd> port config (port_id) txq (queue_id) mhpsdp_hwport (value)

For example, suppose two hardware ports, 0 and 1, are connected to
a single DPDK port (port id 0), where mhpsdp_hwport 1 stands for
hardware port 0 and mhpsdp_hwport 2 stands for hardware port 1.
The commands below configure the Tx mhpsdp_hwport per Tx queue:
        port config 0 txq 0 mhpsdp_hwport 1
        port config 0 txq 1 mhpsdp_hwport 1
        port config 0 txq 2 mhpsdp_hwport 2
        port config 0 txq 3 mhpsdp_hwport 2

These commands configure TxQ 0 and TxQ 1 with mhpsdp_hwport 1,
so packets sent on TxQ 0 or TxQ 1 leave through hardware port 0.
Likewise, packets sent on TxQ 2 or TxQ 3 leave through
hardware port 1.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 84 +++++++++++++++++++++
 devtools/libabigail.abignore                |  5 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 13 ++++
 lib/ethdev/rte_ethdev.h                     |  8 ++
 4 files changed, 110 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b32dc8bfd4..db9ea8b18a 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -764,6 +764,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
 			"    Cleanup txq mbufs for a specific Tx queue\n\n"
+
+			"port config (port_id) txq (queue_id) mhpsdp_hwport (value)\n"
+			"    Set the hwport value in mhpsdp "
+			"on a specific Tx queue\n\n"
 		);
 	}
 
@@ -12621,6 +12625,85 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
 	}
 };
 
+/* *** configure port txq mhpsdp_hwport value *** */
+struct cmd_config_tx_mhpsdp_hwport {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	portid_t portid;
+	cmdline_fixed_string_t txq;
+	uint16_t qid;
+	cmdline_fixed_string_t mhpsdp_hwport;
+	uint16_t value;
+};
+
+static void
+cmd_config_tx_mhpsdp_hwport_parsed(void *parsed_result,
+			      __rte_unused struct cmdline *cl,
+			      __rte_unused void *data)
+{
+	struct cmd_config_tx_mhpsdp_hwport *res = parsed_result;
+	struct rte_port *port;
+
+	if (port_id_is_invalid(res->portid, ENABLED_WARN))
+		return;
+
+	if (res->portid == (portid_t)RTE_PORT_ALL) {
+		printf("Invalid port id\n");
+		return;
+	}
+
+	port = &ports[res->portid];
+
+	if (strcmp(res->txq, "txq")) {
+		printf("Unknown parameter\n");
+		return;
+	}
+	if (tx_queue_id_is_invalid(res->qid))
+		return;
+
+	port->txq[res->qid].conf.tx_mhpsdp_hwport = res->value;
+
+	cmd_reconfig_device_queue(res->portid, 0, 1);
+}
+
+cmdline_parse_token_string_t cmd_config_tx_mhpsdp_hwport_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+				 port, "port");
+cmdline_parse_token_string_t cmd_config_tx_mhpsdp_hwport_config =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+				 config, "config");
+cmdline_parse_token_num_t cmd_config_tx_mhpsdp_hwport_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+				 portid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_mhpsdp_hwport_txq =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+				 txq, "txq");
+cmdline_parse_token_num_t cmd_config_tx_mhpsdp_hwport_qid =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+			      qid, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_tx_mhpsdp_hwport_hwport =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+				 mhpsdp_hwport, "mhpsdp_hwport");
+cmdline_parse_token_num_t cmd_config_tx_mhpsdp_hwport_value =
+	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_mhpsdp_hwport,
+			      value, RTE_UINT16);
+
+static cmdline_parse_inst_t cmd_config_tx_mhpsdp_hwport = {
+	.f = cmd_config_tx_mhpsdp_hwport_parsed,
+	.data = (void *)0,
+	.help_str = "port config <port_id> txq <queue_id> mhpsdp_hwport <value>",
+	.tokens = {
+		(void *)&cmd_config_tx_mhpsdp_hwport_port,
+		(void *)&cmd_config_tx_mhpsdp_hwport_config,
+		(void *)&cmd_config_tx_mhpsdp_hwport_portid,
+		(void *)&cmd_config_tx_mhpsdp_hwport_txq,
+		(void *)&cmd_config_tx_mhpsdp_hwport_qid,
+		(void *)&cmd_config_tx_mhpsdp_hwport_hwport,
+		(void *)&cmd_config_tx_mhpsdp_hwport_value,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -12851,6 +12934,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_show_capability,
 	(cmdline_parse_inst_t *)&cmd_set_flex_is_pattern,
 	(cmdline_parse_inst_t *)&cmd_set_flex_spec_pattern,
+	(cmdline_parse_inst_t *)&cmd_config_tx_mhpsdp_hwport,
 	NULL,
 };
 
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..cbbde4ef05 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -20,6 +20,11 @@
 [suppress_file]
         soname_regexp = ^librte_.*mlx.*glue\.
 
+; Ignore fields inserted in middle padding of rte_eth_txconf
+[suppress_type]
+        name = rte_eth_txconf
+        has_data_member_inserted_between = {offset_after(tx_deferred_start), offset_of(offloads)}
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Experimental APIs exceptions ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 7be7c55d63..a05fd0e7d0 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
 
 This command should be run when the port is stopped, or else it will fail.
 
+config per queue Tx mhpsdp_hwport
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Configure the mhpsdp_hwport value on a specific Tx queue::
+
+   testpmd> port config (port_id) txq (queue_id) mhpsdp_hwport (value)
+
+* ``mhpsdp_hwport``: the hardware port that packets are sent to.
+                     Used when multiple hardware ports are connected to
+                     a single DPDK port (mhpsdp).
+
+This command should be run when the port is stopped, or else it will fail.
+
 Config VXLAN Encap outer layers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..c1cef9e21d 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
 				      less free descriptors than this value. */
 
 	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	/**
+	 * Hardware port index for mhpsdp.
+	 * Value 0 means no port is assigned and traffic could be routed between
+	 * different hardware ports; if 0 is disabled, then trying to match on
+	 * MHPSDP_HW_PORT with 0 will result in an error.
+	 * Values start from 1, which refers to the first hw port in the mhpsdp.
+	 */
+	uint8_t tx_mhpsdp_hwport;
 	/**
 	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
 	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
-- 
2.18.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-25 16:32  0%             ` Naga Harish K, S V
@ 2023-01-28 10:53  0%               ` Jerin Jacob
  2023-01-28 17:21  3%                 ` Stephen Hemminger
  2023-01-30  9:56  0%                 ` Naga Harish K, S V
  0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2023-01-28 10:53 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>

> > > > >
> > > > > > > +        */
> > > > > > > +       uint32_t rsvd[15];
> > > > > > > +       /**< Reserved fields for future use */
> > > > > >
> > > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init() to
> > make
> > > > > > sure rsvd is zero.
> > > > > >
> > > > >
> > > > > The reserved fields are not used by the adapter or application.
> > > > > Not sure Is it necessary to Introduce a new API to clear reserved fields.
> > > >
> > > > When the adapter starts using new fields (when we add new fields in
> > > > future), the old application which is not using
> > > > rte_event_eth_rx_adapter_runtime_params_init() may have junk values
> > > > and then the adapter implementation will behave badly.
> > > >
> > > >
> > >
> > > does it mean, the application doesn't re-compile for the new DPDK?
> >
> > Yes. No need recompile if ABI not breaking.
> >
> > > When some of the reserved fields are used in the future, the application
> > also may need to be recompiled along with DPDK right?
> > > As the application also may need to use the newly consumed reserved
> > fields?
> >
> > The problematic case is:
> >
> > Adapter implementation of 23.07(Assuming there is change params) field
> > needs to work with application of 23.03.
> > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> >
>
> As rte_event_eth_rx_adapter_runtime_params_init() initializes only reserved fields to zero,  it may not solve the issue in this case.

rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
fields, not just the reserved fields.
The application calling sequence is

struct my_config c;
rte_event_eth_rx_adapter_runtime_params_init(&c);
c.interested_field_to_be_updated = val;

Let me share an example, and you can tell me where the issue is.

1) Assume the parameter structure is 64B and, for 22.03, 8B are used.
2) rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
3) There is an application written against 22.03 which uses only 8B
after calling rte_event_eth_rx_adapter_runtime_params_init().
4) Assume that in 22.07 another 8B are added to the structure.
5) Now the application from (3) needs to run on 22.07. Since the
application calls rte_event_eth_rx_adapter_runtime_params_init()
and bytes 9 to 16 are zero, the implementation will not misbehave.

> The old application only tries to set/get previous valid fields and the newly used fields may still contain junk value.
> If the application wants to make use of any the newly used params, the application changes are required anyway.

Yes, if the application wants to make use of newly added features. No
change is needed if the old application does not need the new features.
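
The steps above can be sketched as follows. This is only an illustration
under stated assumptions: params_v1, params_v2, new_knob, params_init and
lib_read_new_knob are hypothetical names, not the proposed DPDK API. The
old application fills the v1 layout, and the newer library reads the same
memory using the v2 layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout seen by an application built against the older release. */
struct params_v1 {
	uint32_t max_nb_rx;
	uint32_t rsvd[15];
};

/* Layout in the newer library: one reserved word now carries a value. */
struct params_v2 {
	uint32_t max_nb_rx;
	uint32_t new_knob;	/* hypothetical field, 0 means "use default" */
	uint32_t rsvd[14];
};

/* The init helper clears the whole structure, not just rsvd[]. */
static void
params_init(void *params, size_t size)
{
	memset(params, 0, size);
}

/* New library code: interpret the caller's memory using the v2 layout. */
static uint32_t
lib_read_new_knob(const void *params)
{
	struct params_v2 v2;

	memcpy(&v2, params, sizeof(v2));
	return v2.new_knob;
}
```

An old application that skipped the init call leaves whatever was on its
stack in the word that the newer library reads as new_knob. One that called
the init helper presents a zero there, which the library can treat as the
default.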

^ permalink raw reply	[relevance 0%]

* Re: [PATCH V4 0/5] app/testpmd: support mulitple process attach and detach port
  @ 2023-01-28  1:39  0%       ` lihuisong (C)
  0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2023-01-28  1:39 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit
  Cc: dev, andrew.rybchenko, liudongdong3, huangdaode, fengchengwen


On 2023/1/19 22:35, Thomas Monjalon wrote:
> 19/01/2023 11:31, lihuisong (C):
>> On 2023/1/18 22:12, Thomas Monjalon wrote:
>>> 11/01/2023 11:46, Ferruh Yigit:
>>>> On 1/11/2023 10:27 AM, Ferruh Yigit wrote:
>>>>> On 1/11/2023 12:53 AM, lihuisong (C) wrote:
>>>>>> On 2023/1/11 0:51, Ferruh Yigit wrote:
>>>>>>> Hi Huisong,
>>>>>>>
>>>>>>> I haven't checked the patch in detail yet, but I can see it gives some
>>>>>>> ABI compatibility warnings, is this expected:
>>>>>> This is to be expected. Because we insert a device state,
>>>>>> RTE_ETH_DEV_ALLOCATED,
>>>>>> before RTE_ETH_DEV_ATTACHED for resolving the issue patch 2/5 mentioned.
>>>>>> We may have to announce it. What do you think?
>>>>> If there is an actual ABI break, it can't go in this release, need to
>>>>> wait LTS release and yes needs deprecation notice in advance.
>>>>>
>>>>> But not all enum value change warnings are real break, need to
>>>>> investigate all warnings one by one.
>>>>> Need to investigate if old application & new dpdk library may cause any
>>>>> unexpected behavior for application.
>>>>>
>>>> OR, appending new enum item, `RTE_ETH_DEV_ALLOCATED`, to the end of the
>>>> enum solves the issue, although logically it won't look nice.
>>>> Perhaps order can be fixed in next LTS, to have more logical order, but
>>>> not quite sure if order worth the disturbance may cause in application.
>>> It is a state with a logical order, so it would be nice to be able to do
>>> if (state > RTE_ETH_DEV_ALLOCATED)
>>> but given there is RTE_ETH_DEV_REMOVED later in the enum, not sure it is useful.
>> The device state is internal. Applications should not access it
>> directly, right?
> Right
>
>> Currently, ethdev layer or PMD use it by enum value instead of the way like
>> 'state > RTE_ETH_DEV_ALLOCATED'.
> Right
>
>> But, I encapsulated an API, rte_eth_dev_is_used(), for ethdev or PMD to
>> call.
>> I'm not sure if it can help to eliminate our concerns.
> Yes I think it's OK.
ok, I will fix it based on our discussion.
>
>
> .
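
A minimal sketch of the enum concern discussed in this thread. The enum
and enumerator names below are hypothetical stand-ins for the device
state values; they only illustrate how inserting a value in the middle
renumbers later enumerators, while appending leaves them unchanged.

```c
#include <assert.h>

/* Hypothetical state enum as an older binary was compiled against. */
enum dev_state_old {
	OLD_DEV_UNUSED = 0,
	OLD_DEV_ATTACHED,
	OLD_DEV_REMOVED,
};

/* New state inserted in logical order: later enumerators are renumbered. */
enum dev_state_inserted {
	INS_DEV_UNUSED = 0,
	INS_DEV_ALLOCATED,
	INS_DEV_ATTACHED,
	INS_DEV_REMOVED,
};

/* New state appended at the end: existing numeric values are unchanged. */
enum dev_state_appended {
	APP_DEV_UNUSED = 0,
	APP_DEV_ATTACHED,
	APP_DEV_REMOVED,
	APP_DEV_ALLOCATED,
};

/* Returns 1 if a value written by new code still means ATTACHED to old code. */
static int
old_attached_survives(int new_attached_value)
{
	return new_attached_value == OLD_DEV_ATTACHED;
}
```

With the inserted variant, an old binary comparing against its own
ATTACHED value would actually match the new ALLOCATED state, which is the
kind of unexpected behavior the ABI warning points at.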

^ permalink raw reply	[relevance 0%]

* Re: deprecation notice process / clarification
  2023-01-25 22:36  3% deprecation notice process / clarification Tyler Retzlaff
@ 2023-01-27 12:47  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-01-27 12:47 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev

25/01/2023 23:36, Tyler Retzlaff:
> hi,
> 
> i'm looking for some guidance when cleaning up / removing the remaining
> shim functions for pthread in windows and i'm not sure how our
> deprecation notice / policies apply.
> 
> windows has been providing lib/eal/windows/include/pthread.h shim that
> allowed applications to use e.g. pthread_xxx functions on windows.
> 
> these shim functions were all being provided as inline functions and
> were not part of the EAL api but on windows they were implicitly part of
> the api surface exposed (to windows only). on posix platforms applications
> did not rely on EAL for pthread abi or api (obviously).
> 
> recently we introduced a set of platform agnostic thread api in the EAL.
> the functions were marked __rte_experimental as a part of new API
> addition policy.
> 
> what's the most appropriate way to remove the pthread_xxx shim inline
> functions from lib/eal/windows/include/pthread.h? do we have to wait the
> full deprecation notice period which can't be started until we make the
> new functions stable? also keeping in mind we can't actually mark inline
> functions __rte_deprecated.
> 
> is this a special case where we can just rip them out and break
> compilation?

Probably yes.

> input is appreciated, particularly from any consumers of the windows
> port who might be unhappy.

I think there are not too many users of DPDK on Windows yet.
I would be in favor of just removing the pthread shim layer
for Windows when the rte_thread equivalent is declared stable.



^ permalink raw reply	[relevance 0%]

* deprecation notice process / clarification
@ 2023-01-25 22:36  3% Tyler Retzlaff
  2023-01-27 12:47  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-01-25 22:36 UTC (permalink / raw)
  To: dev

hi,

i'm looking for some guidance when cleaning up / removing the remaining
shim functions for pthread in windows and i'm not sure how our
deprecation notice / policies apply.

windows has been providing lib/eal/windows/include/pthread.h shim that
allowed applications to use e.g. pthread_xxx functions on windows.

these shim functions were all being provided as inline functions and
were not part of the EAL api but on windows they were implicitly part of
the api surface exposed (to windows only). on posix platforms applications
did not rely on EAL for pthread abi or api (obviously).

recently we introduced a set of platform agnostic thread api in the EAL.
the functions were marked __rte_experimental as a part of new API
addition policy.

what's the most appropriate way to remove the pthread_xxx shim inline
functions from lib/eal/windows/include/pthread.h? do we have to wait the
full deprecation notice period which can't be started until we make the
new functions stable? also keeping in mind we can't actually mark inline
functions __rte_deprecated.

is this a special case where we can just rip them out and break
compilation?

input is appreciated, particularly from any consumers of the windows
port who might be unhappy.

thanks

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-25 10:38  3%           ` Jerin Jacob
@ 2023-01-25 16:32  0%             ` Naga Harish K, S V
  2023-01-28 10:53  0%               ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-01-25 16:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, January 25, 2023 4:08 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Wed, Jan 25, 2023 at 3:22 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, January 25, 2023 9:42 AM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Tue, Jan 24, 2023 at 6:37 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Tuesday, January 24, 2023 10:00 AM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Mon, Jan 23, 2023 at 11:35 PM Naga Harish K S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > > The adapter configuration parameters defined in the ``struct
> > > > > > rte_event_eth_rx_adapter_runtime_params`` can be configured
> > > > > > and retrieved using
> > > > > > ``rte_event_eth_rx_adapter_runtime_params_set``
> > > > > > and ``rte_event_eth_rx_adapter_runtime_params_get``
> respectively.
> > > > > >
> > > > > > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > > > >
> > > > > > diff --git
> > > > > > a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > > b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > > index 461eca566f..2207d6ffc3 100644
> > > > > > --- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > > +++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > > @@ -185,6 +185,26 @@ flags for handling received packets,
> > > > > > event queue identifier, scheduler type,  event priority,
> > > > > > polling frequency of the receive queue and flow identifier  in
> > > > > > struct
> > > > > ``rte_event_eth_rx_adapter_queue_conf``.
> > > > > >
> > > > > > +Set/Get adapter runtime configuration parameters
> > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > +
> > > > > > +The runtime configuration parameters of adapter can be
> > > > > > +set/read using
> > > > > > +``rte_event_eth_rx_adapter_runtime_params_set()`` and
> ``rte_event_eth_rx_adapter_runtime_params_get()`` respectively.
> > > > > > +The parameters that can be set/read are defined in ``struct
> > > > > rte_event_eth_rx_adapter_runtime_params``.
> > > > >
> > > > > Good.
> > > > >
> > > > > > +
> > > > > > +``rte_event_eth_rx_adapter_create()`` or
> > > > > > +``rte_event_eth_rx_adapter_create_with_params()`` configures
> > > > > > +the adapter with default value for maximum packets processed
> > > > > > +per request to
> > > > > 128.
> > > > > > +``rte_event_eth_rx_adapter_runtime_params_set()`` function
> > > > > > +allows to reconfigure maximum number of packets processed by
> > > > > > +adapter per service request. This is alternative to
> > > > > > +configuring the maximum packets processed per request by
> > > > > > +adapter by using ``rte_event_eth_rx_adapter_create_ext()``
> > > > > > +with parameter
> > > > > ``rte_event_eth_rx_adapter_conf::max_nb_rx``.
> > > > >
> > > > > This paragraph is not needed IMO. As it is specific to a driver,
> > > > > and we can keep Doxygen comment only.
> > > > >
> > > > >
> > > > > > +
> > > > > > +``rte_event_eth_rx_adapter_runtime_params_get()`` function
> > > > > > +retrieves the configuration parameters that are defined in
> > > > > > +``struct
> > > > > rte_event_eth_rx_adapter_runtime_params``.
> > > > >
> > > > > Good.
> > > > >
> > > > > > +
> > > > > >  Getting and resetting Adapter queue stats
> > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > >
> > > > > > diff --git a/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > > b/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > > index 34aa87379e..d8f3e750b7 100644
> > > > > > --- a/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > > +++ b/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > > @@ -35,6 +35,8 @@
> > > > > >  #define MAX_VECTOR_NS          1E9
> > > > > >  #define MIN_VECTOR_NS          1E5
> > > > > >
> > > > > > +#define RXA_NB_RX_WORK_DEFAULT 128
> > > > > > +
> > > > > >  #define ETH_RX_ADAPTER_SERVICE_NAME_LEN        32
> > > > > >  #define ETH_RX_ADAPTER_MEM_NAME_LEN    32
> > > > > >
> > > > > > @@ -1554,7 +1556,7 @@ rxa_default_conf_cb(uint8_t id, uint8_t
> > > dev_id,
> > > > > >         }
> > > > > >
> > > > > >         conf->event_port_id = port_id;
> > > > > > -       conf->max_nb_rx = 128;
> > > > > > +       conf->max_nb_rx = RXA_NB_RX_WORK_DEFAULT;
> > > > > >         if (started)
> > > > > >                 ret = rte_event_dev_start(dev_id);
> > > > > >         rx_adapter->default_cb_arg = 1; @@ -3436,6 +3438,90 @@
> > > > > > rte_event_eth_rx_adapter_instance_get(uint16_t eth_dev_id,
> > > > > >         return -EINVAL;
> > > > > >  }
> > > > >
> > > > > > +
> > > > > > +int
> > > > > > +rte_event_eth_rx_adapter_runtime_params_set(uint8_t id,
> > > > > > +               struct rte_event_eth_rx_adapter_runtime_params
> > > > > > +*params) {
> > > > > > +       struct event_eth_rx_adapter *rxa;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       if (params == NULL)
> > > > > > +               return -EINVAL;
> > > > > > +
> > > > > > +       if (rxa_memzone_lookup())
> > > > > > +               return -ENOMEM;
> > > > >
> > > > > Introduce an adapter callback and move SW adapter related logic
> > > > > under callback handler.
> > > > >
> > > > >
> > > > Do you mean introducing eventdev PMD callback for HW
> implementation?
> > >
> > > Yes.
> > >
> >
> > Can this be taken care as and when the HW implementation is required?
> 
> OK, as long as it returns -ENOTSUP in the INTERNAL PORT case for now.
> 
> >
> > > > There are no adapter callback currently for service based SW
> > > Implementation.
> > > >
> > > > > > +
> > > > > > +       rxa = rxa_id_to_adapter(id);
> > > > > > +       if (rxa == NULL)
> > > > > > +               return -EINVAL;
> > > > > > +
> > > > > > +       ret = rxa_caps_check(rxa);
> > > > > > +       if (ret)
> > > > > > +               return ret;
> > > > > > +
> > > > > > +       rte_spinlock_lock(&rxa->rx_lock);
> > > > > > +       rxa->max_nb_rx = params->max_nb_rx;
> > > > > > +       rte_spinlock_unlock(&rxa->rx_lock);
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > > +int
> > > > > > +rte_event_eth_rx_adapter_runtime_params_get(uint8_t id,
> > > > > > +               struct rte_event_eth_rx_adapter_runtime_params
> > > > > > +*params) {
> > > > > > +       struct event_eth_rx_adapter *rxa;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       if (params == NULL)
> > > > > > +               return -EINVAL;
> > > > >
> > > > >
> > > > > Introduce an adapter callback and move SW adapter related logic
> > > > > under callback handler.
> > > > >
> > > > >
> > > >
> > > > Same as above
> > > >
> > > > > > +
> > > > > > +       if (rxa_memzone_lookup())
> > > > > > +               return -ENOMEM;
> > > > >  +
> > > > > > +       rxa = rxa_id_to_adapter(id);
> > > > > > +       if (rxa == NULL)
> > > > > > +               return -EINVAL;
> > > > > > +
> > > > > > +       ret = rxa_caps_check(rxa);
> > > > > > +       if (ret)
> > > > > > +               return ret;
> > > > > > +
> > > > > > +       params->max_nb_rx = rxa->max_nb_rx;
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* RX-adapter telemetry callbacks */
> > > > > >  #define RXA_ADD_DICT(stats, s) rte_tel_data_add_dict_u64(d,
> > > > > > #s,
> > > > > > stats.s)
> > > > > >
> > > > > >  static int
> > > > > > diff --git a/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > > b/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > > index f4652f40e8..214ffd018c 100644
> > > > > > --- a/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > > +++ b/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > > @@ -39,10 +39,21 @@
> > > > > >   *  - rte_event_eth_rx_adapter_queue_stats_reset()
> > > > > >   *  - rte_event_eth_rx_adapter_event_port_get()
> > > > > >   *  - rte_event_eth_rx_adapter_instance_get()
> > > > > > + *  - rte_event_eth_rx_adapter_runtime_params_get()
> > > > > > + *  - rte_event_eth_rx_adapter_runtime_params_set()
> > > > > >   *
> > > > > >   * The application creates an ethernet to event adapter using
> > > > > >   * rte_event_eth_rx_adapter_create_ext() or
> > > > > rte_event_eth_rx_adapter_create()
> > > > > >   * or rte_event_eth_rx_adapter_create_with_params() functions.
> > > > > > + *
> > > > > > + * rte_event_eth_rx_adapter_create() or
> > > > > > + rte_event_eth_rx_adapter_create_with_params()
> > > > > > + * configures the adapter with default value of maximum
> > > > > > + packets processed per
> > > > > > + * iteration to RXA_NB_RX_WORK_DEFAULT(128).
> > > > > > + * rte_event_eth_rx_adapter_runtime_params_set() allows to
> > > > > > + re-configure maximum
> > > > > > + * packets processed per iteration. This is alternative to
> > > > > > + using
> > > > > > + * rte_event_eth_rx_adapter_create_ext() with parameter
> > > > > > + * rte_event_eth_rx_adapter_conf::max_nb_rx
> > > > >
> > > > > Move this to Doxygen comment against max_nb_rx
> > > > >
> > > > > > + *
> > > > > >   * The adapter needs to know which ethernet rx queues to poll
> > > > > > for mbufs as
> > > > > well
> > > > > >   * as event device parameters such as the event queue
> > > > > > identifier,
> > > event
> > > > > >   * priority and scheduling type that the adapter should use
> > > > > > when constructing @@ -299,6 +310,19 @@ struct
> > > > > rte_event_eth_rx_adapter_params {
> > > > > >         /**< flag to indicate that event buffer is separate
> > > > > > for each queue */  };
> > > > > >
> > > > > > +/**
> > > > > > + * Adapter configuration parameters  */ struct
> > > > > > +rte_event_eth_rx_adapter_runtime_params {
> > > > > > +       uint32_t max_nb_rx;
> > > > > > +       /**< The adapter can return early if it has processed at least
> > > > > > +        * max_nb_rx mbufs. This isn't treated as a
> > > > > > +requirement; batching
> > > may
> > > > > > +        * cause the adapter to process more than max_nb_rx mbufs.
> > > > >
> > > > > Also tell it is valid only for INTERNAL PORT capablity is set.
> > > > >
> > > >
> > > > Do you mean, it is valid only for INTERNAL PORT capability is 'not' set?
> > >
> > > Yes.
> > >
> > > >
> > > > > > +        */
> > > > > > +       uint32_t rsvd[15];
> > > > > > +       /**< Reserved fields for future use */
> > > > >
> > > > > Introduce rte_event_eth_rx_adapter_runtime_params_init() to
> make
> > > > > sure rsvd is zero.
> > > > >
> > > >
> > > > The reserved fields are not used by the adapter or application.
> > > > Not sure Is it necessary to Introduce a new API to clear reserved fields.
> > >
> > > > When the adapter starts using new fields (when we add new fields in
> > > > future), an old application which is not using
> > > > rte_event_eth_rx_adapter_runtime_params_init() may have junk values
> > > > and then the adapter implementation will behave badly.
> > >
> > >
> >
> > does it mean, the application doesn't re-compile for the new DPDK?
> 
> Yes. No need to recompile if the ABI is not broken.
> 
> > When some of the reserved fields are used in the future, the application
> also may need to be recompiled along with DPDK right?
> > As the application also may need to use the newly consumed reserved
> fields?
> 
> The problematic case is:
> 
> The adapter implementation of 23.07 (assuming there is a change in the
> params fields) needs to work with an application built against 23.03.
> rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> 

As rte_event_eth_rx_adapter_runtime_params_init() initializes only the reserved fields to zero, it may not solve the issue in this case.
The old application only tries to set/get the previously valid fields, and the newly used fields may still contain junk values.
If the application wants to make use of any of the newly used params, application changes are required anyway.

> >
> > > >
> > > > > > +};
> > > > > > +
> > > > > >  /**
> > > > > >   *
> > > > > >   * Callback function invoked by the SW adapter before it
> > > > > > continues @@
> > > > > > -377,7 +401,7 @@ int
> > > > > > rte_event_eth_rx_adapter_create_ext(uint8_t
> > > > > > id,
> > > > > uint8_t dev_id,
> > > > > >   * Create a new ethernet Rx event adapter with the specified
> > > identifier.
> > > > > >   * This function uses an internal configuration function that
> > > > > > creates an
> > > event
> > > > > >   * port. This default function reconfigures the event device
> > > > > > with an
> > > > > > - * additional event port and setups up the event port using
> > > > > > the port_config
> > > > > > + * additional event port and setup the event port using the
> > > > > > + port_config
> > > > > >   * parameter passed into this function. In case the
> > > > > > application needs
> > > more
> > > > > >   * control in configuration of the service, it should use the
> > > > > >   * rte_event_eth_rx_adapter_create_ext() version.
> > > > > > @@ -743,6 +767,50 @@
> > > > > > rte_event_eth_rx_adapter_instance_get(uint16_t
> > > > > eth_dev_id,
> > > > > >                                       uint16_t rx_queue_id,
> > > > > >                                       uint8_t *rxa_inst_id);
> > > > > >

^ permalink raw reply	[relevance 0%]

* Re: [EXT] Re: [PATCH v6 1/6] eal: trace: add trace point emit for blob
  @ 2023-01-25 16:09  2%         ` Ferruh Yigit
  2023-01-30 13:35  0%           ` Ankur Dwivedi
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-01-25 16:09 UTC (permalink / raw)
  To: Ankur Dwivedi, dev
  Cc: thomas, david.marchand, mdr, orika, chas3, humin29, linville,
	ciara.loftus, qi.z.zhang, mw, mk, shaibran, evgenys, igorch,
	chandu, Igor Russkikh, shepard.siegel, ed.czeck, john.miller,
	ajit.khaparde, somnath.kotur, Jerin Jacob Kollanukkaran,
	Maciej Czekaj [C],
	Shijith Thotton, Srisivasubramanian Srinivasan, Harman Kalra,
	rahul.lakkireddy, johndale, hyonkim, liudongdong3, yisen.zhuang,
	xuanziyang2, cloud.wangxiaoyun, zhouguoyang, simei.su,
	wenjun1.wu, qiming.yang, Yuying.Zhang, beilei.xing, xiao.w.wang,
	jingjing.wu, junfeng.guo, rosen.xu, Nithin Kumar Dabilpuram,
	Kiran Kumar Kokkilagadda, Sunil Kumar Kori,
	Satha Koteswara Rao Kottidi, Liron Himi, zr, Radha Chintakuntla,
	Veerasenareddy Burru, Sathesh B Edara, matan, viacheslavo,
	longli, spinler, chaoyong.he, niklas.soderlund, hemant.agrawal,
	sachin.saxena, g.singh, apeksha.gupta, sachin.saxena, aboyer,
	Rasesh Mody, Shahed Shaikh, Devendra Singh Rawat,
	andrew.rybchenko, jiawenwu, jianwang, jbehrens, maxime.coquelin,
	chenbo.xia, steven.webster, matt.peters, bruce.richardson,
	mtetsuyah, grive, jasvinder.singh, cristian.dumitrescu, jgrajcia,
	mb

On 1/25/2023 3:02 PM, Ankur Dwivedi wrote:
> 
>>
>> ----------------------------------------------------------------------
>> On 1/20/2023 8:40 AM, Ankur Dwivedi wrote:
>>> Adds a trace point emit function for capturing a blob. The blob
>>> captures the length passed by the application followed by the array.
>>>
>>> The maximum blob bytes which can be captured is bounded by
>>> RTE_TRACE_BLOB_LEN_MAX macro. The value for max blob length macro is
>>> 64 bytes. If the length is less than 64 the remaining trailing bytes
>>> are set to zero.
>>>
>>> This patch also adds test case for emit blob tracepoint function.
>>>
>>> Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
>>> ---
>>>  app/test/test_trace.c                      | 11 ++++++++
>>>  doc/guides/prog_guide/trace_lib.rst        | 12 ++++++++
>>>  lib/eal/common/eal_common_trace_points.c   |  2 ++
>>>  lib/eal/include/rte_eal_trace.h            |  6 ++++
>>>  lib/eal/include/rte_trace_point.h          | 32 ++++++++++++++++++++++
>>>  lib/eal/include/rte_trace_point_register.h |  9 ++++++
>>>  lib/eal/version.map                        |  3 ++
>>>  7 files changed, 75 insertions(+)
>>>
>>> diff --git a/app/test/test_trace.c b/app/test/test_trace.c index
>>> 6bedf14024..ad4a394a29 100644
>>> --- a/app/test/test_trace.c
>>> +++ b/app/test/test_trace.c
>>> @@ -4,6 +4,7 @@
>>>
>>>  #include <rte_eal_trace.h>
>>>  #include <rte_lcore.h>
>>> +#include <rte_random.h>
>>>  #include <rte_trace.h>
>>>
>>>  #include "test.h"
>>> @@ -177,7 +178,12 @@ test_fp_trace_points(void)  static int
>>>  test_generic_trace_points(void)
>>>  {
>>> +	uint8_t arr[RTE_TRACE_BLOB_LEN_MAX];
>>>  	int tmp;
>>> +	int i;
>>> +
>>> +	for (i = 0; i < RTE_TRACE_BLOB_LEN_MAX; i++)
>>> +		arr[i] = i;
>>>
>>>  	rte_eal_trace_generic_void();
>>>  	rte_eal_trace_generic_u64(0x10000000000000);
>>> @@ -195,6 +201,11 @@ test_generic_trace_points(void)
>>>  	rte_eal_trace_generic_ptr(&tmp);
>>>  	rte_eal_trace_generic_str("my string");
>>>  	rte_eal_trace_generic_size_t(sizeof(void *));
>>> +	rte_eal_trace_generic_blob(arr, 0);
>>> +	rte_eal_trace_generic_blob(arr, 17);
>>> +	rte_eal_trace_generic_blob(arr, RTE_TRACE_BLOB_LEN_MAX);
>>> +	rte_eal_trace_generic_blob(arr, rte_rand() %
>>> +					RTE_TRACE_BLOB_LEN_MAX);
>>>  	RTE_EAL_TRACE_GENERIC_FUNC;
>>>
>>>  	return TEST_SUCCESS;
>>> diff --git a/doc/guides/prog_guide/trace_lib.rst
>>> b/doc/guides/prog_guide/trace_lib.rst
>>> index 9a8f38073d..3e0ea5835c 100644
>>> --- a/doc/guides/prog_guide/trace_lib.rst
>>> +++ b/doc/guides/prog_guide/trace_lib.rst
>>> @@ -352,3 +352,15 @@ event ID.
>>>  The ``packet.header`` and ``packet.context`` will be written in the
>>> slow path  at the time of trace memory creation. The ``trace.header``
>>> and trace payload  will be emitted when the tracepoint function is invoked.
>>> +
>>> +Limitations
>>> +-----------
>>> +
>>> +- The ``rte_trace_point_emit_blob()`` function can capture a maximum
>>> +blob of
>>> +  length ``RTE_TRACE_BLOB_LEN_MAX`` bytes. The application can call
>>> +  ``rte_trace_point_emit_blob()`` multiple times with length less
>>> +than or equal to
>>> +  ``RTE_TRACE_BLOB_LEN_MAX``, if it needs to capture more than
>>> +``RTE_TRACE_BLOB_LEN_MAX``
>>> +  bytes.
>>> +- If the length passed to the ``rte_trace_point_emit_blob()`` is less
>>> +than
>>> +  ``RTE_TRACE_BLOB_LEN_MAX``, then the trailing
>>> +``(RTE_TRACE_BLOB_LEN_MAX - len)``
>>> +  bytes in the trace are set to zero.
>>> diff --git a/lib/eal/common/eal_common_trace_points.c
>>> b/lib/eal/common/eal_common_trace_points.c
>>> index 0b0b254615..051f89809c 100644
>>> --- a/lib/eal/common/eal_common_trace_points.c
>>> +++ b/lib/eal/common/eal_common_trace_points.c
>>> @@ -40,6 +40,8 @@
>> RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_size_t,
>>>  	lib.eal.generic.size_t)
>>>  RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_func,
>>>  	lib.eal.generic.func)
>>> +RTE_TRACE_POINT_REGISTER(rte_eal_trace_generic_blob,
>>> +	lib.eal.generic.blob)
>>>
>>>  RTE_TRACE_POINT_REGISTER(rte_eal_trace_alarm_set,
>>>  	lib.eal.alarm.set)
>>> diff --git a/lib/eal/include/rte_eal_trace.h
>>> b/lib/eal/include/rte_eal_trace.h index 5ef4398230..e0b836eb2f 100644
>>> --- a/lib/eal/include/rte_eal_trace.h
>>> +++ b/lib/eal/include/rte_eal_trace.h
>>> @@ -143,6 +143,12 @@ RTE_TRACE_POINT(
>>>  	rte_trace_point_emit_string(func);
>>>  )
>>>
>>> +RTE_TRACE_POINT(
>>> +	rte_eal_trace_generic_blob,
>>> +	RTE_TRACE_POINT_ARGS(void *in, uint8_t len),
>>> +	rte_trace_point_emit_blob(in, len);
>>> +)
>>> +
>>>  #define RTE_EAL_TRACE_GENERIC_FUNC
>>> rte_eal_trace_generic_func(__func__)
>>>
>>>  /* Interrupt */
>>> diff --git a/lib/eal/include/rte_trace_point.h
>>> b/lib/eal/include/rte_trace_point.h
>>> index 0f8700974f..aca8344dbf 100644
>>> --- a/lib/eal/include/rte_trace_point.h
>>> +++ b/lib/eal/include/rte_trace_point.h
>>> @@ -144,6 +144,16 @@ _tp _args \
>>>  #define rte_trace_point_emit_ptr(val)
>>>  /** Tracepoint function payload for string datatype */  #define
>>> rte_trace_point_emit_string(val)
>>> +/**
>>> + * Tracepoint function to capture a blob.
>>> + *
>>> + * @param val
>>> + *   Pointer to the array to be captured.
>>> + * @param len
>>> + *   Length to be captured. The maximum supported length is
>>> + *   RTE_TRACE_BLOB_LEN_MAX bytes.
>>> + */
>>> +#define rte_trace_point_emit_blob(val, len)
>>>
>>
>> This is just for doxygen right, why doxygen comments are not above the actual
>> macros but there is a separate #if block for it?
> 
> The actual macro is within a #ifndef __DOXYGEN__ block. I think that is the reason for including
> Doxygen comments here.

Thanks for confirming.

Why are the comments not part of the actual macro, and why is there a
separate '#ifdef __DOXYGEN__' block for them?

>>
>>>  #endif /* __DOXYGEN__ */
>>>
>>> @@ -152,6 +162,9 @@ _tp _args \
>>>  /** @internal Macro to define event header size. */  #define
>>> __RTE_TRACE_EVENT_HEADER_SZ sizeof(uint64_t)
>>>
>>> +/** Macro to define maximum emit length of blob. */ #define
>>> +RTE_TRACE_BLOB_LEN_MAX 64
>>> +
>>>  /**
>>>   * Enable recording events of the given tracepoint in the trace buffer.
>>>   *
>>> @@ -374,12 +387,31 @@ do { \
>>>  	mem = RTE_PTR_ADD(mem, __RTE_TRACE_EMIT_STRING_LEN_MAX);
>> \  } while
>>> (0)
>>>
>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>> +	if (unlikely(in == NULL)) \
>>> +		return; \
>>> +	if (len > RTE_TRACE_BLOB_LEN_MAX) \
>>> +		len = RTE_TRACE_BLOB_LEN_MAX; \
>>> +	__rte_trace_point_emit(len, uint8_t); \
>>> +	memcpy(mem, in, len); \
>>> +	mem = RTE_PTR_ADD(mem, len); \
>>> +	memset(mem, 0, RTE_TRACE_BLOB_LEN_MAX - len); \
>>> +	mem = RTE_PTR_ADD(mem, RTE_TRACE_BLOB_LEN_MAX - len); \
>>
>>
>> Is first memset later memcpy not done because of performance concerns?
> 
> The memset sets to 0 the unused bytes (RTE_TRACE_BLOB_LEN_MAX - len). So memset is done after memcpy.

yep, I can see what is done.

Question is, you can do it more simply:
memset(mem, 0, RTE_TRACE_BLOB_LEN_MAX);
memcpy(mem, in, len);
mem = RTE_PTR_ADD(mem, RTE_TRACE_BLOB_LEN_MAX);

Did you intentionally prefer the implementation you did? If so, what
is the intention, performance concerns?

btw, I want to remind you that 'len' can be at most 64 bytes.
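
To make the comparison concrete, here is a minimal standalone sketch of the two orderings, outside of the DPDK macros. BLOB_LEN_MAX stands in for RTE_TRACE_BLOB_LEN_MAX, and the function names are hypothetical helpers for illustration only, not part of the patch. In both variants the write cursor has to end up exactly one full slot past where it started.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOB_LEN_MAX 64 /* stands in for RTE_TRACE_BLOB_LEN_MAX */

/* Ordering used in the patch: copy, then zero only the unused tail. */
static uint8_t *
emit_copy_then_pad(uint8_t *mem, const void *in, uint8_t len)
{
	if (len > BLOB_LEN_MAX)
		len = BLOB_LEN_MAX;
	memcpy(mem, in, len);
	mem += len;
	memset(mem, 0, BLOB_LEN_MAX - len);
	return mem + (BLOB_LEN_MAX - len);
}

/* Simpler ordering: zero the whole slot, then copy over the front. */
static uint8_t *
emit_zero_then_copy(uint8_t *mem, const void *in, uint8_t len)
{
	if (len > BLOB_LEN_MAX)
		len = BLOB_LEN_MAX;
	memset(mem, 0, BLOB_LEN_MAX);
	memcpy(mem, in, len);
	/* The cursor was not moved by memcpy, so skip the full slot. */
	return mem + BLOB_LEN_MAX;
}

/* Returns 1 when both orderings write the same bytes and cursor offset. */
static int
variants_match(uint8_t len)
{
	uint8_t src[BLOB_LEN_MAX], a[BLOB_LEN_MAX], b[BLOB_LEN_MAX];
	int i;

	for (i = 0; i < BLOB_LEN_MAX; i++)
		src[i] = (uint8_t)(i + 1);
	/* Pre-fill with different junk so stale bytes would show up. */
	memset(a, 0xAA, sizeof(a));
	memset(b, 0x55, sizeof(b));

	return emit_copy_then_pad(a, src, len) - a == BLOB_LEN_MAX &&
	       emit_zero_then_copy(b, src, len) - b == BLOB_LEN_MAX &&
	       memcmp(a, b, BLOB_LEN_MAX) == 0;
}
```

The second ordering writes up to 64 bytes twice, which is presumably the only cost to weigh against the shorter code.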

>>
>>> +} while (0)
>>> +
>>>  #else
>>>
>>>  #define __rte_trace_point_emit_header_generic(t) RTE_SET_USED(t)
>>> #define __rte_trace_point_emit_header_fp(t) RTE_SET_USED(t)  #define
>>> __rte_trace_point_emit(in, type) RTE_SET_USED(in)  #define
>>> rte_trace_point_emit_string(in) RTE_SET_USED(in)
>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>> +	RTE_SET_USED(in); \
>>> +	RTE_SET_USED(len); \
>>> +} while (0)
>>> +
>>>
>>>  #endif /* ALLOW_EXPERIMENTAL_API */
>>>  #endif /* _RTE_TRACE_POINT_REGISTER_H_ */ diff --git
>>> a/lib/eal/include/rte_trace_point_register.h
>>> b/lib/eal/include/rte_trace_point_register.h
>>> index a32f4d731b..7efbac8a72 100644
>>> --- a/lib/eal/include/rte_trace_point_register.h
>>> +++ b/lib/eal/include/rte_trace_point_register.h
>>> @@ -47,6 +47,15 @@ do { \
>>>  		RTE_STR(in)"[32]", "string_bounded_t"); \  } while (0)
>>>
>>> +#define rte_trace_point_emit_blob(in, len) \ do { \
>>> +	RTE_SET_USED(in); \
>>> +	__rte_trace_point_emit(len, uint8_t); \
>>> +	__rte_trace_point_emit_field(RTE_TRACE_BLOB_LEN_MAX, \
>>> +		RTE_STR(in)"["RTE_STR(RTE_TRACE_BLOB_LEN_MAX)"]", \
>>> +		RTE_STR(uint8_t)); \
>>> +} while (0)
>>> +
>>
>> Why this macro defined here again, it is also defined in 'rte_trace_point.h'
>> already?
>> Is it because of 'register_fn()' in '__rte_trace_point_register()'?
> 
> Yes the register happens in this function.

You are not really answering questions.

There are three copies of '#define rte_trace_point_emit_blob(in, len)'. One
of them is for the doxygen comment; please explain why there are two more
copies of it.

>>
>>>  #ifdef __cplusplus
>>>  }
>>>  #endif
>>> diff --git a/lib/eal/version.map b/lib/eal/version.map index
>>> 7ad12a7dc9..67be24686a 100644
>>> --- a/lib/eal/version.map
>>> +++ b/lib/eal/version.map
>>> @@ -440,6 +440,9 @@ EXPERIMENTAL {
>>>  	rte_thread_detach;
>>>  	rte_thread_equal;
>>>  	rte_thread_join;
>>> +
>>> +	# added in 23.03
>>> +	__rte_eal_trace_generic_blob;
>>
>> This is not a function but a trace object.
>> I guess it was agreed that trace object not need to be exported, and trace can
>> be found by name?
> 
> Yes the export in version.map can be removed. Will remove it in next patch series.

ack.

Will there be a separate patch to remove the existing symbols? Although I am
not sure if it will be an ABI break.


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  @ 2023-01-25 10:38  3%           ` Jerin Jacob
  2023-01-25 16:32  0%             ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-01-25 10:38 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Wed, Jan 25, 2023 at 3:22 PM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, January 25, 2023 9:42 AM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Tue, Jan 24, 2023 at 6:37 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Tuesday, January 24, 2023 10:00 AM
> > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > <jay.jayatheerthan@intel.com>
> > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > > >
> > > > On Mon, Jan 23, 2023 at 11:35 PM Naga Harish K S V
> > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > >
> > > > > The adapter configuration parameters defined in the ``struct
> > > > > rte_event_eth_rx_adapter_runtime_params`` can be configured and
> > > > > retrieved using ``rte_event_eth_rx_adapter_runtime_params_set``
> > > > > and ``rte_event_eth_tx_adapter_runtime_params_get`` respectively.
> > > > >
> > > > > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > > >
> > > > > diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > index 461eca566f..2207d6ffc3 100644
> > > > > --- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > +++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > > > > @@ -185,6 +185,26 @@ flags for handling received packets, event
> > > > > queue identifier, scheduler type,  event priority, polling
> > > > > frequency of the receive queue and flow identifier  in struct
> > > > ``rte_event_eth_rx_adapter_queue_conf``.
> > > > >
> > > > > +Set/Get adapter runtime configuration parameters
> > > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > +
> > > > > +The runtime configuration parameters of adapter can be set/read
> > > > > +using ``rte_event_eth_rx_adapter_runtime_params_set()`` and
> > > > > +``rte_event_eth_rx_adapter_runtime_params_get()`` respectively.
> > > > > +The parameters that can be set/read are defined in ``struct
> > > > rte_event_eth_rx_adapter_runtime_params``.
> > > >
> > > > Good.
> > > >
> > > > > +
> > > > > +``rte_event_eth_rx_adapter_create()`` or
> > > > > +``rte_event_eth_rx_adapter_create_with_params()`` configures the
> > > > > +adapter with default value for maximum packets processed per
> > > > > +request to
> > > > 128.
> > > > > +``rte_event_eth_rx_adapter_runtime_params_set()`` function allows
> > > > > +to reconfigure maximum number of packets processed by adapter per
> > > > > +service request. This is alternative to configuring the maximum
> > > > > +packets processed per request by adapter by using
> > > > > +``rte_event_eth_rx_adapter_create_ext()`` with parameter
> > > > ``rte_event_eth_rx_adapter_conf::max_nb_rx``.
> > > >
> > > > This paragraph is not needed IMO. As it is specific to a driver, and
> > > > we can keep Doxygen comment only.
> > > >
> > > >
> > > > > +
> > > > > +``rte_event_eth_rx_adapter_runtime_parmas_get()`` function
> > > > > +retrieves the configuration parameters that are defined in
> > > > > +``struct
> > > > rte_event_eth_rx_adapter_runtime_params``.
> > > >
> > > > Good.
> > > >
> > > > > +
> > > > >  Getting and resetting Adapter queue stats
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > >
> > > > > diff --git a/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > b/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > index 34aa87379e..d8f3e750b7 100644
> > > > > --- a/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > +++ b/lib/eventdev/rte_event_eth_rx_adapter.c
> > > > > @@ -35,6 +35,8 @@
> > > > >  #define MAX_VECTOR_NS          1E9
> > > > >  #define MIN_VECTOR_NS          1E5
> > > > >
> > > > > +#define RXA_NB_RX_WORK_DEFAULT 128
> > > > > +
> > > > >  #define ETH_RX_ADAPTER_SERVICE_NAME_LEN        32
> > > > >  #define ETH_RX_ADAPTER_MEM_NAME_LEN    32
> > > > >
> > > > > @@ -1554,7 +1556,7 @@ rxa_default_conf_cb(uint8_t id, uint8_t
> > dev_id,
> > > > >         }
> > > > >
> > > > >         conf->event_port_id = port_id;
> > > > > -       conf->max_nb_rx = 128;
> > > > > +       conf->max_nb_rx = RXA_NB_RX_WORK_DEFAULT;
> > > > >         if (started)
> > > > >                 ret = rte_event_dev_start(dev_id);
> > > > >         rx_adapter->default_cb_arg = 1; @@ -3436,6 +3438,90 @@
> > > > > rte_event_eth_rx_adapter_instance_get(uint16_t eth_dev_id,
> > > > >         return -EINVAL;
> > > > >  }
> > > >
> > > > > +
> > > > > +int
> > > > > +rte_event_eth_rx_adapter_runtime_params_set(uint8_t id,
> > > > > +               struct rte_event_eth_rx_adapter_runtime_params
> > > > > +*params) {
> > > > > +       struct event_eth_rx_adapter *rxa;
> > > > > +       int ret;
> > > > > +
> > > > > +       if (params == NULL)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       if (rxa_memzone_lookup())
> > > > > +               return -ENOMEM;
> > > >
> > > > Introduce an adapter callback and move SW adapter related logic
> > > > under callback handler.
> > > >
> > > >
> > > Do you mean introducing eventdev PMD callback for HW implementation?
> >
> > Yes.
> >
>
> Can this be taken care as and when the HW implementation is required?

OK, as long as it returns -ENOTSUP in the INTERNAL PORT case for now.

>
> > > There are no adapter callback currently for service based SW
> > Implementation.
> > >
> > > > > +
> > > > > +       rxa = rxa_id_to_adapter(id);
> > > > > +       if (rxa == NULL)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       ret = rxa_caps_check(rxa);
> > > > > +       if (ret)
> > > > > +               return ret;
> > > > > +
> > > > > +       rte_spinlock_lock(&rxa->rx_lock);
> > > > > +       rxa->max_nb_rx = params->max_nb_rx;
> > > > > +       rte_spinlock_unlock(&rxa->rx_lock);
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_event_eth_rx_adapter_runtime_params_get(uint8_t id,
> > > > > +               struct rte_event_eth_rx_adapter_runtime_params
> > > > > +*params) {
> > > > > +       struct event_eth_rx_adapter *rxa;
> > > > > +       int ret;
> > > > > +
> > > > > +       if (params == NULL)
> > > > > +               return -EINVAL;
> > > >
> > > >
> > > > Introduce an adapter callback and move SW adapter related logic
> > > > under callback handler.
> > > >
> > > >
> > >
> > > Same as above
> > >
> > > > > +
> > > > > +       if (rxa_memzone_lookup())
> > > > > +               return -ENOMEM;
> > > >  +
> > > > > +       rxa = rxa_id_to_adapter(id);
> > > > > +       if (rxa == NULL)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       ret = rxa_caps_check(rxa);
> > > > > +       if (ret)
> > > > > +               return ret;
> > > > > +
> > > > > +       params->max_nb_rx = rxa->max_nb_rx;
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +/* RX-adapter telemetry callbacks */
> > > > >  #define RXA_ADD_DICT(stats, s) rte_tel_data_add_dict_u64(d, #s,
> > > > > stats.s)
> > > > >
> > > > >  static int
> > > > > diff --git a/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > b/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > index f4652f40e8..214ffd018c 100644
> > > > > --- a/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > +++ b/lib/eventdev/rte_event_eth_rx_adapter.h
> > > > > @@ -39,10 +39,21 @@
> > > > >   *  - rte_event_eth_rx_adapter_queue_stats_reset()
> > > > >   *  - rte_event_eth_rx_adapter_event_port_get()
> > > > >   *  - rte_event_eth_rx_adapter_instance_get()
> > > > > + *  - rte_event_eth_rx_adapter_runtime_params_get()
> > > > > + *  - rte_event_eth_rx_adapter_runtime_params_set()
> > > > >   *
> > > > >   * The application creates an ethernet to event adapter using
> > > > >   * rte_event_eth_rx_adapter_create_ext() or
> > > > rte_event_eth_rx_adapter_create()
> > > > >   * or rte_event_eth_rx_adapter_create_with_params() functions.
> > > > > + *
> > > > > + * rte_event_eth_rx_adapter_create() or
> > > > > + rte_event_eth_adapter_create_with_params()
> > > > > + * configures the adapter with default value of maximum packets
> > > > > + processed per
> > > > > + * iteration to RXA_NB_RX_WORK_DEFAULT(128).
> > > > > + * rte_event_eth_rx_adapter_runtime_params_set() allows to
> > > > > + re-configure maximum
> > > > > + * packets processed per iteration. This is alternative to using
> > > > > + * rte_event_eth_rx_adapter_create_ext() with parameter
> > > > > + * rte_event_eth_rx_adapter_conf::max_nb_rx
> > > >
> > > > Move this to Doxygen comment against max_nb_rx
> > > >
> > > > > + *
> > > > >   * The adapter needs to know which ethernet rx queues to poll for
> > > > > mbufs as
> > > > well
> > > > >   * as event device parameters such as the event queue identifier,
> > event
> > > > >   * priority and scheduling type that the adapter should use when
> > > > > constructing @@ -299,6 +310,19 @@ struct
> > > > rte_event_eth_rx_adapter_params {
> > > > >         /**< flag to indicate that event buffer is separate for
> > > > > each queue */  };
> > > > >
> > > > > +/**
> > > > > + * Adapter configuration parameters  */ struct
> > > > > +rte_event_eth_rx_adapter_runtime_params {
> > > > > +       uint32_t max_nb_rx;
> > > > > +       /**< The adapter can return early if it has processed at least
> > > > > +        * max_nb_rx mbufs. This isn't treated as a requirement; batching
> > may
> > > > > +        * cause the adapter to process more than max_nb_rx mbufs.
> > > >
> > > > Also tell it is valid only for INTERNAL PORT capablity is set.
> > > >
> > >
> > > Do you mean, it is valid only for INTERNAL PORT capability is 'not' set?
> >
> > Yes.
> >
> > >
> > > > > +        */
> > > > > +       uint32_t rsvd[15];
> > > > > +       /**< Reserved fields for future use */
> > > >
> > > > Introduce rte_event_eth_rx_adapter_runtime_params_init() to make
> > > > sure rsvd is zero.
> > > >
> > >
> > > The reserved fields are not used by the adapter or application. Not
> > > sure Is it necessary to Introduce a new API to clear reserved fields.
> >
> > When the adapter starts using new fields (when we add new fields in future),
> > an old application which is not using
> > rte_event_eth_rx_adapter_runtime_params_init() may have junk values and
> > then the adapter implementation will behave badly.
> >
> >
>
> does it mean, the application doesn't re-compile for the new DPDK?

Yes. No need to recompile if the ABI is not broken.

> When some of the reserved fields are used in the future, the application also may need to be recompiled along with DPDK right?
> As the application also may need to use the newly consumed reserved fields?

The problematic case is:

The adapter implementation of 23.07 (assuming there is a change in the
params fields) needs to work with an application built against 23.03.
rte_event_eth_rx_adapter_runtime_params_init() will solve that.
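
A minimal sketch of that case, assuming the struct layout from the patch. All "demo_" names are hypothetical stand-ins, the future meaning of rsvd[0] and its default of 32 are made up for illustration, and none of this is the actual adapter code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the proposed runtime params struct (hypothetical name). */
struct demo_runtime_params {
	uint32_t max_nb_rx;
	uint32_t rsvd[15]; /* reserved for future use */
};

/* Hypothetical init helper: zero everything, reserved words included. */
static int
demo_runtime_params_init(struct demo_runtime_params *p)
{
	if (p == NULL)
		return -1;
	memset(p, 0, sizeof(*p));
	return 0;
}

/*
 * Hypothetical newer library that gives a meaning to rsvd[0] and
 * treats 0 as "not set, use the default" (32 is a made-up default).
 */
static uint32_t
demo_new_lib_read_field(const struct demo_runtime_params *p)
{
	return p->rsvd[0] != 0 ? p->rsvd[0] : 32;
}

/* Old application that calls init before setting the field it knows. */
static uint32_t
demo_old_app_with_init(void)
{
	struct demo_runtime_params p;

	memset(&p, 0xA5, sizeof(p)); /* simulate junk on the stack */
	demo_runtime_params_init(&p);
	p.max_nb_rx = 256;
	return demo_new_lib_read_field(&p);
}

/* Old application that only sets the field it knows, without init. */
static uint32_t
demo_old_app_without_init(void)
{
	struct demo_runtime_params p;

	memset(&p, 0xA5, sizeof(p)); /* simulate junk on the stack */
	p.max_nb_rx = 256;
	return demo_new_lib_read_field(&p);
}
```

With init, the newer library sees 0 in the reserved word and falls back to its default. Without it, the junk is taken as a real setting even though the application was never recompiled.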

>
> > >
> > > > > +};
> > > > > +
> > > > >  /**
> > > > >   *
> > > > >   * Callback function invoked by the SW adapter before it
> > > > > continues @@
> > > > > -377,7 +401,7 @@ int rte_event_eth_rx_adapter_create_ext(uint8_t
> > > > > id,
> > > > uint8_t dev_id,
> > > > >   * Create a new ethernet Rx event adapter with the specified
> > identifier.
> > > > >   * This function uses an internal configuration function that creates an
> > event
> > > > >   * port. This default function reconfigures the event device with
> > > > > an
> > > > > - * additional event port and setups up the event port using the
> > > > > port_config
> > > > > + * additional event port and setup the event port using the
> > > > > + port_config
> > > > >   * parameter passed into this function. In case the application needs
> > more
> > > > >   * control in configuration of the service, it should use the
> > > > >   * rte_event_eth_rx_adapter_create_ext() version.
> > > > > @@ -743,6 +767,50 @@
> > > > > rte_event_eth_rx_adapter_instance_get(uint16_t
> > > > eth_dev_id,
> > > > >                                       uint16_t rx_queue_id,
> > > > >                                       uint8_t *rxa_inst_id);
> > > > >

^ permalink raw reply	[relevance 3%]

* RE: [RFC 2/5] ethdev: introduce the affinity field in Tx queue API
  @ 2023-01-24 13:32  0%       ` Jiawei(Jonny) Wang
  0 siblings, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-01-24 13:32 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Slava Ovsiienko, Ori Kam, Aman Singh, Yuying Zhang, Ferruh Yigit,
	Andrew Rybchenko, dev, Raslan Darawsheh, jerinj

Hi,

> 18/01/2023 15:44, Jiawei(Jonny) Wang:
> > > 21/12/2022 11:29, Jiawei Wang:
> > > > For the multiple hardware ports connect to a single DPDK port
> > > > (mhpsdp), the previous patch introduces the new rte flow item to
> > > > match the port affinity of the received packets.
> > > >
> > > > This patch adds the tx_affinity setting in Tx queue API, the
> > > > affinity value reflects packets be sent to which hardware port.
> > >
> > > I think "affinity" means we would like packet to be sent on a
> > > specific hardware port, but it is not mandatory.
> > > Is it the meaning you want? Or should it be a mandatory port?
> >
> > Right, it's optional setting not mandatory.
> 
> I think there is a misunderstanding.
> I mean that "affinity" with port 0 may suggest that we try to send to port 0 but
> sometimes the packet will be sent to port 1.
>
> And I think you want the packet to be always sent to port 0 if affinity is 0, right?
>

These packets should always be sent to port 0 if 'affinity' is set to hardware port 0.
'affinity is 0' -> a value of 0 means that no affinity is set and traffic keeps the same behavior
as before, for example, routing between different hardware ports.
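
A small sketch of the semantics described here, using the queue setup from the testpmd example in the commit message (TxQ 0 and 1 on hardware port 1, TxQ 2 and 3 on hardware port 2). The "demo_" names and the routed_port fallback argument are hypothetical and only illustrate the "0 means no affinity" rule; this is not the driver implementation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NB_TXQ 4

/* Per-queue affinity: 0 means no affinity, 1..n names a hardware port. */
static const uint8_t demo_txq_affinity[DEMO_NB_TXQ] = {1, 1, 2, 2};

/*
 * Pick the hardware port for a packet: a non-zero affinity always wins,
 * otherwise keep whatever port the existing routing had chosen.
 */
static uint8_t
demo_pick_hwport(uint8_t affinity, uint8_t routed_port)
{
	return affinity != 0 ? affinity : routed_port;
}
```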
 
> If yes, I think the word "affinity" does not convey the right idea.
> And again, the naming should give the idea that we are talking about multiple
> ports merged in one DPDK port.
> 

OK, how about 'tx_mhpsdp_hwport'?
'mhpsdp' is as mentioned before, and 'hwport' stands for one 'hardware port'.

> > > > Adds the new tx_affinity field into the padding hole of
> > > > rte_eth_txconf structure, the size of rte_eth_txconf keeps the
> > > > same. Adds a suppress type for structure change in the ABI check file.
> > > >
> > > > This patch adds the testpmd command line:
> > > > testpmd> port config (port_id) txq (queue_id) affinity (value)
> > > >
> > > > For example, there're two hardware ports connects to a single DPDK
> > >
> > > connects -> connected
> >
> > OK, will fix in next version.
> >
> > > > port (port id 0), and affinity 1 stood for hard port 1 and
> > > > affinity
> > > > 2 stood for hardware port 2, used the below command to config tx
> > > > affinity for each TxQ:
> > > > 	port config 0 txq 0 affinity 1
> > > > 	port config 0 txq 1 affinity 1
> > > > 	port config 0 txq 2 affinity 2
> > > > 	port config 0 txq 3 affinity 2
> > > >
> > > > These commands config the TxQ index 0 and TxQ index 1 with
> > > > affinity 1, uses TxQ 0 or TxQ 1 send packets, these packets will
> > > > be sent from the hardware port 1, and similar with hardware port 2
> > > > if sending packets with TxQ 2 or TxQ 3.
> > >
> > > [...]
> > > > @@ -212,6 +212,10 @@ API Changes
> > > > +* ethdev: added a new field:
> > > > +
> > > > +  - Tx affinity per-queue ``rte_eth_txconf.tx_affinity``
> > >
> > > Adding a new field is not an API change because existing
> > > applications don't need to update their code if they don't care this new field.
> > > I think you can remove this note.
> >
> > OK, will remove in next version.
> >
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -1138,6 +1138,7 @@ struct rte_eth_txconf {
> > > >  				      less free descriptors than this value. */
> > > >
> > > >  	uint8_t tx_deferred_start; /**< Do not start queue with
> > > > rte_eth_dev_start(). */
> > > > +	uint8_t tx_affinity; /**< Drives the setting of affinity per-queue.
> > > > +*/
> > >
> > > Why "Drives"? It is the setting, right?
> > > rte_eth_txconf is per-queue so no need to repeat.
> > > I think a good comment here would be to mention it is a physical
> > > port index for mhpsdp.
> > > Another good comment would be to specify how ports are numbered.
> >
> > OK, will update the comment for this new setting.
> >
> > Thanks.
> 
> 
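
For illustration only, the queue-to-port mapping described in the quoted
commit message can be modelled in a few lines of C. This is a minimal
sketch, not the ethdev API: `struct txq_conf_sketch` and
`hw_port_for_queue()` are hypothetical names, and treating affinity 0 as
"not set" is an assumption that the thread does not confirm.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-queue Tx configuration; only the
 * field discussed in this thread is modelled. */
struct txq_conf_sketch {
	uint8_t tx_affinity; /* hardware port index; 0 assumed to mean "not set" */
};

/* Return the hardware port that packets sent on queue_id would leave from,
 * or 0 if the queue is out of range or has no affinity configured. */
static uint8_t
hw_port_for_queue(const struct txq_conf_sketch *conf, uint16_t nb_txq,
		  uint16_t queue_id)
{
	if (conf == 0 || queue_id >= nb_txq)
		return 0;
	return conf[queue_id].tx_affinity;
}
```

With a four-entry table initialised as { {1}, {1}, {2}, {2} }, which mirrors
the four "port config 0 txq N affinity M" commands, queues 0 and 1 resolve
to hardware port 1 and queues 2 and 3 to hardware port 2.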


^ permalink raw reply	[relevance 0%]

* [PATCH v4 2/3] graph: pcap capture for graph nodes
  2023-01-24 11:21  4% ` [PATCH v4 " Amit Prakash Shukla
@ 2023-01-24 11:21  2%   ` Amit Prakash Shukla
  2023-01-31  8:06  0%     ` Jerin Jacob
  2023-02-03  8:19  4%   ` [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
  1 sibling, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2023-01-24 11:21 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, Amit Prakash Shukla

Implementation adds support to capture packets at each node with
packet metadata and node name.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

 app/test/test_graph_perf.c             |   2 +-
 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 217 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 124 ++++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 +
 10 files changed, 397 insertions(+), 4 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/app/test/test_graph_perf.c b/app/test/test_graph_perf.c
index 1d065438a6..c5b463f700 100644
--- a/app/test/test_graph_perf.c
+++ b/app/test/test_graph_perf.c
@@ -324,7 +324,7 @@ graph_init(const char *gname, uint8_t nb_srcs, uint8_t nb_sinks,
 	char nname[RTE_NODE_NAMESIZE / 2];
 	struct test_node_data *node_data;
 	char *ename[nodes_per_stage];
-	struct rte_graph_param gconf;
+	struct rte_graph_param gconf = {0};
 	const struct rte_memzone *mz;
 	uint8_t total_percent = 0;
 	rte_node_t *src_nodes;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 8c360b89e4..9ba392fb58 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -69,6 +69,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -101,6 +105,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..7bd13ed61e
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <pwd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("Home env not preset.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..198add67e2
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check whether graph pcap trace is enabled or disabled.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enable
+ *   - 0: Disable
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to allocate the mempool.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to open the pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked when graph pcap trace is enabled. It opens the
+ * pcap file, allocates the mempool, and populates the information needed
+ * by secondary processes.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace and close open fd's and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace is enabled, this function is invoked before each
+ * node's process function; mbuf and node metadata are captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< Filename in which packets to be captured.*/
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1
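
The core mechanism in this patch is function-pointer interposition: when
tracing is on, node->process is pointed at a capture wrapper and the real
handler is kept in node->original_process. Below is a minimal,
DPDK-independent sketch of that pattern. All names (`node_sketch`,
`capture_dispatch`, `node_enable_capture`) are hypothetical, and the capture
counters are kept in the node only to keep the sketch short; the patch keeps
them in struct rte_graph.

```c
#include <stdint.h>

struct node_sketch;
typedef uint16_t (*process_fn)(struct node_sketch *node, void **objs,
			       uint16_t nb_objs);

struct node_sketch {
	process_fn process;          /* what the graph walker calls */
	process_fn original_process; /* real handler, saved while tracing */
	uint64_t captured;           /* packets recorded so far */
	uint64_t to_capture;         /* capture budget */
};

/* Stand-in for a real node handler: claims to process every object. */
static uint16_t
real_process(struct node_sketch *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}

/* Wrapper: account for at most the remaining budget, then always forward
 * the full burst to the original handler. */
static uint16_t
capture_dispatch(struct node_sketch *node, void **objs, uint16_t nb_objs)
{
	uint64_t room = 0;

	if (node->captured < node->to_capture)
		room = node->to_capture - node->captured;
	if (room > nb_objs)
		room = nb_objs;

	/* A real tracer would copy and write 'room' packets here. */
	node->captured += room;

	return node->original_process(node, objs, nb_objs);
}

/* Install the wrapper and remember the handler it replaces. */
static void
node_enable_capture(struct node_sketch *node, uint64_t budget)
{
	node->original_process = node->process;
	node->process = capture_dispatch;
	node->captured = 0;
	node->to_capture = budget;
}
```

The point of the design is that the fast path needs no extra branch when
tracing is off: the walker always calls node->process, and only the pointer
stored there changes.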


^ permalink raw reply	[relevance 2%]

* [PATCH v4 1/3] pcapng: comment option support for epb
  @ 2023-01-24 11:21  4% ` Amit Prakash Shukla
  2023-01-24 11:21  2%   ` [PATCH v4 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
  2023-02-03  8:19  4%   ` [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
  0 siblings, 2 replies; 200+ results
From: Amit Prakash Shukla @ 2023-01-24 11:21 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger; +Cc: dev, jerinj, Amit Prakash Shukla

This change enhances rte_pcapng_copy to have comment in enhanced
packet block.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index a7acbdc058..303d3d66f9 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -139,7 +139,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -255,7 +255,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index c15f6fbb9f..8c360b89e4 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -98,6 +98,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 80d08e1a3b..acb31a9d93 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -450,7 +450,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -511,6 +512,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -548,6 +552,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 7d2697c647..6d286cda41 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -100,6 +100,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -111,7 +113,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1
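
As background for the optlen change above: in the pcapng format, an option
is a 16-bit code and a 16-bit value length followed by the value, padded to
a 32-bit boundary. The sketch below shows that layout for a comment option.
It is an illustration written against the pcapng specification, not a copy
of the library's internal helpers; `opt_total_len()`,
`opt_comment_write()` and `OPT_COMMENT_SKETCH` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_COMMENT_SKETCH 1 /* opt_comment code in the pcapng specification */

/* Bytes occupied by one option: 4-byte header plus value padded to 32 bits. */
static size_t
opt_total_len(size_t value_len)
{
	return 4 + ((value_len + 3) & ~(size_t)3);
}

/* Write a comment option into buf in host byte order.
 * Returns the number of bytes written, or 0 if it does not fit. */
static size_t
opt_comment_write(uint8_t *buf, size_t buf_len, const char *comment)
{
	size_t len = strlen(comment);
	size_t total = opt_total_len(len);
	uint16_t code = OPT_COMMENT_SKETCH;
	uint16_t vlen;

	if (len > UINT16_MAX || total > buf_len)
		return 0;

	vlen = (uint16_t)len;           /* length field excludes padding */
	memset(buf, 0, total);          /* zero the pad bytes */
	memcpy(buf, &code, sizeof(code));
	memcpy(buf + 2, &vlen, sizeof(vlen));
	memcpy(buf + 4, comment, len);  /* value is not NUL-terminated */
	return total;
}
```

Note that the length field carries the unpadded string length, while the
space reserved in the block must use the padded size; this is why the patch
adds the option length to optlen before appending to the mbuf.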


^ permalink raw reply	[relevance 4%]

* DPDK Release Status Meeting 2023-01-19
@ 2023-01-24 10:33  3% Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2023-01-24 10:33 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand

[-- Attachment #1: Type: text/plain, Size: 3381 bytes --]

Release status meeting minutes 2023-01-19
=========================================

Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

Participants:
* AMD
* ARM
* Debian/Microsoft [No]
* Intel
* Marvell
* Nvidia
* Red Hat

Release Dates
-------------

The following are the proposed current dates for 23.03:

* V1:      25 December 2022
* RC1:      8 February 2023
* RC2:      1 March    2023
* RC3:      8 March    2023
* Release: 20 March    2023

Subtrees
--------

* next-net
  * Patches being merged but slowly
  * rte_flow has a lot of changes in this release and needs support
  * Some reviews ongoing but we are missing reviewer support on this tree.

* next-net-intel
  * No update.

* next-net-mlx
  * No update.

* next-net-mvl
  * Nearly all patches merged. Ready for pull.

* next-eventdev
  * Nearly all patches merged. Waiting for some review comments on
    3 sets of patches from Intel.
  * Some CNXK updates

* next-baseband
  * New tree from this release.
  * Some series merged and some under review.

* next-virtio
  * Not many series under review.
  * Vhost async and VDPA patches need review.

* next-crypto
  * Some patches merged this week.
  * Build patches for QAT
  * PDCP (Packet Data Convergence Protocol) new library targeted for this release
  * Patch for CCP driver from last release

* main
  * Thread API from Tyler under review
  * ABI check merged
  * ML Dev patches need review
  * CNI (Container Network Interface) for the AF_XDP driver.
    * Will update patch to include explanation/docs.
  * Request for Roadmaps

Proposed Schedule for 2023
--------------------------

See also http://core.dpdk.org/roadmap/#dates

23.03
  * Proposal deadline (RFC/v1 patches): 25 December 2022
  * API freeze (-rc1): 8 February 2023
  * PMD features freeze (-rc2): 1 March 2023
  * Builtin applications features freeze (-rc3): 8 March 2023
  * Release: 20 March 2023

23.07
  * Proposal deadline (RFC/v1 patches): 15 April 2023
  * API freeze (-rc1): 31 May 2023
  * PMD features freeze (-rc2): 21 June 2023
  * Builtin applications features freeze (-rc3): 28 June 2023
  * Release: 12 July 2023

23.11
  * Proposal deadline (RFC/v1 patches): 12 August 2023
  * API freeze (-rc1): 29 September 2023
  * PMD features freeze (-rc2): 20 October 2023
  * Builtin applications features freeze (-rc3): 27 October 2023
  * Release: 15 November 2023

Other
-----

* TBA

LTS
---

Next releases will be:

* 22.11.1

* 21.11.4

* 20.11.8

* 19.11.15?
  * CVE and critical fixes only.

* Distros
  * v20.11 in Debian 11
  * Ubuntu 22.04 contains 21.11

Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  * https://www.dpdk.org/hosted-projects/

Opens
-----

* None

DPDK Release Status Meetings
----------------------------

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs on every Thursday at 9:30 UTC over Jitsi on https://meet.jit.si/DPDK

You don't need an invite to join the meeting but if you want a calendar reminder just
send an email to "John McNamara john.mcnamara@intel.com" for the invite.

[-- Attachment #2: Type: text/html, Size: 18068 bytes --]

^ permalink raw reply	[relevance 3%]

* RE: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-23  9:31  0%         ` Jerin Jacob
@ 2023-01-23 18:07  0%           ` Naga Harish K, S V
  0 siblings, 0 replies; 200+ results
From: Naga Harish K, S V @ 2023-01-23 18:07 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: jerinj, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, January 23, 2023 3:02 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Fri, Jan 20, 2023 at 4:03 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, January 20, 2023 3:02 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Fri, Jan 20, 2023 at 2:28 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Wednesday, January 18, 2023 3:52 PM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Sat, Jan 7, 2023 at 9:49 PM Naga Harish K S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > > The adapter configuration parameters defined in the ``struct
> > > > > > rte_event_eth_rx_adapter_config_params`` can be configured and
> > > > > > retrieved using ``rte_event_eth_rx_adapter_set_params`` and
> > > > > > > ``rte_event_eth_rx_adapter_get_params`` respectively.
> > > > > >
> > > > > > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > > > > > ---
> > > > > >
> > > > > > +/**
> > > > > > > + * Adapter configuration parameters
> > > > > > > + */
> > > > > > > +struct rte_event_eth_rx_adapter_runtime_params {
> > > > > > > +       uint32_t max_nb_rx;
> > > > > > > +       /**< The adapter can return early if it has processed at least
> > > > > > > +        * max_nb_rx mbufs. This isn't treated as a requirement; batching may
> > > > > > > +        * cause the adapter to process more than max_nb_rx mbufs.
> > > > >
> > > > > This parameter as specific to SW only driver. So future
> > > > > something added from HW only driver item then it won't work for
> > > > > SW driver. So we need capability per adapter.
> > > > >
> > > > > So,  I would suggest following theme across _all_ adapters.
> > > > >
> > > > > > 1) Introduce RTE_EVENT_ETH_RX_ADAPTER_CAP_RUNTIME_XYZ and associate
> > > > > > each parameter (the one we think, it is not common for all
> > > > > > adapters)
> > > >
> > > > The parameters that are exposed in the patch are all existing
> > > > parameters and they are made runtime configurable for SW
> > > > implementation. I think, there are no such parameters existing
> > > > today for HW driver implementation. Hence it may be better to
> > > > > introduce these flags when the HW driver implementation requires
> > > > > runtime configurable parameters.
> > >
> > > Since current values are not applicable to HW. So we any way need
> > > the capability now to tell this is not applicable for HW.
> > >
> >
> > Depending on the existing adapter capability flag
> > "RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT",
> > the current values can be applied to only SW implementation. In this
> > way, there is no need for creating new capability flags.
> 
> OK. Makes sense. Please send next version with remaining suggestions.
> 

V2 version of the patch  is posted with remaining suggestions.

> >
> > > >
> > > > > 2) Add some reserved fields in
> > > > > rte_event_eth_rx_adapter_runtime_params
> > > > > so that we don't break ABI in future
> > > >
> > > > Agreed.
> > > >
> > > > > 3) Add rte_event_eth_rx_adapter_runtime_params_init() function
> > > > > just make structure fill with default to avoid ABI break
> > > > > 4) Add rte_event_eth_rx_adapter_runtime_params_info_get(). Lets
> > > > > capability flags and other items can be return via this
> > > >
> > > > These two items(3,4) can be taken as and when item "1" above is
> > > implemented.
> > >
> > > See above.
> > >
> > > >
> > > > > 5) Change rte_event_eth_rx_adapter_set_params as
> > > > > rte_event_eth_rx_adapter_runtime_set()  or
> > > > > > rte_event_eth_rx_adapter_runtime_params_set() to make it runtime
> > > > > > explicit
> > > >
> > > > Agreed
> > > >
> > > > > 6) Change rte_event_eth_rx_adapter_get_params as
> > > > > rte_event_eth_rx_adapter_runtime_get() or
> > > > > rte_event_eth_rx_adapter_runtime_params_get()  to make it
> > > > > runtime explicit
> > > >
> > > > Agreed

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 0/3] Split logging functionality out of EAL
  2023-01-23 14:36  0%         ` Bruce Richardson
@ 2023-01-23 14:42  0%           ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-01-23 14:42 UTC (permalink / raw)
  To: Bruce Richardson, Dodji Seketeli; +Cc: dev, Thomas Monjalon

On Mon, Jan 23, 2023 at 3:37 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Mon, Jan 23, 2023 at 03:31:58PM +0100, David Marchand wrote:
> > On Mon, Jan 23, 2023 at 3:24 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Sun, Jan 22, 2023 at 03:56:12PM +0100, David Marchand wrote:
> > > > Hi Bruce,
> > > >
> > > > On Fri, Jan 20, 2023 at 7:22 PM Bruce Richardson
> > > > <bruce.richardson@intel.com> wrote:
> > > > >
> > > > > There is a general desire to reduce the size and scope of EAL. To this
> > > > > end, this patchset makes a (very) small step in that direction by taking
> > > > > the logging functionality out of EAL and putting it into its own library
> > > > > that can be built and maintained separately.
> > > > >
> > > > > As with the first RFC for this, the main obstacle is the "fnmatch"
> > > > > function which is needed by both EAL and the new log function when
> > > > > building on windows. While the function cannot stay in EAL - or we would
> > > > > have a circular dependency, moving it to a new library or just putting
> > > > > it in the log library have the disadvantages that it then "leaks" into
> > > > > the public namespace without an rte_prefix, which could cause issues.
> > > > > Since only a single function is involved, subsequent versions take a
> > > > > different approach to v1, and just moves the offending function to be a
> > > > > static function in a header file. This allows use by multiple libs
> > > > > without conflicting names or making it public.
> > > > >
> > > > > The other complication, as explained in v1 RFC was that of multiple
> > > > > implementations for different OS's. This is solved here in the same
> > > > > way as v1, by including the OS in the name and having meson pick the
> > > > > correct file for each build. Since only one file is involved, there
> > > > > seemed little need for replicating EAL's separate subdirectories
> > > > > per-OS.
> > > >
> > > > There is another complication.
> > > >
> > > > The ABI check is not handling properly the case where symbols are
> > > > moved to the new log library (even though the dependency to librte_log
> > > > is explicit in librte_eal elf).
> > > > For now, I don't have a good way to handle this.
> > > >
> > > > A workaround to pass the check is to suppress those symbols wrt the eal dump:
> > > > [suppress_function]
> > > >         symbol_name_regexp = rte_log
> > > > [suppress_function]
> > > >         symbol_name = rte_openlog_stream
> > > > [suppress_function]
> > > >         symbol_name = rte_vlog
> > > >
> > > > But this is not a good solution because we would be losing checks on
> > > > them for the rest of the v23 ABI life.
> > > >
> > > Right, I got error messages from the CI job for this too, but I also have
> > > no idea how to work around this. Perhaps we only get to move content
> > > between libraries when we do an ABI bump? Seems a bit restrictive, though.
> >
> > Moving symbols as you did does not seem an ABI breakage.
> > An application that links to eal would see the dt_needed entry for the
> > new log library, load it accordingly and gets the right symbols.
> >
> Yes, I agree. However, I also agree with you that it is risky to lose
> symbol checking for the moved symbols if we need to remove them from
> analysis. That said, maybe others have some ideas as to how to work around
> this, or perhaps we just take the risk of disabling checking.

I opened a bz for libabigail.
https://sourceware.org/bugzilla/show_bug.cgi?id=30034

If it is handled fast enough, we may have a solution by the time 23.03
is released and we will remove this workaround for 23.07 development
(no pressure Dodji :-p).


-- 
David Marchand



* Re: [PATCH v4 0/3] Split logging functionality out of EAL
  2023-01-23 14:31  3%       ` David Marchand
@ 2023-01-23 14:36  0%         ` Bruce Richardson
  2023-01-23 14:42  0%           ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-01-23 14:36 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Thomas Monjalon, Dodji Seketeli

On Mon, Jan 23, 2023 at 03:31:58PM +0100, David Marchand wrote:
> On Mon, Jan 23, 2023 at 3:24 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Sun, Jan 22, 2023 at 03:56:12PM +0100, David Marchand wrote:
> > > Hi Bruce,
> > >
> > > On Fri, Jan 20, 2023 at 7:22 PM Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > > >
> > > > There is a general desire to reduce the size and scope of EAL. To this
> > > > end, this patchset makes a (very) small step in that direction by taking
> > > > the logging functionality out of EAL and putting it into its own library
> > > > that can be built and maintained separately.
> > > >
> > > > As with the first RFC for this, the main obstacle is the "fnmatch"
> > > > function which is needed by both EAL and the new log function when
> > > > building on windows. While the function cannot stay in EAL - or we would
> > > > have a circular dependency, moving it to a new library or just putting
> > > > it in the log library have the disadvantages that it then "leaks" into
> > > > the public namespace without an rte_prefix, which could cause issues.
> > > > Since only a single function is involved, subsequent versions take a
> > > > different approach to v1, and just moves the offending function to be a
> > > > static function in a header file. This allows use by multiple libs
> > > > without conflicting names or making it public.
> > > >
> > > > The other complication, as explained in v1 RFC was that of multiple
> > > > implementations for different OS's. This is solved here in the same
> > > > way as v1, by including the OS in the name and having meson pick the
> > > > correct file for each build. Since only one file is involved, there
> > > > seemed little need for replicating EAL's separate subdirectories
> > > > per-OS.
> > >
> > > There is another complication.
> > >
> > > The ABI check is not handling properly the case where symbols are
> > > moved to the new log library (even though the dependency to librte_log
> > > is explicit in librte_eal elf).
> > > For now, I don't have a good way to handle this.
> > >
> > > A workaround to pass the check is to suppress those symbols wrt the eal dump:
> > > [suppress_function]
> > >         symbol_name_regexp = rte_log
> > > [suppress_function]
> > >         symbol_name = rte_openlog_stream
> > > [suppress_function]
> > >         symbol_name = rte_vlog
> > >
> > > But this is not a good solution because we would be losing checks on
> > > them for the rest of the v23 ABI life.
> > >
> > Right, I got error messages from the CI job for this too, but I also have
> > no idea how to work around this. Perhaps we only get to move content
> > between libraries when we do an ABI bump? Seems a bit restrictive, though.
> 
> Moving symbols as you did does not seem an ABI breakage.
> An application that links to eal would see the dt_needed entry for the
> new log library, load it accordingly and gets the right symbols.
>
Yes, I agree. However, I also agree with you that it is risky to lose
symbol checking for the moved symbols if we need to remove them from
analysis. That said, maybe others have some ideas as to how to work around
this, or perhaps we just take the risk of disabling checking.

/Bruce


* Re: [PATCH v4 0/3] Split logging functionality out of EAL
  2023-01-23 14:24  3%     ` Bruce Richardson
@ 2023-01-23 14:31  3%       ` David Marchand
  2023-01-23 14:36  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2023-01-23 14:31 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Thomas Monjalon, Dodji Seketeli

On Mon, Jan 23, 2023 at 3:24 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sun, Jan 22, 2023 at 03:56:12PM +0100, David Marchand wrote:
> > Hi Bruce,
> >
> > On Fri, Jan 20, 2023 at 7:22 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > There is a general desire to reduce the size and scope of EAL. To this
> > > end, this patchset makes a (very) small step in that direction by taking
> > > the logging functionality out of EAL and putting it into its own library
> > > that can be built and maintained separately.
> > >
> > > As with the first RFC for this, the main obstacle is the "fnmatch"
> > > function which is needed by both EAL and the new log function when
> > > building on windows. While the function cannot stay in EAL - or we would
> > > have a circular dependency, moving it to a new library or just putting
> > > it in the log library have the disadvantages that it then "leaks" into
> > > the public namespace without an rte_prefix, which could cause issues.
> > > Since only a single function is involved, subsequent versions take a
> > > different approach to v1, and just moves the offending function to be a
> > > static function in a header file. This allows use by multiple libs
> > > without conflicting names or making it public.
> > >
> > > The other complication, as explained in v1 RFC was that of multiple
> > > implementations for different OS's. This is solved here in the same
> > > way as v1, by including the OS in the name and having meson pick the
> > > correct file for each build. Since only one file is involved, there
> > > seemed little need for replicating EAL's separate subdirectories
> > > per-OS.
> >
> > There is another complication.
> >
> > The ABI check is not handling properly the case where symbols are
> > moved to the new log library (even though the dependency to librte_log
> > is explicit in librte_eal elf).
> > For now, I don't have a good way to handle this.
> >
> > A workaround to pass the check is to suppress those symbols wrt the eal dump:
> > [suppress_function]
> >         symbol_name_regexp = rte_log
> > [suppress_function]
> >         symbol_name = rte_openlog_stream
> > [suppress_function]
> >         symbol_name = rte_vlog
> >
> > But this is not a good solution because we would be losing checks on
> > them for the rest of the v23 ABI life.
> >
> Right, I got error messages from the CI job for this too, but I also have
> no idea how to work around this. Perhaps we only get to move content
> between libraries when we do an ABI bump? Seems a bit restrictive, though.

Moving symbols as you did does not seem to be an ABI breakage.
An application that links to eal would see the DT_NEEDED entry for the
new log library, load it accordingly and get the right symbols.
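
The reasoning above can be pictured with a toy model. This is only a
sketch: it is not an ELF parser and not the real dynamic loader, all toy_*
names are hypothetical, and the depth-first search is a simplification of
the actual lookup order. It only shows that a symbol moved from eal to the
log library is still found by an application that names eal alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 4

/* Toy shared object: the symbols it defines and the libraries it lists
 * in DT_NEEDED. Both arrays are NULL-terminated. */
struct toy_lib {
	const char *name;
	const char *defines[TOY_MAX];
	const struct toy_lib *needed[TOY_MAX];
};

/* Return 1 if `lib` itself defines `sym`. */
int toy_defines(const struct toy_lib *lib, const char *sym)
{
	for (size_t i = 0; i < TOY_MAX && lib->defines[i] != NULL; i++)
		if (strcmp(lib->defines[i], sym) == 0)
			return 1;
	return 0;
}

/* Return the name of the object providing `sym`, looking at `lib` first
 * and then following its DT_NEEDED entries recursively. */
const char *toy_resolve(const struct toy_lib *lib, const char *sym)
{
	assert(lib != NULL && sym != NULL);
	if (toy_defines(lib, sym))
		return lib->name;
	for (size_t i = 0; i < TOY_MAX && lib->needed[i] != NULL; i++) {
		const char *found = toy_resolve(lib->needed[i], sym);
		if (found != NULL)
			return found;
	}
	return NULL;
}

/* After the split: the app needs eal, and eal needs the log library. */
const struct toy_lib toy_log = {
	"librte_log", { "rte_log", "rte_vlog", "rte_openlog_stream", NULL }, { NULL }
};
const struct toy_lib toy_eal = {
	"librte_eal", { "rte_eal_init", NULL }, { &toy_log, NULL }
};
const struct toy_lib toy_app = {
	"app", { "main", NULL }, { &toy_eal, NULL }
};
```

Calling toy_resolve(&toy_app, "rte_log") walks app, then librte_eal, then
librte_log, which is the chain the application relies on after the move.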


-- 
David Marchand



* Re: [PATCH v4 0/3] Split logging functionality out of EAL
  2023-01-22 14:56  4%   ` David Marchand
@ 2023-01-23 14:24  3%     ` Bruce Richardson
  2023-01-23 14:31  3%       ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-01-23 14:24 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Thomas Monjalon, Dodji Seketeli

On Sun, Jan 22, 2023 at 03:56:12PM +0100, David Marchand wrote:
> Hi Bruce,
> 
> On Fri, Jan 20, 2023 at 7:22 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > There is a general desire to reduce the size and scope of EAL. To this
> > end, this patchset makes a (very) small step in that direction by taking
> > the logging functionality out of EAL and putting it into its own library
> > that can be built and maintained separately.
> >
> > As with the first RFC for this, the main obstacle is the "fnmatch"
> > function which is needed by both EAL and the new log function when
> > building on windows. While the function cannot stay in EAL - or we would
> > have a circular dependency, moving it to a new library or just putting
> > it in the log library have the disadvantages that it then "leaks" into
> > the public namespace without an rte_prefix, which could cause issues.
> > Since only a single function is involved, subsequent versions take a
> > different approach to v1, and just moves the offending function to be a
> > static function in a header file. This allows use by multiple libs
> > without conflicting names or making it public.
> >
> > The other complication, as explained in v1 RFC was that of multiple
> > implementations for different OS's. This is solved here in the same
> > way as v1, by including the OS in the name and having meson pick the
> > correct file for each build. Since only one file is involved, there
> > seemed little need for replicating EAL's separate subdirectories
> > per-OS.
> 
> There is another complication.
> 
> The ABI check is not handling properly the case where symbols are
> moved to the new log library (even though the dependency to librte_log
> is explicit in librte_eal elf).
> For now, I don't have a good way to handle this.
> 
> A workaround to pass the check is to suppress those symbols wrt the eal dump:
> [suppress_function]
>         symbol_name_regexp = rte_log
> [suppress_function]
>         symbol_name = rte_openlog_stream
> [suppress_function]
>         symbol_name = rte_vlog
> 
> But this is not a good solution because we would be losing checks on
> them for the rest of the v23 ABI life.
> 
Right, I got error messages from the CI job for this too, but I also have
no idea how to work around this. Perhaps we only get to move content
between libraries when we do an ABI bump? Seems a bit restrictive, though.

/Bruce


* Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
  2023-01-20 10:33  0%       ` Naga Harish K, S V
@ 2023-01-23  9:31  0%         ` Jerin Jacob
  2023-01-23 18:07  0%           ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-01-23  9:31 UTC (permalink / raw)
  To: Naga Harish K, S V; +Cc: jerinj, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Fri, Jan 20, 2023 at 4:03 PM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, January 20, 2023 3:02 PM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Fri, Jan 20, 2023 at 2:28 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Wednesday, January 18, 2023 3:52 PM
> > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > <jay.jayatheerthan@intel.com>
> > > > Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> > > >
> > > > On Sat, Jan 7, 2023 at 9:49 PM Naga Harish K S V
> > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > >
> > > > > The adapter configuration parameters defined in the ``struct
> > > > > rte_event_eth_rx_adapter_config_params`` can be configured and
> > > > > retrieved using ``rte_event_eth_rx_adapter_set_params`` and
> > > > > ``rte_event_eth_rx_adapter_get_params`` respectively.
> > > > >
> > > > > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > > > > ---
> > > > >
> > > > > +/**
> > > > > + * Adapter configuration parameters
> > > > > + */
> > > > > +struct rte_event_eth_rx_adapter_runtime_params {
> > > > > +       uint32_t max_nb_rx;
> > > > > +       /**< The adapter can return early if it has processed at least
> > > > > +        * max_nb_rx mbufs. This isn't treated as a requirement; batching
> > > > > +        * may cause the adapter to process more than max_nb_rx mbufs.
> > > >
> > > > This parameter is specific to the SW-only driver. If a parameter
> > > > specific to a HW-only driver is added in the future, it won't work
> > > > for the SW driver. So we need a capability per adapter.
> > > >
> > > > So, I would suggest the following theme across _all_ adapters.
> > > >
> > > > 1) Introduce RTE_EVENT_ETH_RX_ADAPTER_CAP_RUNTIME_XYZ and associate
> > > > it with each parameter (the ones we think are not common to all
> > > > adapters)
> > >
> > > The parameters that are exposed in the patch are all existing
> > > parameters and they are made runtime configurable for the SW
> > > implementation. I think there are no such parameters today for the
> > > HW driver implementation. Hence it may be better to introduce these
> > > flags when the HW driver implementation requires runtime
> > > configurable parameters.
> >
> > Since the current values are not applicable to HW, we need the
> > capability now anyway, to tell that they are not applicable to HW.
> >
>
> Depending on the existing adapter capability flag "RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT",
> the current values can be applied to only SW implementation. In this way, there is no need for
> creating new capability flags.

OK. Makes sense. Please send next version with remaining suggestions.

>
> > >
> > > > 2) Add some reserved fields in
> > > > rte_event_eth_rx_adapter_runtime_params
> > > > so that we don't break ABI in future
> > >
> > > Agreed.
> > >
> > > > 3) Add a rte_event_eth_rx_adapter_runtime_params_init() function
> > > > that fills the structure with defaults, to avoid an ABI break
> > > > 4) Add rte_event_eth_rx_adapter_runtime_params_info_get(), so that
> > > > capability flags and other items can be returned via it
> > >
> > > These two items (3, 4) can be taken up as and when item "1" above
> > > is implemented.
> >
> > See above.
> >
> > >
> > > > 5) Change rte_event_eth_rx_adapter_set_params as
> > > > rte_event_eth_rx_adapter_runtime_set()  or
> > > > rte_event_eth_rx_adapter_runtime_params_set() to make it runtime
> > > > explicit
> > >
> > > Agreed
> > >
> > > > 6) Change rte_event_eth_rx_adapter_get_params as
> > > > rte_event_eth_rx_adapter_runtime_get() or
> > > > rte_event_eth_rx_adapter_runtime_params_get()  to make it runtime
> > > > explicit
> > >
> > > Agreed

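Suggestions 2) and 3) quoted above could look roughly like the sketch
below. Only the max_nb_rx field comes from the patch under review; the
sketch_* names, the size of the reserved array and the default value are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: max_nb_rx is from the patch, rsvd[] is the spare
 * room that lets fields be added later without changing sizeof(). */
struct sketch_rx_adapter_runtime_params {
	uint32_t max_nb_rx;
	uint32_t rsvd[15];
};

/* Pin the size so that an accidental layout change fails to compile. */
static_assert(sizeof(struct sketch_rx_adapter_runtime_params) == 64,
	      "runtime params layout must stay at 64 bytes");

/* Hypothetical init helper: zero the whole structure (reserved fields
 * included), then fill in defaults. The default of 128 is an assumption. */
int sketch_rx_adapter_runtime_params_init(
		struct sketch_rx_adapter_runtime_params *params)
{
	if (params == NULL)
		return -EINVAL;
	memset(params, 0, sizeof(*params));
	params->max_nb_rx = 128;
	return 0;
}
```

An application that always goes through the init helper before setting
the fields it cares about never passes garbage in the reserved area, so a
later release can give those bytes a meaning without breaking it.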

* Minutes of Technical Board Meeting, 2022-11-30
@ 2023-01-23  9:03  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-01-23  9:03 UTC (permalink / raw)
  To: dev

1/ Traces in ethdev API

It has been decided to have traces everywhere.

In the datapath or in grey areas (where it is unclear how fast the code
must be), we must use compile-time enabled traces for performance reasons.
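
As an illustration only, a compile-time enabled trace can be sketched
as below. The SKETCH_TRACE_FP switch, the macro and the function are all
hypothetical names, not the ethdev trace API; the point is that the
disabled build compiles the trace call down to nothing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical build switch: compile with -DSKETCH_TRACE_FP to enable. */
#ifdef SKETCH_TRACE_FP
#define SKETCH_TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define SKETCH_TRACE(...) do { } while (0)
#endif

/* Stand-in for a datapath function: counts non-negative entries in a
 * burst. The trace line costs nothing unless the switch is defined. */
unsigned int sketch_rx_burst(const int *pkts, unsigned int n)
{
	unsigned int i, ok = 0;

	assert(pkts != NULL);
	for (i = 0; i < n; i++)
		if (pkts[i] >= 0)
			ok++;
	SKETCH_TRACE("rx burst: %u of %u accepted\n", ok, n);
	return ok;
}
```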

2/ Tech writer hiring

Hiring failed so far.
Thomas to ask for explanations regarding recent candidates.

3/ MIT license exception

No progress at this date.

4/ UNH SOW

The accomplishments of 2022 must be categorized to fit with 2022 SOW.

The 2023 plan must be voted in a sheet during the next 2 weeks,
and it could be validated during the next meeting.

5/ Release 22.11.1 with hotfix

There is an issue when comparing the ABI with 22.11.0.
We need review and testing of the hotfix for making 22.11.1.
The new tag will be the reference for ABI checks.


* Minutes of Technical Board Meeting, 2022-09-06
@ 2023-01-23  9:02  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-01-23  9:02 UTC (permalink / raw)
  To: dev; +Cc: Lincoln Lavoie

We had an in-person Technical Board meeting
at the end of the first day of the summit in Arcachon, France.
Some attendees joined virtually, many were in the room.

This is a very late recap of what was said,
mainly based on notes taken by Lincoln, thanks to him.

1/ Testing

Lincoln provided a summary of the Community Lab and the CI process.

We discussed which "rolling" distribution should be used for testing;
the group seemed to lean toward supporting Arch Linux
and dropping support for Fedora Rawhide.

When a container upgrade (likely the rolling) starts causing errors,
the previous container must be kept for regular testing,
but we need to be notified of the failure caused by an upgrade.
A badge could be added to the top of the patchwork page,
to represent the status of the rolling distro,
tested periodically on the DPDK main branch.

Honnappa did an intro about the improvements of DTS.
DTS will move into the main DPDK repository.
A future goal is to require DTS update for new DPDK features submissions.
The scope of DTS is testing on real hardware, compared to unit testing.

2/ Security Process

Last week there were two CVEs fixed in all releases in parallel.

There is a specific mailing list, security-prerelease@dpdk.org,
very low volume, with mails coming a few days in advance of the security releases.

3/ Minimum Versions of Software

Meson
It could be interesting to benefit from latest Meson features.

Python
Minimum version is 3.5, which is no longer included in any stable distro.
DTS seems to be aiming for 3.8 as the minimum version,
which might break on older distros.

4/ ABI/API Change Allowances

We had a long discussion about the API being allowed to break,
and the ABI being allowed to break every year for LTS versions.
These breaking changes have an impact on users upgrading or not,
and how LTS branches are used.
We need to take into account any change which has an impact for the users.


* Re: [PATCH v4 0/3] Split logging functionality out of EAL
  @ 2023-01-22 14:56  4%   ` David Marchand
  2023-01-23 14:24  3%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2023-01-22 14:56 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Thomas Monjalon, Dodji Seketeli

Hi Bruce,

On Fri, Jan 20, 2023 at 7:22 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> There is a general desire to reduce the size and scope of EAL. To this
> end, this patchset makes a (very) small step in that direction by taking
> the logging functionality out of EAL and putting it into its own library
> that can be built and maintained separately.
>
> As with the first RFC for this, the main obstacle is the "fnmatch"
> function, which is needed by both EAL and the new log function when
> building on Windows. While the function cannot stay in EAL (or we would
> have a circular dependency), moving it to a new library or just putting
> it in the log library has the disadvantage that it then "leaks" into
> the public namespace without an rte_ prefix, which could cause issues.
> Since only a single function is involved, subsequent versions take a
> different approach to v1, and just move the offending function to be a
> static function in a header file. This allows use by multiple libs
> without conflicting names or making it public.
>
> The other complication, as explained in v1 RFC was that of multiple
> implementations for different OS's. This is solved here in the same
> way as v1, by including the OS in the name and having meson pick the
> correct file for each build. Since only one file is involved, there
> seemed little need for replicating EAL's separate subdirectories
> per-OS.

There is another complication.

The ABI check does not properly handle the case where symbols are
moved to the new log library (even though the dependency on librte_log
is explicit in the librte_eal ELF).
For now, I don't have a good way to handle this.

A workaround to pass the check is to suppress those symbols with respect
to the eal dump:
[suppress_function]
        symbol_name_regexp = rte_log
[suppress_function]
        symbol_name = rte_openlog_stream
[suppress_function]
        symbol_name = rte_vlog

But this is not a good solution because we would be losing checks on
them for the rest of the v23 ABI life.
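
To see which symbols the three rules above would cover, here is a small
sketch. It assumes that symbol_name_regexp is applied as an unanchored
POSIX extended regular expression and that symbol_name is an exact match;
the helper name is hypothetical and this is not libabigail code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` would be hit by one of the three rules quoted
 * above, 0 otherwise (also 0 if the pattern fails to compile). */
int sketch_is_suppressed(const char *name)
{
	regex_t re;
	int hit;

	assert(name != NULL);

	/* symbol_name = ...: exact names only. */
	if (strcmp(name, "rte_openlog_stream") == 0 ||
	    strcmp(name, "rte_vlog") == 0)
		return 1;

	/* symbol_name_regexp = rte_log: assumed to match anywhere. */
	if (regcomp(&re, "rte_log", REG_EXTENDED | REG_NOSUB) != 0)
		return 0;
	hit = (regexec(&re, name, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

Under those assumptions the pattern alone covers rte_log and names such
as rte_log_set_level, but not rte_vlog or rte_openlog_stream, which would
explain why those two get their own entries.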


-- 
David Marchand



* RE: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
  @ 2023-01-20 10:33  0%       ` Naga Harish K, S V
  2023-01-23  9:31  0%         ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-01-20 10:33 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: jerinj, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, January 20, 2023 3:02 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Fri, Jan 20, 2023 at 2:28 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, January 18, 2023 3:52 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Sat, Jan 7, 2023 at 9:49 PM Naga Harish K S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > > The adapter configuration parameters defined in the ``struct
> > > > rte_event_eth_rx_adapter_config_params`` can be configured and
> > > > retrieved using ``rte_event_eth_rx_adapter_set_params`` and
> > > > ``rte_event_eth_rx_adapter_get_params`` respectively.
> > > >
> > > > Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
> > > > ---
> > > >
> > > > +/**
> > > > + * Adapter configuration parameters
> > > > + */
> > > > +struct rte_event_eth_rx_adapter_runtime_params {
> > > > +       uint32_t max_nb_rx;
> > > > +       /**< The adapter can return early if it has processed at least
> > > > +        * max_nb_rx mbufs. This isn't treated as a requirement; batching
> > > > +        * may cause the adapter to process more than max_nb_rx mbufs.
> > >
> > > This parameter is specific to the SW-only driver. If a parameter
> > > specific to a HW-only driver is added in the future, it won't work
> > > for the SW driver. So we need a capability per adapter.
> > >
> > > So, I would suggest the following theme across _all_ adapters.
> > >
> > > 1) Introduce RTE_EVENT_ETH_RX_ADAPTER_CAP_RUNTIME_XYZ and associate
> > > it with each parameter (the ones we think are not common to all
> > > adapters)
> >
> > The parameters that are exposed in the patch are all existing
> > parameters and they are made runtime configurable for the SW
> > implementation. I think there are no such parameters today for the
> > HW driver implementation. Hence it may be better to introduce these
> > flags when the HW driver implementation requires runtime
> > configurable parameters.
>
> Since the current values are not applicable to HW, we need the
> capability now anyway, to tell that they are not applicable to HW.
> 

Based on the existing adapter capability flag
"RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT", the current values can be
applied to the SW implementation only. This way, there is no need to
create new capability flags.

> >
> > > 2) Add some reserved fields in
> > > rte_event_eth_rx_adapter_runtime_params
> > > so that we don't break ABI in future
> >
> > Agreed.
> >
> > > 3) Add a rte_event_eth_rx_adapter_runtime_params_init() function
> > > that fills the structure with defaults, to avoid an ABI break
> > > 4) Add rte_event_eth_rx_adapter_runtime_params_info_get(), so that
> > > capability flags and other items can be returned via it
> >
> > These two items (3, 4) can be taken up as and when item "1" above
> > is implemented.
> 
> See above.
> 
> >
> > > 5) Change rte_event_eth_rx_adapter_set_params as
> > > rte_event_eth_rx_adapter_runtime_set()  or
> > > rte_event_eth_rx_adapter_runtime_params_set() to make it runtime
> > > explicit
> >
> > Agreed
> >
> > > 6) Change rte_event_eth_rx_adapter_get_params as
> > > rte_event_eth_rx_adapter_runtime_get() or
> > > rte_event_eth_rx_adapter_runtime_params_get()  to make it runtime
> > > explicit
> >
> > Agreed

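The max_nb_rx comment quoted above (return early once at least max_nb_rx
mbufs are processed, while batching may overshoot) can be pictured with
the sketch below. The burst size of 32 and all sketch_* names are
assumptions; this is not the adapter's service function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BATCH 32u /* assumed burst size */

/* Process up to `avail` pending mbufs in bursts and stop as soon as at
 * least max_nb_rx have been handled. Returns the number processed, which
 * can exceed max_nb_rx because a burst is never cut short. */
uint32_t sketch_rx_poll(uint32_t avail, uint32_t max_nb_rx)
{
	uint32_t done = 0;

	assert(max_nb_rx > 0);
	while (avail > 0 && done < max_nb_rx) {
		uint32_t n = avail < SKETCH_BATCH ? avail : SKETCH_BATCH;

		done += n;
		avail -= n;
	}
	return done;
}
```

With 1000 mbufs pending and max_nb_rx set to 40, the loop runs two
bursts and handles 64 mbufs, which is the overshoot the comment allows.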


-- links below jump to the message on this page --
2021-09-13  8:45     [dpdk-dev] Questions about rte_eth_link_speed_to_str API Min Hu (Connor)
2021-09-16  2:56     ` [dpdk-dev] [RFC] ethdev: improve link speed to string Min Hu (Connor)
2021-09-16  6:22       ` Andrew Rybchenko
2021-09-16  8:16         ` Min Hu (Connor)
2021-09-16  8:21           ` Andrew Rybchenko
2021-09-17  0:43             ` Min Hu (Connor)
2023-01-19 11:41               ` Ferruh Yigit
2023-01-19 16:45                 ` Stephen Hemminger
2023-02-10 14:41  3%               ` Ferruh Yigit
2023-03-23 14:40  3%                 ` Ferruh Yigit
2022-04-20  8:16     [PATCH v1 0/5] Direct re-arming of buffers on receive side Feifei Wang
2023-01-04  7:30     ` [PATCH v3 0/3] " Feifei Wang
2023-01-04  7:30       ` [PATCH v3 1/3] ethdev: enable direct rearm with separate API Feifei Wang
2023-01-04  8:21         ` Morten Brørup
2023-01-04  8:51           ` 回复: " Feifei Wang
2023-01-04 10:11             ` Morten Brørup
2023-02-24  8:55  0%           ` 回复: " Feifei Wang
2022-08-03 13:28     [dpdk-dev] [RFC PATCH 1/1] mldev: introduce machine learning device library jerinj
2023-02-03  8:42     ` [dpdk-dev] [PATCH v1 01/12] " Thomas Monjalon
2023-02-03 17:33       ` Stephen Hemminger
2023-02-03 20:18  2%     ` Thomas Monjalon
2023-02-03 20:26  0%       ` Stephen Hemminger
2023-02-03 20:49  0%         ` Thomas Monjalon
2023-02-05 23:41  0%           ` Stephen Hemminger
2022-08-29 15:18     [RFC PATCH 0/3] Split logging out of EAL Bruce Richardson
2023-01-20 18:21     ` [PATCH v4 0/3] Split logging functionality " Bruce Richardson
2023-01-22 14:56  4%   ` David Marchand
2023-01-23 14:24  3%     ` Bruce Richardson
2023-01-23 14:31  3%       ` David Marchand
2023-01-23 14:36  0%         ` Bruce Richardson
2023-01-23 14:42  0%           ` David Marchand
2022-10-14 17:23     [RFC 1/2] testpmd: make f_quit flag volatile Stephen Hemminger
2023-01-30 20:09     ` [PATCH v10 0/2] testpmd: handle signals safely Stephen Hemminger
2023-01-30 20:09       ` [PATCH v10 1/2] cmdline: handle EOF in cmdline_poll Stephen Hemminger
2023-01-30 22:12  3%     ` Ferruh Yigit
2022-10-20  9:31     [PATCH V5] ethdev: fix one address occupies two indexes in MAC addrs Huisong Li
2023-02-01 13:15  3% ` [PATCH V7] ethdev: fix one address occupies two entries " Huisong Li
2023-02-02 12:36  3% ` [PATCH V8] " Huisong Li
2023-02-02 18:09  0%   ` Ferruh Yigit
2022-11-03 15:47     [PATCH 0/2] ABI check updates David Marchand
2023-03-23 17:15  9% ` [PATCH v2 " David Marchand
2022-11-17  5:09     [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
2022-11-17  5:09     ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
2023-02-20 13:50  3%   ` Jerin Jacob
2023-02-24  6:31  0%     ` Yan, Zhirun
2023-02-26 22:23  0%       ` Jerin Jacob
2023-03-02  8:38  0%         ` Yan, Zhirun
2023-03-02 13:58  0%           ` Jerin Jacob
2023-03-07  8:26  0%             ` Yan, Zhirun
2022-12-08  8:05     [PATCH 0/8] fix possible data truncation and conversion error Huisong Li
2022-12-19  7:06     ` [PATCH V8 0/8] telemetry: fix data truncation and conversion error and add hex integer API Huisong Li
2023-01-16 12:06       ` lihuisong (C)
2023-01-30 10:39  0%     ` lihuisong (C)
2022-12-13 18:27     [RFC PATCH 0/7] Standardize telemetry int types Bruce Richardson
2023-01-12 17:41     ` [PATCH v3 0/9] " Bruce Richardson
2023-01-12 17:41       ` [PATCH v3 9/9] telemetry: change public API to use 64-bit signed values Bruce Richardson
2023-02-05 22:55  0%     ` Thomas Monjalon
2022-12-21  8:42     [RFC 1/9] ethdev: add IPv6 routing extension header definition Rongwei Liu
2023-01-19  3:11     ` [PATCH v2 0/8] add IPv6 routing extension support Rongwei Liu
2023-01-19  3:11       ` [PATCH v2 1/8] ethdev: add IPv6 routing extension header definition Rongwei Liu
2023-01-20  9:20         ` Andrew Rybchenko
2023-01-30  3:46  0%       ` Rongwei Liu
2023-01-30  3:59           ` [PATCH v3 0/8] add IPv6 routing extension support Rongwei Liu
2023-01-30  3:59  3%         ` [PATCH v3 1/8] ethdev: add IPv6 routing extension header definition Rongwei Liu
2022-12-21 10:29     [RFC 0/5] add new port affinity item and affinity in Tx queue API Jiawei Wang
2023-01-18 11:37     ` [RFC 2/5] ethdev: introduce the affinity field " Thomas Monjalon
2023-01-18 14:44       ` Jiawei(Jonny) Wang
2023-01-18 16:31         ` Thomas Monjalon
2023-01-24 13:32  0%       ` Jiawei(Jonny) Wang
2023-01-01 13:47     [PATCH] compressdev: fix end of comp PMD list macro conflict Michael Baum
2023-01-30 19:30  3% ` [EXT] " Akhil Goyal
2023-01-31  8:23  0%   ` Akhil Goyal
2023-02-01 13:19  0%     ` Akhil Goyal
2023-02-01 13:29  0%       ` Michael Baum
2023-02-01 14:02  0%         ` Akhil Goyal
2023-01-07 16:18     [PATCH 1/3] eventdev/eth_rx: add params set/get APIs Naga Harish K S V
2023-01-18 10:22     ` Jerin Jacob
2023-01-20  8:58       ` Naga Harish K, S V
2023-01-20  9:32         ` Jerin Jacob
2023-01-20 10:33  0%       ` Naga Harish K, S V
2023-01-23  9:31  0%         ` Jerin Jacob
2023-01-23 18:07  0%           ` Naga Harish K, S V
2023-01-23 18:04     ` [PATCH v2 " Naga Harish K S V
2023-01-24  4:29       ` Jerin Jacob
2023-01-24 13:07         ` Naga Harish K, S V
2023-01-25  4:12           ` Jerin Jacob
2023-01-25  9:52             ` Naga Harish K, S V
2023-01-25 10:38  3%           ` Jerin Jacob
2023-01-25 16:32  0%             ` Naga Harish K, S V
2023-01-28 10:53  0%               ` Jerin Jacob
2023-01-28 17:21  3%                 ` Stephen Hemminger
2023-01-30  9:56  0%                 ` Naga Harish K, S V
2023-01-30 14:43  0%                   ` Jerin Jacob
2023-02-02 16:12  0%                     ` Naga Harish K, S V
2023-02-03  9:44  0%                       ` Jerin Jacob
2023-02-06  6:21  0%                         ` Naga Harish K, S V
2023-02-06 16:38  0%                           ` Jerin Jacob
2023-02-09 17:00  0%                             ` Naga Harish K, S V
2023-01-12 10:01     [PATCH v3 1/3] pcapng: comment option support for epb Amit Prakash Shukla
2023-01-24 11:21  4% ` [PATCH v4 " Amit Prakash Shukla
2023-01-24 11:21  2%   ` [PATCH v4 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-01-31  8:06  0%     ` Jerin Jacob
2023-02-03  8:15  0%       ` [EXT] " Amit Prakash Shukla
2023-02-03  8:19  4%   ` [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
2023-02-03  8:19  2%     ` [PATCH v5 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-02-09  9:56  4%     ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
2023-02-09  9:56  2%       ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-02-09 10:03  0%       ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
2023-02-09 10:24  4%       ` [PATCH v7 1/3] " Amit Prakash Shukla
2023-02-09 10:24  2%         ` [PATCH v7 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-01-12 11:21     [PATCH v5 0/6] add trace points in ethdev library Ankur Dwivedi
2023-01-20  8:40     ` [PATCH v6 " Ankur Dwivedi
2023-01-20  8:40       ` [PATCH v6 1/6] eal: trace: add trace point emit for blob Ankur Dwivedi
2023-01-23 17:27         ` Ferruh Yigit
2023-01-25 15:02           ` [EXT] " Ankur Dwivedi
2023-01-25 16:09  2%         ` Ferruh Yigit
2023-01-30 13:35  0%           ` Ankur Dwivedi
2023-01-12 11:35     [RFC PATCH 0/1] Specify C-standard requirement for DPDK builds Bruce Richardson
2023-02-03 14:09     ` Ben Magistro
2023-02-03 15:09       ` Bruce Richardson
2023-02-03 16:45         ` Ben Magistro
2023-02-03 18:00           ` Bruce Richardson
2023-02-10 14:52             ` Ben Magistro
2023-02-10 23:39  4%           ` Tyler Retzlaff
2023-01-12 21:26     [PATCH] eal: abstract compiler atomics Tyler Retzlaff
2023-01-12 21:26     ` [PATCH] eal: introduce atomics abstraction Tyler Retzlaff
2023-01-31 22:42       ` Thomas Monjalon
2023-02-01  1:07         ` Honnappa Nagarahalli
2023-02-01 21:41  3%       ` Tyler Retzlaff
2023-02-02  8:43  4%         ` Morten Brørup
2023-02-02 19:00  4%           ` Tyler Retzlaff
2023-02-02 20:44  0%             ` Morten Brørup
2023-02-03 13:56  0%               ` Bruce Richardson
2023-02-03 12:19  0%             ` Bruce Richardson
2023-02-03 20:49  0%               ` Tyler Retzlaff
2023-02-07 15:16  0%                 ` Morten Brørup
2023-02-07 21:58  0%                   ` Tyler Retzlaff
2023-02-07 23:34  0%         ` Honnappa Nagarahalli
2023-02-08  1:20  0%           ` Tyler Retzlaff
2023-02-08  8:31  3%             ` Morten Brørup
2023-02-08 16:35  4%               ` Tyler Retzlaff
2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
2023-02-09  8:34  4%                   ` Morten Brørup
2023-02-09 17:30  4%                   ` Tyler Retzlaff
2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
2023-02-10 20:30  3%                       ` Tyler Retzlaff
2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
2023-02-13 15:28  0%                           ` Ben Magistro
2023-02-13 23:18  0%                           ` Tyler Retzlaff
2023-02-08 21:43     ` [PATCH v2] eal: abstract compiler atomics Tyler Retzlaff
2023-02-08 21:43       ` [PATCH v2] eal: introduce atomics abstraction Tyler Retzlaff
2023-02-09  8:05         ` Morten Brørup
2023-02-09 18:15  4%       ` Tyler Retzlaff
2023-02-09 19:19  0%         ` Morten Brørup
2023-02-09 22:04  0%           ` Tyler Retzlaff
2023-01-16 15:37     [PATCH 0/5] dma/ioat: fix issues with stopping and restarting device Bruce Richardson
2023-01-16 17:37     ` [PATCH v2 0/6] " Bruce Richardson
2023-01-16 17:37       ` [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
2023-02-14 16:04  0%     ` Kevin Laatz
2023-02-15  1:59  3%     ` fengchengwen
2023-02-15 11:57  3%       ` Bruce Richardson
2023-02-16  1:24  0%         ` fengchengwen
2023-02-16  9:24  0%           ` Bruce Richardson
2023-02-16 11:09     ` [PATCH v3 0/6] dma/ioat: fix issues with stopping and restarting device Bruce Richardson
2023-02-16 11:09  3%   ` [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
2023-02-16 11:42  0%     ` fengchengwen
2023-01-23  9:02  4% Minutes of Technical Board Meeting, 2022-09-06 Thomas Monjalon
2023-01-23  9:03  4% Minutes of Technical Board Meeting, 2022-11-30 Thomas Monjalon
2023-01-24 10:33  3% DPDK Release Status Meeting 2023-01-19 Mcnamara, John
2023-01-25 22:36  3% deprecation notice process / clarification Tyler Retzlaff
2023-01-27 12:47  0% ` Thomas Monjalon
2023-01-28 13:08     [PATCH 0/2] add new mhpsdp hw port in the flow item and Tx queue API Jiawei Wang
2023-01-28 13:08  2% ` [PATCH 2/2] ethdev: introduce the mhpsdp hwport field in " Jiawei Wang
     [not found]     <http://patches.dpdk.org/project/dpdk/cover/20221221102934.13822-1-jiaweiw@nvidia.com/>
2023-01-30 17:00     ` [PATCH v2 0/2] add new PHY affinity in the flow item and " Jiawei Wang
2023-01-30 17:00  2%   ` [PATCH v2 2/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
2023-01-31 17:26  3%     ` Thomas Monjalon
2023-02-01  9:45  0%       ` Jiawei(Jonny) Wang
2023-02-01  9:05  0%     ` Andrew Rybchenko
2023-02-01 15:50  0%       ` Jiawei(Jonny) Wang
2023-02-02  9:28  0%         ` Andrew Rybchenko
2023-02-02 14:43  0%           ` Thomas Monjalon
2023-01-31  3:02     [PATCH v3 1/8] ethdev: add IPv6 routing extension header definition Stephen Hemminger
2023-01-31  9:36     ` [PATCH v4 0/3] add IPv6 routing extension support Rongwei Liu
2023-01-31  9:36  3%   ` [PATCH v4 1/3] ethdev: add IPv6 routing extension header definition Rongwei Liu
2023-02-01  9:21  0%     ` Andrew Rybchenko
2023-02-01  9:27  0%       ` Rongwei Liu
2023-02-01  9:31  0%         ` Andrew Rybchenko
2023-02-01 11:35               ` [PATCH v5 0/3] add IPv6 routing extension support Rongwei Liu
2023-02-01 11:35  3%             ` [PATCH v5 1/3] ethdev: add IPv6 routing extension header definition Rongwei Liu
     [not found]     <20220825024425.10534-1-lihuisong@huawei.com>
2023-01-18 14:12     ` [PATCH V4 0/5] app/testpmd: support mulitple process attach and detach port Thomas Monjalon
2023-01-19 10:31       ` lihuisong (C)
2023-01-19 14:35         ` Thomas Monjalon
2023-01-28  1:39  0%       ` lihuisong (C)
2023-01-31  3:33  3% ` [PATCH V5 0/5] app/testpmd: support multiple " Huisong Li
2023-01-31  3:33  2%   ` [PATCH V5 2/5] ethdev: fix skip valid port in probing callback Huisong Li
2023-02-01 13:44     [PATCH v5 1/3] ethdev: add IPv6 routing extension header definition Thomas Monjalon
2023-02-02 10:00     ` [PATCH v6 0/3] add IPv6 routing extension support Rongwei Liu
2023-02-02 10:00  3%   ` [PATCH v6 1/3] ethdev: add IPv6 routing extension header definition Rongwei Liu
2023-02-02 19:23     Sign changes through function signatures Ben Magistro
2023-02-02 20:26     ` Tyler Retzlaff
2023-02-02 20:45  3%   ` Thomas Monjalon
2023-02-02 21:26  3%     ` Morten Brørup
2023-02-03 12:05  4%       ` Bruce Richardson
2023-02-03 22:12  0%         ` Tyler Retzlaff
2023-02-04  8:09  0%           ` Morten Brørup
2023-02-06 15:57  3%             ` Ben Magistro
     [not found]     <20221221102934.13822-1-jiaweiw@nvidia.com/>
2023-02-03  5:07     ` [PATCH v3 0/2] add new PHY affinity in the flow item and Tx queue API Jiawei Wang
2023-02-03  5:07  6%   ` [PATCH v3 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
2023-02-03 13:33       ` [PATCH v4 0/2] add new PHY affinity in the flow item and " Jiawei Wang
2023-02-03 13:33  6%     ` [PATCH v4 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
2023-02-06 15:29  0%       ` Jiawei(Jonny) Wang
2023-02-07  9:40  0%       ` Ori Kam
2023-02-09 19:44  0%       ` Ferruh Yigit
2023-02-10 14:06  0%         ` Jiawei(Jonny) Wang
2023-02-14  9:38  0%         ` Jiawei(Jonny) Wang
2023-02-14 10:01  0%           ` Ferruh Yigit
2023-02-03  8:08  6% [PATCH] doc: update NFP documentation with Corigine information Chaoyong He
2023-02-15 13:37  0% ` Ferruh Yigit
2023-02-15 17:58  0%   ` Niklas Söderlund
2023-02-20  8:41     ` [PATCH v2 0/3] update NFP documentation Chaoyong He
2023-02-20  8:41  8%   ` [PATCH v2 3/3] doc: add Corigine information to nfp documentation Chaoyong He
2023-02-06  7:05     [PATCH 0/3] cleanup the PMD Chaoyong He
2023-02-06  7:05  8% ` [PATCH 1/3] net/nfp: remove usage of print statements Chaoyong He
2023-02-07 20:41     [RFC 00/13] Replace static logtypes with static Stephen Hemminger
2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
2023-02-13 19:55  3%   ` [PATCH v4 18/19] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
2023-02-14  2:19  3%   ` [PATCH v5 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
2023-02-14 22:47       ` [PATCH v6 01/22] gso: don't log message on non TCP/UDP Stephen Hemminger
2023-02-15  7:26  3%     ` Hu, Jiayu
2023-02-15 17:12  0%       ` Stephen Hemminger
2023-02-14 22:47  3%   ` [PATCH v6 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
2023-02-15 17:23  3%   ` [PATCH v7 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-21 15:02  0%     ` David Marchand
2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
2023-02-21 19:02  2%   ` [PATCH v9 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
2023-02-22 16:08  2%   ` [PATCH v10 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-23  7:11  0%     ` Ruifeng Wang
2023-02-23  7:27  0%       ` Ruifeng Wang
2023-02-24  9:45  0%     ` Ruifeng Wang
2023-02-09  3:03     [PATCH] mem: fix displaying heap ID failed for heap info command Huisong Li
2023-02-22  7:49  4% ` [PATCH v2] " Huisong Li
2023-02-10  8:14     [PATCH v5 1/3] ethdev: skip congestion management configuration Rakesh Kudurumalla
2023-02-10  8:26     ` [PATCH v6 " Rakesh Kudurumalla
2023-02-11  0:35  3%   ` Ferruh Yigit
2023-02-11  5:16  0%     ` Jerin Jacob
2023-02-13  2:19     [PATCH v6 00/21] add support for cpfl PMD in DPDK Mingxia Liu
2023-02-16  0:29     ` [PATCH v7 01/21] net/cpfl: support device initialization Mingxia Liu
2023-02-27 13:46       ` Ferruh Yigit
2023-02-27 15:45         ` Thomas Monjalon
2023-02-27 23:38  3%       ` Ferruh Yigit
2023-02-13 11:31     [PATCH v10 0/4] add support for self monitoring Tomasz Duszynski
2023-02-16 17:54     ` [PATCH v11 " Tomasz Duszynski
2023-02-16 17:54       ` [PATCH v11 1/4] lib: add generic support for reading PMU events Tomasz Duszynski
2023-02-16 23:50         ` Konstantin Ananyev
2023-02-17  8:49           ` [EXT] " Tomasz Duszynski
2023-02-17 10:14             ` Konstantin Ananyev
2023-02-19 14:23               ` Tomasz Duszynski
2023-02-20 14:31                 ` Konstantin Ananyev
2023-02-20 16:59                   ` Tomasz Duszynski
2023-02-20 17:21                     ` Konstantin Ananyev
2023-02-20 20:42                       ` Tomasz Duszynski
2023-02-21  0:48  3%                     ` Konstantin Ananyev
2023-02-27  8:12  0%                       ` Tomasz Duszynski
2023-02-19 11:55     [PATCH] drivers: skip build of sub-libs not supporting IOVA mode Thomas Monjalon
2023-03-06 16:13     ` [PATCH v2 0/2] refactor diasbling IOVA as PA Thomas Monjalon
2023-03-06 16:13  2%   ` [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf Thomas Monjalon
2023-03-09  1:43  0%     ` fengchengwen
2023-03-09  7:29  0%       ` Thomas Monjalon
2023-03-09 11:23  0%         ` fengchengwen
2023-03-09 12:12  0%           ` Thomas Monjalon
2023-03-09 13:10  0%             ` Bruce Richardson
2023-03-13 15:51  0%               ` Thomas Monjalon
2023-02-21  3:10     [PATCH 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:10  3% ` [PATCH 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-21  3:29     ` [PATCH v2 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:29  3%   ` [PATCH v2 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-21  3:55       ` [PATCH v2 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:55  3%     ` [PATCH v2 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-22 21:43     [PATCH] vhost: fix madvise arguments alignment Mike Pattrick
2023-02-23  4:35     ` [PATCH v2] " Mike Pattrick
2023-02-23 16:12  3%   ` Maxime Coquelin
2023-02-23 16:57  0%     ` Mike Pattrick
2023-02-24 15:05  4%       ` Patrick Robb
2023-02-23 16:04     [RFC PATCH] drivers/net: fix RSS multi-queue mode check Ferruh Yigit
2023-02-27  1:34     ` lihuisong (C)
2023-02-27  9:57       ` Ferruh Yigit
2023-02-28  1:24         ` lihuisong (C)
2023-02-28  8:23  3%       ` Ferruh Yigit
2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
2023-02-28 16:01  0% ` Morten Brørup
2023-03-01 11:18  0%   ` Mattias Rönnblom
2023-03-01 13:31  3%     ` Morten Brørup
2023-03-01 15:50  3%       ` Mattias Rönnblom
2023-03-01 17:06  0%         ` Morten Brørup
2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
2023-03-09  8:56  4% [RFC 1/2] security: introduce out of place support for inline ingress Nithin Dabilpuram
2023-03-13  7:26     [PATCH] lib/hash: new feature adding existing key Abdullah Ömer Yamaç
2023-03-13  7:35     ` Abdullah Ömer Yamaç
2023-03-13 15:48  3%   ` Stephen Hemminger
2023-03-13  9:34     [PATCH] reorder: fix registration of dynamic field in mbuf Volodymyr Fialko
2023-03-13 10:19  3% ` David Marchand
2023-03-14 12:48     [PATCH 0/5] fix segment fault when parse args Chengwen Feng
2023-03-16 18:18     ` Ferruh Yigit
2023-03-17  2:43  3%   ` fengchengwen
2023-03-21 13:50  0%     ` Ferruh Yigit
2023-03-22  1:15  0%       ` fengchengwen
2023-03-22  8:53  0%         ` Ferruh Yigit
2023-03-22 13:49  0%           ` Thomas Monjalon
2023-03-23 11:58  3%             ` fengchengwen
2023-03-23 12:51  3%               ` Thomas Monjalon
2023-03-15 11:00     [PATCH 0/5] support setting and querying RSS algorithms Dongdong Liu
2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
2023-03-15 11:28  0%   ` Ivan Malov
2023-03-16 13:10  3%     ` Dongdong Liu
2023-03-16 14:31  0%       ` Ivan Malov
2023-03-15 13:43  3%   ` Thomas Monjalon
2023-03-16 13:16  3%     ` Dongdong Liu
2023-03-20 10:26     [PATCH 1/2] app/mldev: fix build with debug David Marchand
2023-03-20 10:26  5% ` [PATCH 2/2] ci: test compilation " David Marchand
2023-03-20 12:18     ` [PATCH v2 1/2] app/mldev: fix build " David Marchand
2023-03-20 12:18 19%   ` [PATCH v2 2/2] ci: test compilation with debug in GHA David Marchand