DPDK patches and discussions
* Optimizations are not features
@ 2022-06-04  9:09 Morten Brørup
  2022-06-04  9:33 ` Jerin Jacob
  0 siblings, 1 reply; 14+ messages in thread
From: Morten Brørup @ 2022-06-04  9:09 UTC (permalink / raw)
  To: dev; +Cc: techboard


I would like the DPDK community to change its view on compile time options. Here is why:

Application specific performance micro-optimizations like "fast mbuf free" and "mbuf direct re-arm" are being added to DPDK and presented as features.

They are not features, but optimizations, and I don't understand the need for them to be available at run-time!

Instead of adding a bunch of exotic exceptions to the fast path of the PMDs, they should be compile time options. This will improve performance by avoiding branches in the fast path, both for the applications using them and for generic applications (where the exotic code is omitted).

Please note that I am only talking about performance optimizations that are limited to application specific use cases. I think it is reasonable that performance optimizing an application may also require recompiling the performance critical libraries it uses.

Allowing compile time options for application specific performance optimizations in DPDK would also open a path for other optimizations which can only be achieved at compile time, such as "no fragmented packets", "no attached mbufs" and "single mbuf pool". And even more exotic optimizations, such as the "indexed mempool cache", which was rejected due to ABI violations - they could be marked as "risky and untested" or similar, but still be part of the DPDK main repository.
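To make the branch-avoidance argument concrete, here is a minimal compilable sketch (all names are hypothetical, not actual DPDK API) contrasting a run-time flag with a compile-time option in a transmit free path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state; names are illustrative only. */
struct txq {
	bool fast_free;     /* run-time "fast mbuf free" flag */
	int freed_fast;
	int freed_generic;
};

/* Run-time style: every burst pays a branch, even in applications
 * that never enable the optimization. */
static void
free_mbufs_runtime(struct txq *q)
{
	if (q->fast_free)
		q->freed_fast++;      /* e.g. bulk return to one mempool */
	else
		q->freed_generic++;   /* per-mbuf checks, refcounts, ... */
}

/* Compile-time style: a generic build contains neither the flag test
 * nor the exotic path, so the fast path has no branch at all.
 * RTE_HYPOTHETICAL_FAST_FREE is a made-up option name. */
static void
free_mbufs_compiletime(struct txq *q)
{
#ifdef RTE_HYPOTHETICAL_FAST_FREE
	q->freed_fast++;
#else
	q->freed_generic++;
#endif
}
```

In the compile-time variant the optimized code is simply absent from generic builds, while the run-time variant carries the branch (and both code paths) everywhere.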

Med venlig hilsen / Kind regards,

-Morten Brørup


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimizations are not features
  2022-06-04  9:09 Optimizations are not features Morten Brørup
@ 2022-06-04  9:33 ` Jerin Jacob
  2022-06-04 10:00   ` Andrew Rybchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Jerin Jacob @ 2022-06-04  9:33 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dpdk-dev, techboard

On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> I would like the DPDK community to change its view on compile time options. Here is why:
>
>
>
> Application specific performance micro-optimizations like “fast mbuf free” and “mbuf direct re-arm” are being added to DPDK and presented as features.
>
>
>
> They are not features, but optimizations, and I don’t understand the need for them to be available at run-time!
>
>
>
> Instead of adding a bunch of exotic exceptions to the fast path of the PMDs, they should be compile time options. This will improve performance by avoiding branches in the fast path, both for the applications using them, and for generic applications (where the exotic code is omitted).

Agree. I think keeping the best of both worlds would be:

- Enable the feature/optimization at run time.
- Have a compile-time option to disable the feature/optimization as an override.
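A sketch of how such an override could look (the macro and function names are hypothetical, not DPDK API): the decision stays at run time by default, but a build flag folds the check to a compile-time constant so the optimized path disappears entirely:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "runtime enable + compile-time disable override".
 * Building with -DRTE_HYPOTHETICAL_DISABLE_FAST_FREE compiles the
 * optimization out; otherwise the run-time flag decides. */
static inline bool
fast_free_enabled(bool runtime_flag)
{
#ifdef RTE_HYPOTHETICAL_DISABLE_FAST_FREE
	(void)runtime_flag;
	return false;          /* constant: the branch folds away */
#else
	return runtime_flag;   /* default: decided at run time */
#endif
}
```

With the override macro set, any `if (fast_free_enabled(...))` body becomes dead code the compiler removes, which is the "best of both worlds" behaviour described above.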

>
>
>
> Please note that I am only talking about the performance optimizations that are limited to application specific use cases. I think it makes sense to require that performance optimizing an application also requires recompiling the performance critical libraries used by it.
>
>
>
> Allowing compile time options for application specific performance optimizations in DPDK would also open a path for other optimizations, which can only be achieved at compile time, such as “no fragmented packets”, “no attached mbufs” and “single mbuf pool”. And even more exotic optimizations, such as the “indexed mempool cache”, which was rejected due to ABI violations – they could be marked as “risky and untested” or similar, but still be part of the DPDK main repository.
>
>
>
>
>
> Med venlig hilsen / Kind regards,
>
> -Morten Brørup
>
>


* Re: Optimizations are not features
  2022-06-04  9:33 ` Jerin Jacob
@ 2022-06-04 10:00   ` Andrew Rybchenko
  2022-06-04 11:10     ` Jerin Jacob
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Rybchenko @ 2022-06-04 10:00 UTC (permalink / raw)
  To: Jerin Jacob, Morten Brørup; +Cc: dpdk-dev, techboard

On 6/4/22 12:33, Jerin Jacob wrote:
> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>>
>> I would like the DPDK community to change its view on compile time options. Here is why:
>>
>>
>>
>> Application specific performance micro-optimizations like “fast mbuf free” and “mbuf direct re-arm” are being added to DPDK and presented as features.
>>
>>
>>
>> They are not features, but optimizations, and I don’t understand the need for them to be available at run-time!
>>
>>
>>
>> Instead of adding a bunch of exotic exceptions to the fast path of the PMDs, they should be compile time options. This will improve performance by avoiding branches in the fast path, both for the applications using them, and for generic applications (where the exotic code is omitted).
> 
> Agree. I think, keeping the best of both worlds would be
> 
> -Enable the feature/optimization as runtime
> -Have a compile-time option to disable the feature/optimization as an override.

It is hard to find the right balance, but in general compile
time options are a nightmare for maintenance. The number of
required builds grows exponentially. Of course, we can
limit the number of checked combinations, but that will result
in a flow of patches to fix the build in the other cases.
Also, compile time options tend to make code less readable,
which makes all aspects of development harder.

Yes, compile time is nice for micro optimizations, but
I have great concerns about whether it is the right way to go.

>> Please note that I am only talking about the performance optimizations that are limited to application specific use cases. I think it makes sense to require that performance optimizing an application also requires recompiling the performance critical libraries used by it.
>>
>>
>>
>> Allowing compile time options for application specific performance optimizations in DPDK would also open a path for other optimizations, which can only be achieved at compile time, such as “no fragmented packets”, “no attached mbufs” and “single mbuf pool”. And even more exotic optimizations, such as the “indexed mempool cache”, which was rejected due to ABI violations – they could be marked as “risky and untested” or similar, but still be part of the DPDK main repository.
>>
>>
>>
>>
>>
>> Med venlig hilsen / Kind regards,
>>
>> -Morten Brørup
>>
>>



* Re: Optimizations are not features
  2022-06-04 10:00   ` Andrew Rybchenko
@ 2022-06-04 11:10     ` Jerin Jacob
  2022-06-04 12:19       ` Morten Brørup
  0 siblings, 1 reply; 14+ messages in thread
From: Jerin Jacob @ 2022-06-04 11:10 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: Morten Brørup, dpdk-dev, techboard

On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
<andrew.rybchenko@oktetlabs.ru> wrote:
>
> On 6/4/22 12:33, Jerin Jacob wrote:
> > On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup <mb@smartsharesystems.com> wrote:
> >>
> >> I would like the DPDK community to change its view on compile time options. Here is why:
> >>
> >>
> >>
> >> Application specific performance micro-optimizations like “fast mbuf free” and “mbuf direct re-arm” are being added to DPDK and presented as features.
> >>
> >>
> >>
> >> They are not features, but optimizations, and I don’t understand the need for them to be available at run-time!
> >>
> >>
> >>
> >> Instead of adding a bunch of exotic exceptions to the fast path of the PMDs, they should be compile time options. This will improve performance by avoiding branches in the fast path, both for the applications using them, and for generic applications (where the exotic code is omitted).
> >
> > Agree. I think, keeping the best of both worlds would be
> >
> > -Enable the feature/optimization as runtime
> > -Have a compile-time option to disable the feature/optimization as an override.
>
> It is hard to find the right balance, but in general compile
> time options are a nightmare for maintenance. Number of
> required builds will grow as an exponent. Of course, we can
> limit number of checked combinations, but it will result in
> flow of patches to fix build in other cases.

The build breakage can be avoided if we use style (2) instead of style (1):

1)
#ifdef ...
My feature
#endif

2)
static __rte_always_inline int
rte_has_xyz_feature(void)
{
#ifdef RTE_LIBRTE_XYZ_FEATURE
        return RTE_LIBRTE_XYZ_FEATURE;
#else
        return 0;
#endif
}

if (rte_has_xyz_feature()) {
My feature code

}
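As a hedged illustration of why style (2) avoids the breakage (RTE_LIBRTE_XYZ_FEATURE is hypothetical and left undefined here): the feature body below is parsed and type-checked in every build, so an untested option combination cannot silently break compilation, yet the constant condition lets the compiler delete it as dead code:

```c
#include <assert.h>

/* Style (2) from above, reproduced as a compilable sketch. */
static inline int
rte_has_xyz_feature(void)
{
#ifdef RTE_LIBRTE_XYZ_FEATURE
	return RTE_LIBRTE_XYZ_FEATURE;
#else
	return 0;
#endif
}

static int
process_packet(void)
{
	int work = 1;

	/* Unlike a bare #ifdef block, this body is compiled in every
	 * build; when the macro is unset the branch folds to
	 * `if (0)` and the code is removed by the optimizer. */
	if (rte_has_xyz_feature())
		work += 10; /* feature-specific extra work */

	return work;
}
```

With style (1), the code inside `#ifdef` is invisible to builds that don't enable the option, which is exactly how the "flow of patches to fix build" situation arises.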



> Also compile time options tend to make code less readable
> which makes all aspects of the development harder.
>
> Yes, compile time is nice for micro optimizations, but
> I have great concerns that it is a right way to go.
>
> >> Please note that I am only talking about the performance optimizations that are limited to application specific use cases. I think it makes sense to require that performance optimizing an application also requires recompiling the performance critical libraries used by it.
> >>
> >>
> >>
> >> Allowing compile time options for application specific performance optimizations in DPDK would also open a path for other optimizations, which can only be achieved at compile time, such as “no fragmented packets”, “no attached mbufs” and “single mbuf pool”. And even more exotic optimizations, such as the “indexed mempool cache”, which was rejected due to ABI violations – they could be marked as “risky and untested” or similar, but still be part of the DPDK main repository.
> >>
> >>
> >>
> >>
> >>
> >> Med venlig hilsen / Kind regards,
> >>
> >> -Morten Brørup
> >>
> >>
>


* RE: Optimizations are not features
  2022-06-04 11:10     ` Jerin Jacob
@ 2022-06-04 12:19       ` Morten Brørup
  2022-06-04 12:51         ` Andrew Rybchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Morten Brørup @ 2022-06-04 12:19 UTC (permalink / raw)
  To: Jerin Jacob, Andrew Rybchenko; +Cc: dpdk-dev, techboard

> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> Sent: Saturday, 4 June 2022 13.10
> 
> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
> >
> > On 6/4/22 12:33, Jerin Jacob wrote:
> > > On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
> <mb@smartsharesystems.com> wrote:
> > >>
> > >> I would like the DPDK community to change its view on compile time
> options. Here is why:
> > >>
> > >>
> > >>
> > >> Application specific performance micro-optimizations like “fast
> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
> presented as features.
> > >>
> > >>
> > >>
> > >> They are not features, but optimizations, and I don’t understand
> the need for them to be available at run-time!
> > >>
> > >>
> > >>
> > >> Instead of adding a bunch of exotic exceptions to the fast path of
> the PMDs, they should be compile time options. This will improve
> performance by avoiding branches in the fast path, both for the
> applications using them, and for generic applications (where the exotic
> code is omitted).
> > >
> > > Agree. I think, keeping the best of both worlds would be
> > >
> > > -Enable the feature/optimization as runtime
> > > -Have a compile-time option to disable the feature/optimization as
> an override.
> >
> > It is hard to find the right balance, but in general compile
> > time options are a nightmare for maintenance. Number of
> > required builds will grow as an exponent.

Test combinations are exponential for N features, regardless of whether the N options are run-time or compile-time.

> > Of course, we can
> > limit number of checked combinations, but it will result in
> > flow of patches to fix build in other cases.
> 
> The build breakage can be fixed if we use (2) vs (1)
> 
> 1)
> #ifdef ...
> My feature
> #endif
> 
> 2)
> static __rte_always_inline int
> rte_has_xyz_feature(void)
> {
> #ifdef RTE_LIBRTE_XYZ_FEATURE
>         return RTE_LIBRTE_XYZ_FEATURE;
> #else
>         return 0;
> #endif
> }
> 
> if(rte_has_xyz_feature())) {
> My feature code
> 
> }
> 

I'm not sure all the features can be covered by that, e.g. added fields in structures.

Also, I would consider such features "opt in" at compile time only. As such, they could be allowed to break the ABI/API.
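A sketch of the struct-field case (option and struct names are hypothetical): the helper-function trick cannot cover this, because the option changes the memory layout itself, which is exactly an ABI break:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, not the real rte_mbuf. A
 * compile-time option that adds a field shifts the offsets of every
 * later field, so binaries built with and without the option
 * disagree on the layout. */
struct hypothetical_mbuf {
	uint64_t buf_addr;
#ifdef RTE_HYPOTHETICAL_DIRECT_REARM
	uint64_t rearm_data;   /* extra field only in opt-in builds */
#endif
	uint16_t data_len;     /* offset depends on the option above */
};
```

No `rte_has_xyz_feature()`-style helper can make both layouts coexist in one binary, which is why such options would have to be "opt in" with the ABI/API caveat stated above.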

> 
> 
> > Also compile time options tend to make code less readable
> > which makes all aspects of the development harder.
> >
> > Yes, compile time is nice for micro optimizations, but
> > I have great concerns that it is a right way to go.
> >
> > >> Please note that I am only talking about the performance
> optimizations that are limited to application specific use cases. I
> think it makes sense to require that performance optimizing an
> application also requires recompiling the performance critical
> libraries used by it.
> > >>
> > >>
> > >>
> > >> Allowing compile time options for application specific performance
> optimizations in DPDK would also open a path for other optimizations,
> which can only be achieved at compile time, such as “no fragmented
> packets”, “no attached mbufs” and “single mbuf pool”. And even more
> exotic optimizations, such as the “indexed mempool cache”, which was
> rejected due to ABI violations – they could be marked as “risky and
> untested” or similar, but still be part of the DPDK main repository.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Med venlig hilsen / Kind regards,
> > >>
> > >> -Morten Brørup
> > >>
> > >>
> >



* Re: Optimizations are not features
  2022-06-04 12:19       ` Morten Brørup
@ 2022-06-04 12:51         ` Andrew Rybchenko
  2022-06-05  8:15           ` Morten Brørup
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Andrew Rybchenko @ 2022-06-04 12:51 UTC (permalink / raw)
  To: Morten Brørup, Jerin Jacob; +Cc: dpdk-dev, techboard

On 6/4/22 15:19, Morten Brørup wrote:
>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
>> Sent: Saturday, 4 June 2022 13.10
>>
>> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>
>>> On 6/4/22 12:33, Jerin Jacob wrote:
>>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
>> <mb@smartsharesystems.com> wrote:
>>>>>
>>>>> I would like the DPDK community to change its view on compile time
>> options. Here is why:
>>>>>
>>>>>
>>>>>
>>>>> Application specific performance micro-optimizations like “fast
>> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
>> presented as features.
>>>>>
>>>>>
>>>>>
>>>>> They are not features, but optimizations, and I don’t understand
>> the need for them to be available at run-time!
>>>>>
>>>>>
>>>>>
>>>>> Instead of adding a bunch of exotic exceptions to the fast path of
>> the PMDs, they should be compile time options. This will improve
>> performance by avoiding branches in the fast path, both for the
>> applications using them, and for generic applications (where the exotic
>> code is omitted).
>>>>
>>>> Agree. I think, keeping the best of both worlds would be
>>>>
>>>> -Enable the feature/optimization as runtime
>>>> -Have a compile-time option to disable the feature/optimization as
>> an override.
>>>
>>> It is hard to find the right balance, but in general compile
>>> time options are a nightmare for maintenance. Number of
>>> required builds will grow as an exponent.
> 
> Test combinations are exponential for N features, regardless if N are runtime or compile time options.

But since I'm talking about build checks, I don't care about exponential
growth in run time. Yes, testing should care, but that is a separate story.

> 
>>> Of course, we can
>>> limit number of checked combinations, but it will result in
>>> flow of patches to fix build in other cases.
>>
>> The build breakage can be fixed if we use (2) vs (1)
>>
>> 1)
>> #ifdef ...
>> My feature
>> #endif
>>
>> 2)
>> static __rte_always_inline int
>> rte_has_xyz_feature(void)
>> {
>> #ifdef RTE_LIBRTE_XYZ_FEATURE
>>          return RTE_LIBRTE_XYZ_FEATURE;
>> #else
>>          return 0;
>> #endif
>> }
>>
>> if(rte_has_xyz_feature())) {
>> My feature code
>>
>> }
>>

Jerin, thanks, very good example.

> I'm not sure all the features can be covered by that, e.g. added fields in structures.

+1

> 
> Also, I would consider such features "opt in" at compile time only. As such, they could be allowed to break the ABI/API.
> 
>>
>>
>>> Also compile time options tend to make code less readable
>>> which makes all aspects of the development harder.
>>>
>>> Yes, compile time is nice for micro optimizations, but
>>> I have great concerns that it is a right way to go.
>>>
>>>>> Please note that I am only talking about the performance
>> optimizations that are limited to application specific use cases. I
>> think it makes sense to require that performance optimizing an
>> application also requires recompiling the performance critical
>> libraries used by it.
>>>>>
>>>>>
>>>>>
>>>>> Allowing compile time options for application specific performance
>> optimizations in DPDK would also open a path for other optimizations,
>> which can only be achieved at compile time, such as “no fragmented
>> packets”, “no attached mbufs” and “single mbuf pool”. And even more
>> exotic optimizations, such as the “indexed mempool cache”, which was
>> rejected due to ABI violations – they could be marked as “risky and
>> untested” or similar, but still be part of the DPDK main repository.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Med venlig hilsen / Kind regards,
>>>>>
>>>>> -Morten Brørup
>>>>>
>>>>>
>>>
> 



* RE: Optimizations are not features
  2022-06-04 12:51         ` Andrew Rybchenko
@ 2022-06-05  8:15           ` Morten Brørup
  2022-06-05 16:05           ` Stephen Hemminger
  2022-06-06  9:35           ` Konstantin Ananyev
  2 siblings, 0 replies; 14+ messages in thread
From: Morten Brørup @ 2022-06-05  8:15 UTC (permalink / raw)
  To: Andrew Rybchenko, Jerin Jacob; +Cc: dpdk-dev, techboard

> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Saturday, 4 June 2022 14.52
> 
> On 6/4/22 15:19, Morten Brørup wrote:
> >> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> >> Sent: Saturday, 4 June 2022 13.10
> >>
> >> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru> wrote:
> >>>
> >>> On 6/4/22 12:33, Jerin Jacob wrote:
> >>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
> >> <mb@smartsharesystems.com> wrote:
> >>>>>
> >>>>> I would like the DPDK community to change its view on compile
> time
> >> options. Here is why:
> >>>>>
> >>>>>
> >>>>>
> >>>>> Application specific performance micro-optimizations like “fast
> >> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
> >> presented as features.
> >>>>>
> >>>>>
> >>>>>
> >>>>> They are not features, but optimizations, and I don’t understand
> >> the need for them to be available at run-time!
> >>>>>
> >>>>>
> >>>>>
> >>>>> Instead of adding a bunch of exotic exceptions to the fast path
> of
> >> the PMDs, they should be compile time options. This will improve
> >> performance by avoiding branches in the fast path, both for the
> >> applications using them, and for generic applications (where the
> exotic
> >> code is omitted).
> >>>>
> >>>> Agree. I think, keeping the best of both worlds would be
> >>>>
> >>>> -Enable the feature/optimization as runtime
> >>>> -Have a compile-time option to disable the feature/optimization as
> >> an override.
> >>>
> >>> It is hard to find the right balance, but in general compile
> >>> time options are a nightmare for maintenance. Number of
> >>> required builds will grow as an exponent.
> >
> > Test combinations are exponential for N features, regardless if N are
> runtime or compile time options.
> 
> But since I'm talking about build checks I don't care about exponential
> grows in run time. Yes, testing should care, but it is a separate
> story.

Build checks are just one of many test methods, though a very low-cost and efficient one.

I acknowledge that build checks will be more complicated with additional compile time options. And build checks are important for code quality, so we should find a solution for that challenge.

The primary scope of my suggestion is application specific performance optimization features only, so let's call them "exotic features". Also, they should be "opt in". Keep in mind: They are not really features, since they don't add anything new, they are only performance optimizations beneficial for specific application use cases.

With this in mind, we could continue ordinary testing with these options disabled. I.e. no additional testing.

Regardless how exotic the features may be, we certainly want some testing of them, so we could do some things to avoid exponential testing:

1. With each ordinary build check (with all options disabled), another build check could be run with a random set of options enabled. Or additional tests could be run on a weekly basis with random combinations of options.

2. Any patch related to one or more of these exotic features could include some magic keywords to indicate which combinations of options must be enabled during build checking and testing.

> 
> >
> >>> Of course, we can
> >>> limit number of checked combinations, but it will result in
> >>> flow of patches to fix build in other cases.
> >>
> >> The build breakage can be fixed if we use (2) vs (1)
> >>
> >> 1)
> >> #ifdef ...
> >> My feature
> >> #endif
> >>
> >> 2)
> >> static __rte_always_inline int
> >> rte_has_xyz_feature(void)
> >> {
> >> #ifdef RTE_LIBRTE_XYZ_FEATURE
> >>          return RTE_LIBRTE_XYZ_FEATURE;
> >> #else
> >>          return 0;
> >> #endif
> >> }
> >>
> >> if(rte_has_xyz_feature())) {
> >> My feature code
> >>
> >> }
> >>
> 
> Jerin, thanks, very good example.
> 
> > I'm not sure all the features can be covered by that, e.g. added
> fields in structures.
> 
> +1
> 
> >
> > Also, I would consider such features "opt in" at compile time only.
> As such, they could be allowed to break the ABI/API.
> >
> >>
> >>
> >>> Also compile time options tend to make code less readable
> >>> which makes all aspects of the development harder.
> >>>
> >>> Yes, compile time is nice for micro optimizations, but
> >>> I have great concerns that it is a right way to go.
> >>>
> >>>>> Please note that I am only talking about the performance
> >> optimizations that are limited to application specific use cases. I
> >> think it makes sense to require that performance optimizing an
> >> application also requires recompiling the performance critical
> >> libraries used by it.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Allowing compile time options for application specific
> performance
> >> optimizations in DPDK would also open a path for other
> optimizations,
> >> which can only be achieved at compile time, such as “no fragmented
> >> packets”, “no attached mbufs” and “single mbuf pool”. And even more
> >> exotic optimizations, such as the “indexed mempool cache”, which was
> >> rejected due to ABI violations – they could be marked as “risky and
> >> untested” or similar, but still be part of the DPDK main repository.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Med venlig hilsen / Kind regards,
> >>>>>
> >>>>> -Morten Brørup
> >>>>>
> >>>>>
> >>>
> >
> 



* Re: Optimizations are not features
  2022-06-04 12:51         ` Andrew Rybchenko
  2022-06-05  8:15           ` Morten Brørup
@ 2022-06-05 16:05           ` Stephen Hemminger
  2022-06-06  9:35           ` Konstantin Ananyev
  2 siblings, 0 replies; 14+ messages in thread
From: Stephen Hemminger @ 2022-06-05 16:05 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: Morten Brørup, Jerin Jacob, dpdk-dev, techboard

On Sat, 4 Jun 2022 15:51:58 +0300
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:

> On 6/4/22 15:19, Morten Brørup wrote:
> >> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> >> Sent: Saturday, 4 June 2022 13.10
> >>
> >> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru> wrote:  
> >>>
> >>> On 6/4/22 12:33, Jerin Jacob wrote:  
> >>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup  
> >> <mb@smartsharesystems.com> wrote:  
> >>>>>
> >>>>> I would like the DPDK community to change its view on compile time  
> >> options. Here is why:  
> >>>>>
> >>>>>
> >>>>>
> >>>>> Application specific performance micro-optimizations like “fast  
> >> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
> >> presented as features.  
> >>>>>
> >>>>>
> >>>>>
> >>>>> They are not features, but optimizations, and I don’t understand  
> >> the need for them to be available at run-time!  
> >>>>>
> >>>>>
> >>>>>
> >>>>> Instead of adding a bunch of exotic exceptions to the fast path of  
> >> the PMDs, they should be compile time options. This will improve
> >> performance by avoiding branches in the fast path, both for the
> >> applications using them, and for generic applications (where the exotic
> >> code is omitted).  
> >>>>
> >>>> Agree. I think, keeping the best of both worlds would be
> >>>>
> >>>> -Enable the feature/optimization as runtime
> >>>> -Have a compile-time option to disable the feature/optimization as  
> >> an override.  
> >>>
> >>> It is hard to find the right balance, but in general compile
> >>> time options are a nightmare for maintenance. Number of
> >>> required builds will grow as an exponent.  
> > 
> > Test combinations are exponential for N features, regardless if N are runtime or compile time options.  
> 
> But since I'm talking about build checks I don't care about exponential
> grows in run time. Yes, testing should care, but it is a separate story.
> 
> >   
> >>> Of course, we can
> >>> limit number of checked combinations, but it will result in
> >>> flow of patches to fix build in other cases.  
> >>
> >> The build breakage can be fixed if we use (2) vs (1)
> >>
> >> 1)
> >> #ifdef ...
> >> My feature
> >> #endif
> >>
> >> 2)
> >> static __rte_always_inline int
> >> rte_has_xyz_feature(void)
> >> {
> >> #ifdef RTE_LIBRTE_XYZ_FEATURE
> >>          return RTE_LIBRTE_XYZ_FEATURE;
> >> #else
> >>          return 0;
> >> #endif
> >> }
> >>
> >> if(rte_has_xyz_feature())) {
> >> My feature code
> >>
> >> }
> >>  
> 
> Jerin, thanks, very good example.
> 
> > I'm not sure all the features can be covered by that, e.g. added fields in structures.  
> 
> +1
> 
> > 
> > Also, I would consider such features "opt in" at compile time only. As such, they could be allowed to break the ABI/API.
> >   
> >>
> >>  
> >>> Also compile time options tend to make code less readable
> >>> which makes all aspects of the development harder.
> >>>
> >>> Yes, compile time is nice for micro optimizations, but
> >>> I have great concerns that it is a right way to go.
> >>>  
> >>>>> Please note that I am only talking about the performance  
> >> optimizations that are limited to application specific use cases. I
> >> think it makes sense to require that performance optimizing an
> >> application also requires recompiling the performance critical
> >> libraries used by it.  
> >>>>>
> >>>>>
> >>>>>
> >>>>> Allowing compile time options for application specific performance  
> >> optimizations in DPDK would also open a path for other optimizations,
> >> which can only be achieved at compile time, such as “no fragmented
> >> packets”, “no attached mbufs” and “single mbuf pool”. And even more
> >> exotic optimizations, such as the “indexed mempool cache”, which was
> >> rejected due to ABI violations – they could be marked as “risky and
> >> untested” or similar, but still be part of the DPDK main repository.  

There is a tradeoff that DPDK has had for several years.

1. For ease of use the DPDK should be available in Linux distributions in
pre-built binary format. In that case any changes in behavior need to be
done at runtime.

2. For performance and size, the DPDK should limit conditional branches
and not include dead code. This is what embedded appliance developers want.

3. For flexibility, the DPDK should allow every option at the smallest granularity
(often per-packet or per-queue). This allows an application to use a feature if it is
available, without being limited to only hardware that supports it.

All of these do conflict. The big problem that I see is that when a feature
that changes the semantics of the mbuf (no-attach, single pool, etc.) is used,
it opens other code up to bugs. Therefore I am reluctant to use them;
in real life production, a 1% performance gain is totally offset by the cost
of 0.1% more bugs in code run by customers.

It would make my life easier if DPDK supported one set of semantics and
they worked everywhere. This is what every OS does (Linux, FreeBSD, Windows).
That would mean either all drivers support the feature in all cases, or
the feature is never introduced. What would help here would be a set
of helper functions that let code do the right thing. Examples in Linux
are skb_linearize(), pskb_may_pull(), etc.
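As a rough sketch of that idea (a toy model, not the actual Linux or DPDK API, though DPDK's rte_pktmbuf_linearize() is in the same spirit): a helper that lets code written for the "no fragmented packets" fast path still do the right thing when it receives a multi-segment packet:

```c
#include <assert.h>
#include <string.h>

/* Toy segment chain standing in for an mbuf chain; everything here
 * is simplified and hypothetical. */
#define SEG_CAP 64

struct toy_seg {
	char data[SEG_CAP];
	int len;
	struct toy_seg *next;
};

/* Collapse a segment chain into the head segment, so single-segment
 * code keeps working on fragmented input. Returns 0 on success, or
 * -1 if the head segment cannot hold the whole packet. */
static int
toy_linearize(struct toy_seg *head)
{
	struct toy_seg *s;
	int total = head->len;

	/* First pass: check the packet fits in the head segment. */
	for (s = head->next; s != NULL; s = s->next)
		total += s->len;
	if (total > SEG_CAP)
		return -1;

	/* Second pass: copy the tail segments into the head. */
	for (s = head->next; s != NULL; s = s->next) {
		memcpy(head->data + head->len, s->data, s->len);
		head->len += s->len;
	}
	head->next = NULL;
	return 0;
}
```

The point is the one made above: with such helpers the feature ("works on linear packets only") degrades gracefully instead of silently breaking other code.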



* Re: Optimizations are not features
  2022-06-04 12:51         ` Andrew Rybchenko
  2022-06-05  8:15           ` Morten Brørup
  2022-06-05 16:05           ` Stephen Hemminger
@ 2022-06-06  9:35           ` Konstantin Ananyev
  2022-06-29 20:44             ` Honnappa Nagarahalli
  2 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ananyev @ 2022-06-06  9:35 UTC (permalink / raw)
  To: Andrew Rybchenko, Morten Brørup, Jerin Jacob; +Cc: dpdk-dev, techboard

04/06/2022 13:51, Andrew Rybchenko wrote:
> On 6/4/22 15:19, Morten Brørup wrote:
>>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
>>> Sent: Saturday, 4 June 2022 13.10
>>>
>>> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>
>>>> On 6/4/22 12:33, Jerin Jacob wrote:
>>>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
>>> <mb@smartsharesystems.com> wrote:
>>>>>>
>>>>>> I would like the DPDK community to change its view on compile time
>>> options. Here is why:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Application specific performance micro-optimizations like “fast
>>> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
>>> presented as features.
>>>>>>
>>>>>>
>>>>>>
>>>>>> They are not features, but optimizations, and I don’t understand
>>> the need for them to be available at run-time!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Instead of adding a bunch of exotic exceptions to the fast path of
>>> the PMDs, they should be compile time options. This will improve
>>> performance by avoiding branches in the fast path, both for the
>>> applications using them, and for generic applications (where the exotic
>>> code is omitted).
>>>>>
>>>>> Agree. I think, keeping the best of both worlds would be
>>>>>
>>>>> -Enable the feature/optimization as runtime
>>>>> -Have a compile-time option to disable the feature/optimization as
>>> an override.
>>>>
>>>> It is hard to find the right balance, but in general compile
>>>> time options are a nightmare for maintenance. Number of
>>>> required builds will grow as an exponent.
>>
>> Test combinations are exponential for N features, regardless if N are 
>> runtime or compile time options.
> 
> But since I'm talking about build checks I don't care about exponential
> grows in run time. Yes, testing should care, but it is a separate story.
> 
>>
>>>> Of course, we can
>>>> limit number of checked combinations, but it will result in
>>>> flow of patches to fix build in other cases.
>>>
>>> The build breakage can be fixed if we use (2) vs (1)
>>>
>>> 1)
>>> #ifdef ...
>>> My feature
>>> #endif
>>>
>>> 2)
>>> static __rte_always_inline int
>>> rte_has_xyz_feature(void)
>>> {
>>> #ifdef RTE_LIBRTE_XYZ_FEATURE
>>>          return RTE_LIBRTE_XYZ_FEATURE;
>>> #else
>>>          return 0;
>>> #endif
>>> }
>>>
>>> if(rte_has_xyz_feature())) {
>>> My feature code
>>>
>>> }
>>>
> 
> Jerin, thanks, very good example.
> 
>> I'm not sure all the features can be covered by that, e.g. added 
>> fields in structures.
> 
> +1
> 
>>
>> Also, I would consider such features "opt in" at compile time only. As 
>> such, they could be allowed to break the ABI/API.
>>
>>>
>>>
>>>> Also compile time options tend to make code less readable
>>>> which makes all aspects of the development harder.
>>>>
>>>> Yes, compile time is nice for micro optimizations, but
>>>> I have great concerns that it is a right way to go.
>>>>
>>>>>> Please note that I am only talking about the performance
>>> optimizations that are limited to application specific use cases. I
>>> think it makes sense to require that performance optimizing an
>>> application also requires recompiling the performance critical
>>> libraries used by it.
>>>>>>
>>>>>>
>>>>>> Allowing compile time options for application specific performance
>>> optimizations in DPDK would also open a path for other optimizations,
>>> which can only be achieved at compile time, such as “no fragmented
>>> packets”, “no attached mbufs” and “single mbuf pool”. And even more
>>> exotic optimizations, such as the “indexed mempool cache”, which was
>>> rejected due to ABI violations – they could be marked as “risky and
>>> untested” or similar, but still be part of the DPDK main repository.
>>>>>>


Thanks Morten for bringing it up, it is an interesting topic.
Though I look at it from a different angle.
All the optimizations you mentioned above introduce new limitations:
MBUF_FAST_FREE - no indirect mbufs and no multiple mempools,
mempool object indexes - mempool size is limited to 4GB,
direct rearm - drops the ability to stop/reconfigure a TX queue
while the RX queue is still running,
etc.
Note that none of these limitations are forced by HW.
All of them are pure SW limitations that developers forced in
(or tried to) to get a little extra performance.
That's a concerning tendency.

As more and more such 'optimizations via limitation' come in:
- the DPDK feature list will become more and more fragmented,
- they will cause more and more confusion for the users,
- unmet expectations - the difference in performance between the
   'default' and 'optimized' versions of DPDK will become bigger and bigger,
- as Andrew already mentioned, maintaining all these 'sub-flavours'
   of DPDK will become more and more difficult.

So, probably instead of making such changes easier,
we need somehow to persuade developers to think more about
optimizations that would be generic and transparent to the user.
I do realize that it is not always possible due to various reasons
(HW limitations, external dependencies, etc.)
but that's another story.

Let's take MBUF_FAST_FREE as an example.
In fact, I am not sure that we need it as a TX offload flag at all.
The PMD TX path has all the necessary information to decide at
run-time whether it can do fast_free() or not:
at tx_burst() the PMD can check whether all mbufs satisfy these
conditions (same mempool, refcnt == 1) and update some fields
and/or counters inside the TXQ to reflect it.
Then, at tx_free(), we can use this info to decide
between fast_free() and normal_free().
As tx_burst() reads the mbuf fields anyway, the impact of this
extra step would, I guess, be minimal.
Yes, most likely it wouldn't be as fast as the current
TX offload flag or the conditional compilation approach.
But it might still be significantly faster than normal_free(),
plus such an approach would be generic and transparent to the user.
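A rough sketch of that run-time detection, using simplified stand-in
types rather than the real rte_mbuf/rte_mempool structs (all names and
fields here are illustrative, not an actual PMD implementation):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for rte_mempool/rte_mbuf, illustration only. */
struct mempool { int id; };
struct mbuf { struct mempool *pool; uint16_t refcnt; };

/* Per-TXQ state the PMD could maintain to choose the free path. */
struct txq_state {
	struct mempool *mp;  /* pool of the first mbuf seen (NULL at start) */
	bool fast_free_ok;   /* every mbuf so far: same pool, refcnt == 1 */
};

/* Called from tx_burst(): the mbuf fields are being read anyway,
 * so tracking eligibility adds only a couple of compares per mbuf. */
static void
txq_track_burst(struct txq_state *txq, struct mbuf **pkts, uint16_t n)
{
	for (uint16_t i = 0; i < n; i++) {
		struct mbuf *m = pkts[i];

		if (txq->mp == NULL)
			txq->mp = m->pool;
		if (m->refcnt != 1 || m->pool != txq->mp) {
			txq->fast_free_ok = false;
			return;
		}
	}
}

/* Called from tx_free(): pick the cheap path only when allowed. */
static const char *
txq_free_path(const struct txq_state *txq)
{
	return txq->fast_free_ok ? "fast_free" : "normal_free";
}
```

Once any mbuf violates the conditions, the queue sticks to normal_free();
a real PMD could of course re-evaluate periodically instead.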

Konstantin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Optimizations are not features
  2022-06-06  9:35           ` Konstantin Ananyev
@ 2022-06-29 20:44             ` Honnappa Nagarahalli
  2022-06-30 15:39               ` Morten Brørup
  2022-07-03 19:38               ` Konstantin Ananyev
  0 siblings, 2 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2022-06-29 20:44 UTC (permalink / raw)
  To: Konstantin Ananyev, Andrew Rybchenko, Morten Brørup, Jerin Jacob
  Cc: dpdk-dev, techboard, nd, nd

<snip>

> 
> 04/06/2022 13:51, Andrew Rybchenko пишет:
> > On 6/4/22 15:19, Morten Brørup wrote:
> >>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> >>> Sent: Saturday, 4 June 2022 13.10
> >>>
> >>> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
> >>> <andrew.rybchenko@oktetlabs.ru> wrote:
> >>>>
> >>>> On 6/4/22 12:33, Jerin Jacob wrote:
> >>>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
> >>> <mb@smartsharesystems.com> wrote:
> >>>>>>
> >>>>>> I would like the DPDK community to change its view on compile
> >>>>>> time
> >>> options. Here is why:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Application specific performance micro-optimizations like “fast
> >>> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
> >>> presented as features.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> They are not features, but optimizations, and I don’t understand
> >>> the need for them to be available at run-time!
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Instead of adding a bunch of exotic exceptions to the fast path
> >>>>>> of
> >>> the PMDs, they should be compile time options. This will improve
> >>> performance by avoiding branches in the fast path, both for the
> >>> applications using them, and for generic applications (where the
> >>> exotic code is omitted).
> >>>>>
> >>>>> Agree. I think, keeping the best of both worlds would be
> >>>>>
> >>>>> -Enable the feature/optimization as runtime -Have a compile-time
> >>>>> option to disable the feature/optimization as
> >>> an override.
> >>>>
> >>>> It is hard to find the right balance, but in general compile time
> >>>> options are a nightmare for maintenance. Number of required builds
> >>>> will grow as an exponent.
> >>
> >> Test combinations are exponential for N features, regardless if N are
> >> runtime or compile time options.
> >
> > But since I'm talking about build checks I don't care about
> > exponential grows in run time. Yes, testing should care, but it is a separate
> story.
> >
> >>
> >>>> Of course, we can
> >>>> limit number of checked combinations, but it will result in flow of
> >>>> patches to fix build in other cases.
> >>>
> >>> The build breakage can be fixed if we use (2) vs (1)
> >>>
> >>> 1)
> >>> #ifdef ...
> >>> My feature
> >>> #endif
> >>>
> >>> 2)
> >>> static __rte_always_inline int
> >>> rte_has_xyz_feature(void)
> >>> {
> >>> #ifdef RTE_LIBRTE_XYZ_FEATURE
> >>>          return RTE_LIBRTE_XYZ_FEATURE; #else
> >>>          return 0;
> >>> #endif
> >>> }
> >>>
> >>> if(rte_has_xyz_feature())) {
> >>> My feature code
> >>>
> >>> }
> >>>
> >
> > Jerin, thanks, very good example.
> >
> >> I'm not sure all the features can be covered by that, e.g. added
> >> fields in structures.
> >
> > +1
> >
> >>
> >> Also, I would consider such features "opt in" at compile time only.
> >> As such, they could be allowed to break the ABI/API.
> >>
> >>>
> >>>
> >>>> Also compile time options tend to make code less readable which
> >>>> makes all aspects of the development harder.
> >>>>
> >>>> Yes, compile time is nice for micro optimizations, but I have great
> >>>> concerns that it is a right way to go.
> >>>>
> >>>>>> Please note that I am only talking about the performance
> >>> optimizations that are limited to application specific use cases. I
> >>> think it makes sense to require that performance optimizing an
> >>> application also requires recompiling the performance critical
> >>> libraries used by it.
> >>>>>>
> >>>>>>
> >>>>>> Allowing compile time options for application specific
> >>>>>> performance
> >>> optimizations in DPDK would also open a path for other
> >>> optimizations, which can only be achieved at compile time, such as
> >>> “no fragmented packets”, “no attached mbufs” and “single mbuf pool”.
> >>> And even more exotic optimizations, such as the “indexed mempool
> >>> cache”, which was rejected due to ABI violations – they could be
> >>> marked as “risky and untested” or similar, but still be part of the DPDK main
> repository.
> >>>>>>
> 
> 
> Thanks Morten for bringing it up, it is an interesting topic.
> Though I look at it from different angle.
> All optimizations you mentioned above introduce new limitations:
> MBUF_FAST_FREE - no indirect mbufs and multiple mempools, mempool object
> indexes - mempool size is limited to 4GB, direct rearm - drop ability to
> stop/reconfigure TX queue, while RX queue is still running, etc.
> Note that all these limitations are not forced by HW.
> All of them are pure SW limitations that developers forced in (or tried to) to get
> few extra performance.
> That's concerning tendency.
> 
> As more and more such 'optimization via limitation' will come in:
> - DPDK feature list will become more and more fragmented.
> - Would cause more and more confusion for the users.
> - Unmet expectations - difference in performance between 'default'
>    and 'optimized' version of DPDK will become bigger and bigger.
> - As Andrew already mentioned, maintaining all these 'sub-flavours'
>    of DPDK will become more and more difficult.
The point that we need to remember is that these features/optimizations are introduced after seeing performance issues in practical use cases.
DPDK is not being used in just one use case; it is being used in several use cases, each with its own unique requirements. Is 4GB enough for packet buffers? Yes, it is enough in certain use cases. Are there NICs with a single port? Yes, there are. HW is being created because use cases and business cases exist. It is obvious that as DPDK gets adopted on more platforms that differ largely, the features will increase and it will become more complex. Complexity should not be used as a criterion to reject patches.

There is a different perspective on what you are calling 'limitations'. I can argue that multiple mempools, and stopping/reconfiguring a TX queue while the RX queue is still running, are exotic. Just because those are allowed currently (probably accidentally) does not mean they are being used. Are there use cases that make use of these features?

The base/existing design for DPDK was done with one particular HW architecture in mind, where there was an abundance of resources. Unfortunately, that HW architecture is fast evolving, and DPDK is being adopted in use cases where that kind of resources is not available. For ex: efficiency cores are being introduced by every CPU vendor now. Soon enough, we will see the big-little architecture in networking as well. The existing PMD design introduces 512B of stores (256B for copying to a stack variable and 256B to store to the lcore cache) and a 256B load/store on the RX side every 32 packets, back to back. It doesn't make sense to have that kind of memcopy for little/efficiency cores just for the driver code.
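For reference, the arithmetic behind those figures, assuming a
32-packet burst and 8-byte mbuf pointers (a sketch of the accounting
only, not PMD code):

```c
#include <stddef.h>

/* Back-of-envelope pointer traffic per burst in the existing design,
 * assuming 64-bit pointers; illustrative accounting only. */
static unsigned
tx_free_store_bytes(unsigned burst)
{
	/* pointers copied to a stack array, then stored into the
	 * lcore mempool cache: two store streams of burst * 8 bytes */
	return 2u * burst * (unsigned)sizeof(void *);
}

static unsigned
rx_refill_bytes(unsigned burst)
{
	/* pointers loaded from the lcore cache and stored into RX
	 * descriptors: this many bytes of loads, and again of stores */
	return burst * (unsigned)sizeof(void *);
}
```

With burst = 32 this gives the 512B of TX-side stores and 256B of
RX-side loads/stores mentioned above.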

> 
> So, probably instead of making such changes easier, we need somehow to
> persuade developers to think more about optimizations that would be generic
> and transparent to the user.
Or maybe we need to think of creating alternate ways of programming.

> I do realize that it is not always possible due to various reasons (HW limitations,
> external dependencies, etc.) but that's another story.
> 
> Let's take for example MBUF_FAST_FREE.
> In fact, I am not sure that we need it as tx offload flag at all.
> PMD TX-path has all necessary information to decide at run-time can it do
> fast_free() for not:
> At tx_burst() PMD can check are all mbufs satisfy these conditions (same
> mempool, refcnt==1) and update some fields and/or counters inside TXQ to
> reflect it.
> Then, at tx_free() we can use this info to decide between fast_free() and
> normal_free().
> As at tx_burst() we read mbuf fields anyway, impact for this extra step I guess
> would be minimal.
> Yes, most likely, it wouldn't be as fast as with current TX offload flag, or
> conditional compilation approach.
> But it might be still significantly faster then normal_free(), plus such approach
> will be generic and transparent to the user.
IMO, this depends on the philosophy that we want to adopt. I would prefer to make the control plane complex for performance gains on the data plane. The performance on the data plane has a multiplying effect due to the ratio of the number of cores assigned to the data plane vs the control plane.

I am not against evaluating alternatives, but the alternative approaches need to have similar (not the same) performance.

> 
> Konstantin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Optimizations are not features
  2022-06-29 20:44             ` Honnappa Nagarahalli
@ 2022-06-30 15:39               ` Morten Brørup
  2022-07-03 19:38               ` Konstantin Ananyev
  1 sibling, 0 replies; 14+ messages in thread
From: Morten Brørup @ 2022-06-30 15:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Andrew Rybchenko, Jerin Jacob
  Cc: dpdk-dev, techboard, nd, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Wednesday, 29 June 2022 22.44
> 
> <snip>
> 
> >
> > 04/06/2022 13:51, Andrew Rybchenko пишет:
> > > On 6/4/22 15:19, Morten Brørup wrote:
> > >>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > >>> Sent: Saturday, 4 June 2022 13.10
> > >>>
> > >>> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
> > >>> <andrew.rybchenko@oktetlabs.ru> wrote:
> > >>>>
> > >>>> On 6/4/22 12:33, Jerin Jacob wrote:
> > >>>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
> > >>> <mb@smartsharesystems.com> wrote:
> > >>>>>>
> > >>>>>> I would like the DPDK community to change its view on compile
> > >>>>>> time
> > >>> options. Here is why:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Application specific performance micro-optimizations like
> “fast
> > >>> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
> > >>> presented as features.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> They are not features, but optimizations, and I don’t
> understand
> > >>> the need for them to be available at run-time!
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Instead of adding a bunch of exotic exceptions to the fast
> path
> > >>>>>> of
> > >>> the PMDs, they should be compile time options. This will improve
> > >>> performance by avoiding branches in the fast path, both for the
> > >>> applications using them, and for generic applications (where the
> > >>> exotic code is omitted).
> > >>>>>
> > >>>>> Agree. I think, keeping the best of both worlds would be
> > >>>>>
> > >>>>> -Enable the feature/optimization as runtime -Have a compile-
> time
> > >>>>> option to disable the feature/optimization as
> > >>> an override.
> > >>>>
> > >>>> It is hard to find the right balance, but in general compile
> time
> > >>>> options are a nightmare for maintenance. Number of required
> builds
> > >>>> will grow as an exponent.
> > >>
> > >> Test combinations are exponential for N features, regardless if N
> are
> > >> runtime or compile time options.
> > >
> > > But since I'm talking about build checks I don't care about
> > > exponential grows in run time. Yes, testing should care, but it is
> a separate
> > story.
> > >
> > >>
> > >>>> Of course, we can
> > >>>> limit number of checked combinations, but it will result in flow
> of
> > >>>> patches to fix build in other cases.
> > >>>
> > >>> The build breakage can be fixed if we use (2) vs (1)
> > >>>
> > >>> 1)
> > >>> #ifdef ...
> > >>> My feature
> > >>> #endif
> > >>>
> > >>> 2)
> > >>> static __rte_always_inline int
> > >>> rte_has_xyz_feature(void)
> > >>> {
> > >>> #ifdef RTE_LIBRTE_XYZ_FEATURE
> > >>>          return RTE_LIBRTE_XYZ_FEATURE; #else
> > >>>          return 0;
> > >>> #endif
> > >>> }
> > >>>
> > >>> if(rte_has_xyz_feature())) {
> > >>> My feature code
> > >>>
> > >>> }
> > >>>
> > >
> > > Jerin, thanks, very good example.
> > >
> > >> I'm not sure all the features can be covered by that, e.g. added
> > >> fields in structures.
> > >
> > > +1
> > >
> > >>
> > >> Also, I would consider such features "opt in" at compile time
> only.
> > >> As such, they could be allowed to break the ABI/API.
> > >>
> > >>>
> > >>>
> > >>>> Also compile time options tend to make code less readable which
> > >>>> makes all aspects of the development harder.
> > >>>>
> > >>>> Yes, compile time is nice for micro optimizations, but I have
> great
> > >>>> concerns that it is a right way to go.
> > >>>>
> > >>>>>> Please note that I am only talking about the performance
> > >>> optimizations that are limited to application specific use cases.
> I
> > >>> think it makes sense to require that performance optimizing an
> > >>> application also requires recompiling the performance critical
> > >>> libraries used by it.
> > >>>>>>
> > >>>>>>
> > >>>>>> Allowing compile time options for application specific
> > >>>>>> performance
> > >>> optimizations in DPDK would also open a path for other
> > >>> optimizations, which can only be achieved at compile time, such
> as
> > >>> “no fragmented packets”, “no attached mbufs” and “single mbuf
> pool”.
> > >>> And even more exotic optimizations, such as the “indexed mempool
> > >>> cache”, which was rejected due to ABI violations – they could be
> > >>> marked as “risky and untested” or similar, but still be part of
> the DPDK main
> > repository.
> > >>>>>>
> >
> >
> > Thanks Morten for bringing it up, it is an interesting topic.
> > Though I look at it from different angle.
> > All optimizations you mentioned above introduce new limitations:
> > MBUF_FAST_FREE - no indirect mbufs and multiple mempools, mempool
> object
> > indexes - mempool size is limited to 4GB, direct rearm - drop ability
> to
> > stop/reconfigure TX queue, while RX queue is still running, etc.
> > Note that all these limitations are not forced by HW.
> > All of them are pure SW limitations that developers forced in (or
> tried to) to get
> > few extra performance.
> > That's concerning tendency.
> >
> > As more and more such 'optimization via limitation' will come in:
> > - DPDK feature list will become more and more fragmented.
> > - Would cause more and more confusion for the users.
> > - Unmet expectations - difference in performance between 'default'
> >    and 'optimized' version of DPDK will become bigger and bigger.

I strongly disagree with this bullet!

We should not limit the performance to only what is possible with all features enabled.

An application developer should have the ability to disable performance-costly features not being used.

> > - As Andrew already mentioned, maintaining all these 'sub-flavours'
> >    of DPDK will become more and more difficult.
> The point that we need to remember is, these features/optimizations are
> introduced after seeing performance issues in practical use cases.
> DPDK is not being used in just one use case, it is being used in
> several use cases which have their own unique requirements. Is 4GB
> enough for packet buffers - yes it is enough in certain use cases. Are
> their NICs with single port - yes there are. HW is being created
> because use cases and business cases exist. It is obvious that as DPDK
> gets adopted on more platforms that differ largely, the features will
> increase and it will become complex. Complexity should not be used as a
> criteria to reject patches.
> 
> There is different perspective to what you are calling as
> 'limitations'. I can argue that multiple mempools, stop/reconfigure TX
> queue while RX queue is still running are exotic. Just because those
> are allowed currently (probably accidently) does not mean they are
> being used. Are there use cases that make use of these features?
> 
> The base/existing design for DPDK was done with one particular HW
> architecture in mind where there was an abundance of resources.
> Unfortunately, that HW architecture is fast evolving and DPDK is
> adopted in use cases where that kind of resources are not available.
> For ex: efficiency cores are being introduced by every CPU vendor now.
> Soon enough, we will see big-little architecture in networking as well.
> The existing PMD design introduces 512B of stores (256B for copying to
> stack variable and 256B to store lcore cache) and 256B load/store on RX
> side every 32 packets back to back. It doesn't make sense to have that
> kind of memcopy for little/efficiency cores just for the driver code.
> 
> >
> > So, probably instead of making such changes easier, we need somehow
> to
> > persuade developers to think more about optimizations that would be
> generic
> > and transparent to the user.
> Or may be we need to think of creating alternate ways of programming.

Exactly what I was hoping to achieve with this discussion.

> 
> > I do realize that it is not always possible due to various reasons
> (HW limitations,
> > external dependencies, etc.) but that's another story.
> >
> > Let's take for example MBUF_FAST_FREE.
> > In fact, I am not sure that we need it as tx offload flag at all.
> > PMD TX-path has all necessary information to decide at run-time can
> it do
> > fast_free() for not:
> > At tx_burst() PMD can check are all mbufs satisfy these conditions
> (same
> > mempool, refcnt==1) and update some fields and/or counters inside TXQ
> to
> > reflect it.
> > Then, at tx_free() we can use this info to decide between fast_free()
> and
> > normal_free().
> > As at tx_burst() we read mbuf fields anyway, impact for this extra
> step I guess
> > would be minimal.
> > Yes, most likely, it wouldn't be as fast as with current TX offload
> flag, or
> > conditional compilation approach.
> > But it might be still significantly faster then normal_free(), plus
> such approach
> > will be generic and transparent to the user.
> IMO, this depends on the philosophy that we want to adopt. I would
> prefer to make control plane complex for performance gains on the data
> plane. The performance on the data plane has a multiplying effect due
> to the ratio of number of cores assigned for data plane vs control
> plane.

Yes. And if some performance-costly feature cannot be moved from the data plane to the control plane, it should be compile time optional.

And please note that I don't buy the argument that "it will be caught by branch prediction". You are not allowed to fill up my branch predictor table with cruft!

> 
> I am not against evaluating alternatives, but the alternative
> approaches need to have similar (not the same) performance.
> 
> >
> > Konstantin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimizations are not features
  2022-06-29 20:44             ` Honnappa Nagarahalli
  2022-06-30 15:39               ` Morten Brørup
@ 2022-07-03 19:38               ` Konstantin Ananyev
  2022-07-04 16:33                 ` Stephen Hemminger
  1 sibling, 1 reply; 14+ messages in thread
From: Konstantin Ananyev @ 2022-07-03 19:38 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Andrew Rybchenko, Morten Brørup, Jerin Jacob
  Cc: dpdk-dev, techboard, nd

29/06/2022 21:44, Honnappa Nagarahalli пишет:
> <snip>
> 
>>
>> 04/06/2022 13:51, Andrew Rybchenko пишет:
>>> On 6/4/22 15:19, Morten Brørup wrote:
>>>>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
>>>>> Sent: Saturday, 4 June 2022 13.10
>>>>>
>>>>> On Sat, Jun 4, 2022 at 3:30 PM Andrew Rybchenko
>>>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>>>
>>>>>> On 6/4/22 12:33, Jerin Jacob wrote:
>>>>>>> On Sat, Jun 4, 2022 at 2:39 PM Morten Brørup
>>>>> <mb@smartsharesystems.com> wrote:
>>>>>>>>
>>>>>>>> I would like the DPDK community to change its view on compile
>>>>>>>> time
>>>>> options. Here is why:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Application specific performance micro-optimizations like “fast
>>>>> mbuf free” and “mbuf direct re-arm” are being added to DPDK and
>>>>> presented as features.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> They are not features, but optimizations, and I don’t understand
>>>>> the need for them to be available at run-time!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Instead of adding a bunch of exotic exceptions to the fast path
>>>>>>>> of
>>>>> the PMDs, they should be compile time options. This will improve
>>>>> performance by avoiding branches in the fast path, both for the
>>>>> applications using them, and for generic applications (where the
>>>>> exotic code is omitted).
>>>>>>>
>>>>>>> Agree. I think, keeping the best of both worlds would be
>>>>>>>
>>>>>>> -Enable the feature/optimization as runtime -Have a compile-time
>>>>>>> option to disable the feature/optimization as
>>>>> an override.
>>>>>>
>>>>>> It is hard to find the right balance, but in general compile time
>>>>>> options are a nightmare for maintenance. Number of required builds
>>>>>> will grow as an exponent.
>>>>
>>>> Test combinations are exponential for N features, regardless if N are
>>>> runtime or compile time options.
>>>
>>> But since I'm talking about build checks I don't care about
>>> exponential grows in run time. Yes, testing should care, but it is a separate
>> story.
>>>
>>>>
>>>>>> Of course, we can
>>>>>> limit number of checked combinations, but it will result in flow of
>>>>>> patches to fix build in other cases.
>>>>>
>>>>> The build breakage can be fixed if we use (2) vs (1)
>>>>>
>>>>> 1)
>>>>> #ifdef ...
>>>>> My feature
>>>>> #endif
>>>>>
>>>>> 2)
>>>>> static __rte_always_inline int
>>>>> rte_has_xyz_feature(void)
>>>>> {
>>>>> #ifdef RTE_LIBRTE_XYZ_FEATURE
>>>>>           return RTE_LIBRTE_XYZ_FEATURE; #else
>>>>>           return 0;
>>>>> #endif
>>>>> }
>>>>>
>>>>> if(rte_has_xyz_feature())) {
>>>>> My feature code
>>>>>
>>>>> }
>>>>>
>>>
>>> Jerin, thanks, very good example.
>>>
>>>> I'm not sure all the features can be covered by that, e.g. added
>>>> fields in structures.
>>>
>>> +1
>>>
>>>>
>>>> Also, I would consider such features "opt in" at compile time only.
>>>> As such, they could be allowed to break the ABI/API.
>>>>
>>>>>
>>>>>
>>>>>> Also compile time options tend to make code less readable which
>>>>>> makes all aspects of the development harder.
>>>>>>
>>>>>> Yes, compile time is nice for micro optimizations, but I have great
>>>>>> concerns that it is a right way to go.
>>>>>>
>>>>>>>> Please note that I am only talking about the performance
>>>>> optimizations that are limited to application specific use cases. I
>>>>> think it makes sense to require that performance optimizing an
>>>>> application also requires recompiling the performance critical
>>>>> libraries used by it.
>>>>>>>>
>>>>>>>>
>>>>>>>> Allowing compile time options for application specific
>>>>>>>> performance
>>>>> optimizations in DPDK would also open a path for other
>>>>> optimizations, which can only be achieved at compile time, such as
>>>>> “no fragmented packets”, “no attached mbufs” and “single mbuf pool”.
>>>>> And even more exotic optimizations, such as the “indexed mempool
>>>>> cache”, which was rejected due to ABI violations – they could be
>>>>> marked as “risky and untested” or similar, but still be part of the DPDK main
>> repository.
>>>>>>>>
>>
>>
>> Thanks Morten for bringing it up, it is an interesting topic.
>> Though I look at it from different angle.
>> All optimizations you mentioned above introduce new limitations:
>> MBUF_FAST_FREE - no indirect mbufs and multiple mempools, mempool object
>> indexes - mempool size is limited to 4GB, direct rearm - drop ability to
>> stop/reconfigure TX queue, while RX queue is still running, etc.
>> Note that all these limitations are not forced by HW.
>> All of them are pure SW limitations that developers forced in (or tried to) to get
>> few extra performance.
>> That's concerning tendency.
>>
>> As more and more such 'optimization via limitation' will come in:
>> - DPDK feature list will become more and more fragmented.
>> - Would cause more and more confusion for the users.
>> - Unmet expectations - difference in performance between 'default'
>>     and 'optimized' version of DPDK will become bigger and bigger.
>> - As Andrew already mentioned, maintaining all these 'sub-flavours'
>>     of DPDK will become more and more difficult.
> The point that we need to remember is, these features/optimizations are introduced after seeing performance issues in practical use cases.

Sorry, I didn't get it: what performance issues are you talking about?
If, let's say, our mempool code is sub-optimal in some place for some
architecture due to bad design or bad implementation - please point to
it and let's try to fix it, instead of avoiding using the mempool API.
If you are just saying that avoiding using mempool in some cases
could buy us a little extra performance (a short-cut),
then yes, it surely could.
Another question - is it really worth it?
Having all mbuf management covered by one SW abstraction
helps a lot in terms of project maintainability, further extensions,
introducing new common optimizations, etc.

> DPDK is not being used in just one use case, it is being used in several use cases which have their own unique requirements. Is 4GB enough for packet buffers - yes it is enough in certain use 
> cases. Are their NICs with single port - yes there are.

Sure, there are NICs with one port.
But there are also NICs with 2 ports, 4 ports, etc.
Should we maintain specific DPDK sub-versions for all these cases?
From my perspective - no.
It would be an overwhelming effort for the DPDK community, plus
many customers use DPDK to build their own products that are supposed
to work seamlessly across multiple use-cases/platforms.

> HW is being created because use cases and business cases exist. It is
> obvious that as DPDK gets adopted on more platforms that differ largely,
> the features will increase and it will become complex. Complexity should
> not be used as a criteria to reject patches.

Well, we do have plenty of HW specific optimizations inside DPDK,
and we put a lot of effort into making all this HW specific stuff
as transparent to the user as possible.
I don't see why it should be different for SW specific optimizations.

> 
> There is different perspective to what you are calling as 'limitations'. 

By 'limitations' I mean the situation when the user has to cut off
existing functionality to enable these 'optimizations'.

> I can argue that multiple mempools, stop/reconfigure TX queue while RX
> queue is still running are exotic. Just because those are allowed
> currently (probably accidently) does not mean they are being used. Are
> there use cases that make use of these features?

If DPDK's examples/l3fwd doesn't use these features,
it doesn't mean they are useless :)
I believe both multiple mempools (indirect mbufs) and the ability to
start/stop queues separately are major DPDK features that are used
across many real-world deployments.


> 
> The base/existing design for DPDK was done with one particular HW architecture in mind where there was an abundance of resources. Unfortunately, that HW architecture is fast evolving and DPDK is adopted in use cases where that kind of resources are not available. For ex: efficiency cores are being introduced by every CPU vendor now. Soon enough, we will see big-little architecture in networking as well. The existing PMD design introduces 512B of stores (256B for copying to stack variable and 256B to store lcore cache) and 256B load/store on RX side every 32 packets back to back. It doesn't make sense to have that kind of memcopy for little/efficiency cores just for the driver code.

I don't object to specific use-case optimizations,
especially if the use-case is a common one.
But I think such changes have to be transparent to the user as
much as possible and shouldn't cause further DPDK code fragmentation
(new CONFIG options, etc.).
I understand that it is not always possible, but for pure SW based
optimizations I think it is a reasonable expectation.

>>
>> So, probably instead of making such changes easier, we need somehow to
>> persuade developers to think more about optimizations that would be generic
>> and transparent to the user.
> Or may be we need to think of creating alternate ways of programming.
> 
>> I do realize that it is not always possible due to various reasons (HW limitations,
>> external dependencies, etc.) but that's another story.
>>
>> Let's take for example MBUF_FAST_FREE.
>> In fact, I am not sure that we need it as a tx offload flag at all.
>> The PMD TX-path has all the necessary information to decide at run-time whether
>> it can do fast_free() or not:
>> At tx_burst() the PMD can check whether all mbufs satisfy these conditions (same
>> mempool, refcnt==1) and update some fields and/or counters inside the TXQ to
>> reflect it.
>> Then, at tx_free() we can use this info to decide between fast_free() and
>> normal_free().
>> As we read mbuf fields at tx_burst() anyway, I guess the impact of this extra
>> step would be minimal.
>> Yes, most likely, it wouldn't be as fast as with the current TX offload flag, or
>> the conditional compilation approach.
>> But it might still be significantly faster than normal_free(), plus such an
>> approach would be generic and transparent to the user.
> IMO, this depends on the philosophy that we want to adopt. I would prefer to make the control plane more complex in exchange for performance gains on the data plane. The performance of the data plane has a multiplying effect due to the ratio of the number of cores assigned to the data plane vs the control plane.
> 
> I am not against evaluating alternatives, but the alternative approaches need to have similar (not the same) performance.
> 
>>
>> Konstantin
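
To make the fast_free() detection idea quoted above concrete, here is a rough
sketch. All struct and function names below are hypothetical stand-ins, not the
actual rte_mbuf/PMD API, and a real PMD would need more state than this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_mbuf / PMD TX queue structures. */
struct mbuf {
	void *pool;        /* mempool this mbuf was allocated from */
	uint16_t refcnt;   /* reference count */
	struct mbuf *next; /* segment chain; NULL for a single direct segment */
};

struct txq {
	void *pool;  /* common mempool of all queued mbufs, if any */
	bool mixed;  /* set once a queued mbuf violated the conditions */
};

/* At tx_burst(): record whether every submitted mbuf still satisfies
 * the fast-free conditions (same mempool, refcnt == 1, not chained).
 * A real PMD would also need counters so 'mixed' can be cleared again
 * once the offending mbufs have been freed. */
static void txq_track(struct txq *q, struct mbuf **pkts, uint16_t n)
{
	for (uint16_t i = 0; i < n; i++) {
		struct mbuf *m = pkts[i];

		if (m->refcnt != 1 || m->next != NULL ||
		    (q->pool != NULL && q->pool != m->pool)) {
			q->mixed = true;
			return;
		}
		q->pool = m->pool;
	}
}

/* At tx_free(): choose between bulk fast_free() and normal_free(). */
static bool txq_can_fast_free(const struct txq *q)
{
	return !q->mixed && q->pool != NULL;
}
```

As the quoted text notes, this reads fields tx_burst() touches anyway, so the
per-packet cost should be small, at the price of a slightly more complex TXQ state.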


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimizations are not features
  2022-07-03 19:38               ` Konstantin Ananyev
@ 2022-07-04 16:33                 ` Stephen Hemminger
  2022-07-04 22:06                   ` Morten Brørup
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2022-07-04 16:33 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Morten Brørup,
	Jerin Jacob, dpdk-dev, techboard, nd

On Sun, 3 Jul 2022 20:38:21 +0100
Konstantin Ananyev <konstantin.v.ananyev@yandex.ru> wrote:

> > 
> > The base/existing design for DPDK was done with one particular HW architecture in mind where there was an abundance of resources. Unfortunately, that HW architecture is fast evolving and DPDK is adopted in use cases where that kind of resources are not available. For ex: efficiency cores are being introduced by every CPU vendor now. Soon enough, we will see big-little architecture in networking as well. The existing PMD design introduces 512B of stores (256B for copying to stack variable and 256B to store lcore cache) and 256B load/store on RX side every 32 packets back to back. It doesn't make sense to have that kind of memcopy for little/efficiency cores just for the driver code.  
> 
> I don't object to specific use-case optimizations,
> especially if the use case is a common one.
> But I think such changes have to be transparent to the user as
> much as possible and shouldn't cause further DPDK code fragmentation
> (new CONFIG options, etc.).
> I understand that it is not always possible, but for pure SW based
> optimizations, I think it is a reasonable expectation.

Great discussion.

Also, if you look back at the mailing list history, you can see that lots of users just
use DPDK because it is "go fast" secret sauce and have no understanding of the internals.

My concern is that if one untestable optimization goes in for one hardware platform, then
users will enable it all the time, thinking it makes any and all use cases faster.
Try explaining to a Linux user that the real-time kernel is *not* faster than
the normal kernel...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Optimizations are not features
  2022-07-04 16:33                 ` Stephen Hemminger
@ 2022-07-04 22:06                   ` Morten Brørup
  0 siblings, 0 replies; 14+ messages in thread
From: Morten Brørup @ 2022-07-04 22:06 UTC (permalink / raw)
  To: Stephen Hemminger, Konstantin Ananyev
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Jerin Jacob, dpdk-dev,
	techboard, nd

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, 4 July 2022 18.33
> 
> On Sun, 3 Jul 2022 20:38:21 +0100
> Konstantin Ananyev <konstantin.v.ananyev@yandex.ru> wrote:
> 
> > >
> > > The base/existing design for DPDK was done with one particular HW
> architecture in mind where there was an abundance of resources.
> Unfortunately, that HW architecture is fast evolving and DPDK is
> adopted in use cases where that kind of resources are not available.
> For ex: efficiency cores are being introduced by every CPU vendor now.
> Soon enough, we will see big-little architecture in networking as well.
> The existing PMD design introduces 512B of stores (256B for copying to
> stack variable and 256B to store lcore cache) and 256B load/store on RX
> side every 32 packets back to back. It doesn't make sense to have that
> kind of memcopy for little/efficiency cores just for the driver code.
> >
> > I don't object to specific use-case optimizations,
> > especially if the use case is a common one.

Or exotic, but high-volume, use cases! Those usually get a lot of attention from sales and product management people. :-)

DPDK needs to support those in the mainline, or we will end up with forks like Qualcomm's QSDK fork of the Linux kernel. (QSDK, from Qualcomm, a leading Wi-Fi chipset vendor, bypasses a lot of the Linux kernel's IP stack to provide much higher throughput for one specific, but quite high-volume, use case: a Wi-Fi Access Point.)

> > But I think such changes have to be transparent to the user as
> > much as possible and shouldn't cause further DPDK code fragmentation
> > (new CONFIG options, etc.).
> > I understand that it is not always possible, but for pure SW based
> > optimizations, I think it is a reasonable expectation.
> 
> Great discussion.
> 
> Also, if you look back at the mailing list history, you can see that
> lots of users just
> use DPDK because it is "go fast" secret sauce and have no
> understanding of the internals.

Certainly, DPDK should still do that!

I just want DPDK to be able to go faster for experts.

Car analogy: If you buy a fast car, it will go fast. If you bring it to a tuning specialist, it will go faster. Similarly, DPDK should go "fast", but also accept that specialists can make it go "faster".

> 
> My concern is that if one untestable optimization goes in for one
> hardware platform, then
> users will enable it all the time, thinking it makes any and all use
> cases faster.
> Try explaining to a Linux user that the real-time kernel is *not*
> faster than
> the normal kernel...

Yes, because of the common misconception that faster equals higher bandwidth. But the real-time kernel does provide lower latency (under certain conditions), which means faster to some of us. I'm sorry... working with latency as one of our KPIs, I just couldn't resist! ;-)

Seriously, DPDK cannot be limited to cater to everyone on Stack Overflow!

Jokes aside...

When we started using DPDK at SmartShare Systems, DPDK was a highly optimized development kit for embedded network appliances, perfect for our SmartShare StraightShaper WAN optimization appliances and future roadmap. Over time, DPDK has morphed into a packet processing library for Ubuntu and Red Hat, with a lot of added features we don't use, and no ability to remove them. Those added features potentially degrade fast path performance and increase the risk of bugs at the system level.

Some software optimizations have been proposed for DPDK to support specific high-volume use cases: "mbuf fast free" got accepted, "direct re-arm" is getting a lot of push-back, and the most recent, "IOVA VA only mode", is another optimization suggestion currently being discussed.

In theory, it would be nice if all software optimizations could be supported at run-time, but each one adds at least one branch to the fast path, eventually slowing the fast path down significantly. And some of the optimizations simply make much better sense at compile time than at run-time, e.g. the "IOVA VA mode".
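
To illustrate the difference (the flag, option, and function names below are hypothetical, not actual DPDK symbols): with a run-time feature, the check executes on every burst in every application; with a compile-time option, the preprocessor removes both the branch and the unused path entirely.

```c
#include <stdint.h>

/* Counters standing in for the two PMD free paths. */
static int fast_frees;
static int normal_frees;

static void fast_free(void)   { fast_frees++; }
static void normal_free(void) { normal_frees++; }

#define TX_OFFLOAD_FAST_FREE (1ULL << 0) /* hypothetical offload bit */

/* Run-time selection: every application pays for this branch,
 * even one that never enables the offload. */
static void tx_free_runtime(uint64_t offloads)
{
	if (offloads & TX_OFFLOAD_FAST_FREE)
		fast_free();
	else
		normal_free();
}

/* Compile-time selection: resolved when the PMD is built, so
 * neither the test nor the unused code path remains in the binary. */
static void tx_free_compiletime(void)
{
#ifdef USE_FAST_FREE /* hypothetical build-time option */
	fast_free();
#else
	normal_free();
#endif
}
```

One branch looks cheap in isolation; the argument above is about what happens when every optional feature contributes its own branch to the same fast path.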

So, I think we should start thinking about such optimizations differently: If someone needs to optimize something for a specific use case, it can be done at compile time; there is no need to do it at runtime. Which is what I meant by the subject of my email: Don't offer optimizations as runtime features; they are use case specific, and should be chosen at compile time only.

Referring to the Linux kernel as the gold standard, it even has "make menuconfig"... a menu driven configuration interface for compile time configuration. Why must DPDK have every exotic option available at runtime, when the Linux kernel considers it perfectly acceptable to have some things configurable at compile time only?

With this discussion, I am only asking for software optimizations (which usually also imply some other limitations) to be compile time options, rather than run-time options. Any application can achieve exactly the same without those optimizations enabled, but it will run faster with them enabled.

I would love to go back to the good old days, where DPDK had a lot of compile time options to disable cruft we're not using, but I know that game was lost a long time ago! So I'm trying to find some middle ground that keeps all features in the "DPDK library for distros", but also allows hardcore developers to tune the performance for their individual use cases.

Offering software optimizations as compile time options only should also reduce the amount of push-back against such software optimizations.

Reading all the feedback in the thread, it seems that the major concern is testing. And for some mysterious reason, compile-testing 2^N feature combinations causes more concern than run-time testing 2^N feature combinations. I get the sense that run-time testing of the various feature combinations is not happening today. :-(


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-07-04 22:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-04  9:09 Optimizations are not features Morten Brørup
2022-06-04  9:33 ` Jerin Jacob
2022-06-04 10:00   ` Andrew Rybchenko
2022-06-04 11:10     ` Jerin Jacob
2022-06-04 12:19       ` Morten Brørup
2022-06-04 12:51         ` Andrew Rybchenko
2022-06-05  8:15           ` Morten Brørup
2022-06-05 16:05           ` Stephen Hemminger
2022-06-06  9:35           ` Konstantin Ananyev
2022-06-29 20:44             ` Honnappa Nagarahalli
2022-06-30 15:39               ` Morten Brørup
2022-07-03 19:38               ` Konstantin Ananyev
2022-07-04 16:33                 ` Stephen Hemminger
2022-07-04 22:06                   ` Morten Brørup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).