On 29/08/2022 11:53, Thomas Monjalon wrote:
29/08/2022 12:41, Bruce Richardson:
On Fri, Aug 26, 2022 at 05:46:48PM +0200, Morten Brørup wrote:
From: Kevin Laatz [mailto:kevin.laatz@intel.com]
Sent: Friday, 26 August 2022 17.27

On 26/08/2022 09:29, Morten Brørup wrote:
From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
Sent: Friday, 26 August 2022 10.16

On Fri, Aug 26, 2022 at 1:37 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
On Fri, Aug 26, 2022 at 12:35:16PM +0530, Jerin Jacob wrote:
On Thu, Aug 25, 2022 at 8:56 PM Kevin Laatz
<kevin.laatz@intel.com>
wrote:
From: Anatoly Burakov <anatoly.burakov@intel.com>

Currently, there is no way to measure lcore poll busyness in a
passive way,
without any modifications to the application. This patch adds a
new EAL API
that will be able to passively track core polling busyness.

The poll busyness is calculated by relying on the fact that most
DPDK API's
will poll for packets. Empty polls can be counted as "idle",
while
non-empty polls can be counted as busy. To measure lcore poll
busyness, we
simply call the telemetry timestamping function with the number
of polls a
particular code section has processed, and count the number of
cycles we've
spent processing empty bursts. The more empty bursts we
encounter, the less
cycles we spend in "busy" state, and the less core poll busyness
will be
reported.

In order for all of the above to work without modifications to
the
application, the library code needs to be instrumented with calls
to the
lcore telemetry busyness timestamping function. The following
parts of DPDK
are instrumented with lcore telemetry calls:

- All major driver API's:
   - ethdev
   - cryptodev
   - compressdev
   - regexdev
   - bbdev
   - rawdev
   - eventdev
   - dmadev
- Some additional libraries:
   - ring
   - distributor

To avoid performance impact from having lcore telemetry support,
a global
variable is exported by EAL, and a call to timestamping function
is wrapped
into a macro, so that whenever telemetry is disabled, it only
takes one
additional branch and no function calls are performed. It is also
possible
to disable it at compile time by commenting out
RTE_LCORE_BUSYNESS from
build config.

This patch also adds a telemetry endpoint to report lcore poll
busyness, as
well as telemetry endpoints to enable/disable lcore telemetry. A
documentation entry has been added to the howto guides to explain
the usage
of the new telemetry endpoints and API.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

---
v3:
   * Fix missed renaming to poll busyness
   * Fix clang compilation
   * Fix arm compilation

v2:
   * Use rte_get_tsc_hz() to adjust the telemetry period
   * Rename to reflect polling busyness vs general busyness
   * Fix segfault when calling telemetry timestamp from an
unregistered
     non-EAL thread.
   * Minor cleanup
---
diff --git a/meson_options.txt b/meson_options.txt
index 7c220ad68d..725b851f69 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -20,6 +20,8 @@ option('enable_driver_sdk', type: 'boolean',
value: false, description:
         'Install headers to build drivers.')
  option('enable_kmods', type: 'boolean', value: false,
description:
         'build kernel modules')
+option('enable_lcore_poll_busyness', type: 'boolean', value:
true, description:
+       'enable collection of lcore poll busyness telemetry')
IMO, All fastpath features should be opt-in. i.e default should be
false.
For the trace fastpath related changes, We have done the similar
thing
even though it cost additional one cycle for disabled trace points

We do need to consider runtime and build defaults differently,
though.
Since this has also runtime enabling, I think having build-time
enabling
true as default is ok, so long as the runtime enabling is false
(assuming
no noticable overhead when the feature is disabled.)
I was talking about buildtime only. "enable_trace_fp" meson option
selected as
false as default.
Agree. "enable_lcore_poll_busyness" is in the fast path, so it should
follow the design pattern of "enable_trace_fp".

+1 to making this opt-in. However, I'd lean more towards having the
buildtime option enabled and the runtime option disabled by default.
There is no measurable impact cause by the extra branch (the check for
enabled/disabled in the macro) by disabling at runtime, and we gain the
benefit of avoiding a recompile to enable it later.
The exact same thing could be said about "enable_trace_fp"; however, the development effort was put into separating it from "enable_trace", so it could be disabled by default.

Your patch is unlikely to get approved if you don't follow the "enable_trace_fp" design pattern as suggested.


            

              
If the concern is enabling on generic distros then distro generic
config can opt in this

/Bruce
@Kevin, are you considering a roadmap for using
RTE_LCORE_TELEMETRY_TIMESTAMP() for other purposes? Otherwise, it
should also be renamed to indicate that it is part of the "poll
busyness" telemetry.

No further purposes are planned for this macro, I'll rename it in the
next revision.
OK. Thank you.

Also, there's a new discussion about EAL bloat [1]. Perhaps I'm stretching it here, but it would be nice if your library was made a separate library, instead of part of the EAL library. (Since this kind of feature is not new to the EAL, I will categorize this suggestion as "nice to have", not "must have".)

[1] http://inbox.dpdk.org/dev/2594603.Isy0gbHreE@thomas/T/

I was actually discussing this with Kevin and Dave H. on Friay, and trying
to make this a separate library is indeed a big stretch. :-)

From that discussion, the key point/gap is that we are really missing a
clean way of providing undefs or macro fallbacks for when a library is just
not present. For example, if this was a separate library we would gain a
number of advantages e.g. no need for separate enable/disable flag, but the
big disadvantage is that every header include for it, and every reference
to the macros used in that header need to be surrounded by big ugly ifdefs.

For now, adding this into EAL is the far more practical approach, since it
means that even if support for the feature is disabled at build time the
header is still available to provide the appropriate no-op macros.
We can make the library always available with different implementations
based on a build option.
But is it a good idea to have a different behaviour based on build option?
Why not making it a runtime option?
Can the performance hit be cached in some way?

The patches currently include runtime options to enable/disable the feature via API and via telemetry endpoints. We have run performance tests and have failed to measure any performance impact with the feature runtime disabled.

We added the buildtime option following previous feedback where an absolute guarantee is required that performance would not be affected by the addition of this feature. To avoid clutter in the meson options, we could either a) remove the buildtime option, or b) allow a CFLAG to disable the feature rather than an explicit buildtime option. Either way, they would only serve a reassurance purpose.