DPDK patches and discussions
 help / color / mirror / Atom feed
From: Kevin Laatz <kevin.laatz@intel.com>
To: "Mattias Rönnblom" <hofors@lysator.liu.se>, dev@dpdk.org
Cc: <anatoly.burakov@intel.com>, Conor Walsh <conor.walsh@intel.com>,
	"David Hunt" <david.hunt@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Nicolas Chautru <nicolas.chautru@intel.com>,
	Fan Zhang <roy.fan.zhang@intel.com>,
	Ashish Gupta <ashish.gupta@marvell.com>,
	"Akhil Goyal" <gakhil@marvell.com>,
	Chengwen Feng <fengchengwen@huawei.com>,
	"Ray Kinsella" <mdr@ashroe.eu>,
	Thomas Monjalon <thomas@monjalon.net>,
	"Ferruh Yigit" <ferruh.yigit@xilinx.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	Jerin Jacob <jerinj@marvell.com>,
	"Sachin Saxena" <sachin.saxena@oss.nxp.com>,
	Hemant Agrawal <hemant.agrawal@nxp.com>,
	 Ori Kam <orika@nvidia.com>,
	Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
	Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Subject: Re: [PATCH v3 1/3] eal: add lcore poll busyness telemetry
Date: Mon, 29 Aug 2022 14:16:34 +0100	[thread overview]
Message-ID: <0813cdfe-47ac-45ce-8f33-2ed0d4f19d45@intel.com> (raw)
In-Reply-To: <af66f6ee-1a61-9663-a179-322ad2bb71a6@lysator.liu.se>

On 26/08/2022 23:06, Mattias Rönnblom wrote:
> On 2022-08-25 17:28, Kevin Laatz wrote:
>> From: Anatoly Burakov <anatoly.burakov@intel.com>
>>
>> Currently, there is no way to measure lcore poll busyness in a 
>> passive way,
>> without any modifications to the application. This patch adds a new 
>> EAL API
>> that will be able to passively track core polling busyness.
>
> There's no generic way, but the DSW event device keep tracks of lcore 
> utilization (i.e., the fraction of cycles used to perform actual work, 
> as opposed to just polling empty queues), and it does some with the 
> same basic principles as, from what it seems after a quick look, used 
> in this patch.
>
>>
>> The poll busyness is calculated by relying on the fact that most DPDK 
>> API's
>> will poll for packets. Empty polls can be counted as "idle", while
>
> Lcore worker threads poll for work. Packets, timeouts, completions, 
> event device events, etc.

Yes, the wording was too restrictive here - the patch includes changes 
to drivers and libraries such as dmadev, eventdev, ring etc that poll 
for work and would want to mark it as "idle" or "busy".


>
>> non-empty polls can be counted as busy. To measure lcore poll 
>> busyness, we
>
> I guess what is meant here is that cycles spent after non-empty polls 
> can be counted as busy (useful) cycles? Potentially including the 
> cycles spent for the actual poll operation. ("Poll busyness" is a very 
> vague term, in my opionion.)
>
> Similiarly, cycles spent after an empty poll would not be counted.

Correct, the generic functionality works this way. Any cycles between an 
"busy poll" and the next "idle poll" will be counted as busy/useful work 
(and vice versa).


>
>> simply call the telemetry timestamping function with the number of 
>> polls a
>> particular code section has processed, and count the number of cycles 
>> we've
>> spent processing empty bursts. The more empty bursts we encounter, 
>> the less
>> cycles we spend in "busy" state, and the less core poll busyness will be
>> reported.
>>
>
> Is this the same scheme as DSW? When a non-zero burst in idle state 
> means a transition from the idle to busy? And a zero burst poll in 
> busy state means a transition from idle to busy?
>
> The issue with this scheme, is that you might potentially end up with 
> a state transition for every iteration of the application's main loop, 
> if packets (or other items of work) only comes in on one of the 
> lcore's potentially many RX queues (or other input queues, such as 
> eventdev ports). That means a rdtsc for every loop, which isn't too 
> bad, but still might be noticable.
>
> An application that gather items of work from multiple source before 
> actually doing anything breaks this model. For example, consider a 
> lcore worker owning two RX queues, performing rte_eth_rx_burst() on 
> both, before attempt to process any of the received packets. If the 
> last poll is empty, the cycles spent will considered idle, even though 
> they were busy.
>
> A lcore worker might also decide to poll the same RX queue multiple 
> times (until it hits an empty poll, or reaching some high upper 
> bound), before initating processing of the packets.

Yes, more complex applications will need to be modified to gain a more 
fine-grained busyness metric. In order to achieve this level of 
accuracy, application context is required. The 
'RTE_LCORE_POLL_BUSYNESS_TIMESTAMP()' macro can be used within the 
application to mark sections as "busy" or "not busy" to do so. Using 
your example above, the application could keep track of multiple bursts 
(whether they have work or not) and call the macro before initiating the 
processing to signal that there is, in fact, work to be done.

There's a section in the documentation update in this patchset that 
describes it. It might need more work if its not clear :-)


>
> I didn't read your code in detail, so I might be jumping to conclusions.
>
>> In order for all of the above to work without modifications to the
>> application, the library code needs to be instrumented with calls to the
>> lcore telemetry busyness timestamping function. The following parts 
>> of DPDK
>> are instrumented with lcore telemetry calls:
>>
>> - All major driver API's:
>>    - ethdev
>>    - cryptodev
>>    - compressdev
>>    - regexdev
>>    - bbdev
>>    - rawdev
>>    - eventdev
>>    - dmadev
>> - Some additional libraries:
>>    - ring
>>    - distributor
>
> In the past, I've suggested this kind of functionality should go into 
> the service framework instead, with the service function explicitly 
> signaling wheter or not the cycles spent on something useful or not.
>
> That seems to me like a more straight-forward and more accurate 
> solution, but does require the application to deploy everything as 
> services, and also requires a change of the service function signature.
>
>>
>> To avoid performance impact from having lcore telemetry support, a 
>> global
>> variable is exported by EAL, and a call to timestamping function is 
>> wrapped
>> into a macro, so that whenever telemetry is disabled, it only takes one
>
> Use an static inline function if you don't need the additional 
> expressive power of a macro.
>
> I suggest you also mention the performance implications, when this 
> function is enabled.

Sure, I can add a note in the next revision.


>
>> additional branch and no function calls are performed. It is also 
>> possible
>> to disable it at compile time by commenting out RTE_LCORE_BUSYNESS from
>> build config.
>>
>> This patch also adds a telemetry endpoint to report lcore poll 
>> busyness, as
>> well as telemetry endpoints to enable/disable lcore telemetry. A
>> documentation entry has been added to the howto guides to explain the 
>> usage
>> of the new telemetry endpoints and API.
>>
>
> Should there really be a dependency from the EAL to the telemetry 
> library? A cycle. Maybe some dependency inversion would be in order? 
> The telemetry library could instead register an interest in getting 
> busy/idle cycles reports from lcores.
>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Conor Walsh <conor.walsh@intel.com>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>
>> ---
>> v3:
>>    * Fix missed renaming to poll busyness
>>    * Fix clang compilation
>>    * Fix arm compilation
>>
>> v2:
>>    * Use rte_get_tsc_hz() to adjust the telemetry period
>>    * Rename to reflect polling busyness vs general busyness
>>    * Fix segfault when calling telemetry timestamp from an unregistered
>>      non-EAL thread.
>>    * Minor cleanup
>> ---
>>   config/meson.build                          |   1 +
>>   config/rte_config.h                         |   1 +
>>   lib/bbdev/rte_bbdev.h                       |  17 +-
>>   lib/compressdev/rte_compressdev.c           |   2 +
>>   lib/cryptodev/rte_cryptodev.h               |   2 +
>>   lib/distributor/rte_distributor.c           |  21 +-
>>   lib/distributor/rte_distributor_single.c    |  14 +-
>>   lib/dmadev/rte_dmadev.h                     |  15 +-
>>   lib/eal/common/eal_common_lcore_telemetry.c | 293 ++++++++++++++++++++
>>   lib/eal/common/meson.build                  |   1 +
>>   lib/eal/include/rte_lcore.h                 |  80 ++++++
>>   lib/eal/meson.build                         |   3 +
>>   lib/eal/version.map                         |   7 +
>>   lib/ethdev/rte_ethdev.h                     |   2 +
>>   lib/eventdev/rte_eventdev.h                 |  10 +-
>>   lib/rawdev/rte_rawdev.c                     |   6 +-
>>   lib/regexdev/rte_regexdev.h                 |   5 +-
>>   lib/ring/rte_ring_elem_pvt.h                |   1 +
>>   meson_options.txt                           |   2 +
>>   19 files changed, 459 insertions(+), 24 deletions(-)
>>   create mode 100644 lib/eal/common/eal_common_lcore_telemetry.c
>>
<snip>
>> diff --git a/lib/eal/common/eal_common_lcore_telemetry.c 
>> b/lib/eal/common/eal_common_lcore_telemetry.c
>> new file mode 100644
>> index 0000000000..bba0afc26d
>> --- /dev/null
>> +++ b/lib/eal/common/eal_common_lcore_telemetry.c
>> @@ -0,0 +1,293 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2010-2014 Intel Corporation
>> + */
>> +
>> +#include <unistd.h>
>> +#include <limits.h>
>> +#include <string.h>
>> +
>> +#include <rte_common.h>
>> +#include <rte_cycles.h>
>> +#include <rte_errno.h>
>> +#include <rte_lcore.h>
>> +
>> +#ifdef RTE_LCORE_POLL_BUSYNESS
>> +#include <rte_telemetry.h>
>> +#endif
>> +
>> +int __rte_lcore_telemetry_enabled;
>
> Is "telemetry" really the term to use here? Isn't this just another 
> piece of statistics? It can be used for telemetry, or in some other 
> fashion.
>
> (Use bool not int.)

Can rename to '__rte_lcore_stats_enabled' in next revision.


>
>> +
>> +#ifdef RTE_LCORE_POLL_BUSYNESS
>> +
>> +struct lcore_telemetry {
>> +    int poll_busyness;
>> +    /**< Calculated poll busyness (gets set/returned by the API) */
>> +    int raw_poll_busyness;
>> +    /**< Calculated poll busyness times 100. */
>> +    uint64_t interval_ts;
>> +    /**< when previous telemetry interval started */
>> +    uint64_t empty_cycles;
>> +    /**< empty cycle count since last interval */
>> +    uint64_t last_poll_ts;
>> +    /**< last poll timestamp */
>> +    bool last_empty;
>> +    /**< if last poll was empty */
>> +    unsigned int contig_poll_cnt;
>> +    /**< contiguous (always empty/non empty) poll counter */
>> +} __rte_cache_aligned;
>> +
>> +static struct lcore_telemetry *telemetry_data;
>> +
>> +#define LCORE_POLL_BUSYNESS_MAX 100
>> +#define LCORE_POLL_BUSYNESS_NOT_SET -1
>> +#define LCORE_POLL_BUSYNESS_MIN 0
>> +
>> +#define SMOOTH_COEFF 5
>> +#define STATE_CHANGE_OPT 32
>> +
>> +static void lcore_config_init(void)
>> +{
>> +    int lcore_id;
>> +
>> +    telemetry_data = calloc(RTE_MAX_LCORE, sizeof(telemetry_data[0]));
>> +    if (telemetry_data == NULL)
>> +        rte_panic("Could not init lcore telemetry data: Out of 
>> memory\n");
>> +
>> +    RTE_LCORE_FOREACH(lcore_id) {
>> +        struct lcore_telemetry *td = &telemetry_data[lcore_id];
>> +
>> +        td->interval_ts = 0;
>> +        td->last_poll_ts = 0;
>> +        td->empty_cycles = 0;
>> +        td->last_empty = true;
>> +        td->contig_poll_cnt = 0;
>> +        td->poll_busyness = LCORE_POLL_BUSYNESS_NOT_SET;
>> +        td->raw_poll_busyness = 0;
>> +    }
>> +}
>> +
>> +int rte_lcore_poll_busyness(unsigned int lcore_id)
>> +{
>> +    const uint64_t active_thresh = rte_get_tsc_hz() * 
>> RTE_LCORE_POLL_BUSYNESS_PERIOD_MS;
>> +    struct lcore_telemetry *tdata;
>> +
>> +    if (lcore_id >= RTE_MAX_LCORE)
>> +        return -EINVAL;
>> +    tdata = &telemetry_data[lcore_id];
>> +
>> +    /* if the lcore is not active */
>> +    if (tdata->interval_ts == 0)
>> +        return LCORE_POLL_BUSYNESS_NOT_SET;
>> +    /* if the core hasn't been active in a while */
>> +    else if ((rte_rdtsc() - tdata->interval_ts) > active_thresh)
>> +        return LCORE_POLL_BUSYNESS_NOT_SET;
>> +
>> +    /* this core is active, report its poll busyness */
>> +    return telemetry_data[lcore_id].poll_busyness;
>> +}
>> +
>> +int rte_lcore_poll_busyness_enabled(void)
>> +{
>> +    return __rte_lcore_telemetry_enabled;
>> +}
>> +
>> +void rte_lcore_poll_busyness_enabled_set(int enable)
>
> Use bool.
>
>> +{
>> +    __rte_lcore_telemetry_enabled = !!enable;
>
> !!Another reason to use bool!! :)
>
> Are you allowed to call this function during operation, you'll need a 
> atomic store here (and an atomic load on the read side).

Ack


>
>> +
>> +    if (!enable)
>> +        lcore_config_init();
>> +}
>> +
>> +static inline int calc_raw_poll_busyness(const struct 
>> lcore_telemetry *tdata,
>> +                    const uint64_t empty, const uint64_t total)
>> +{
>> +    /*
>> +     * we don't want to use floating point math here, but we want 
>> for our poll
>> +     * busyness to react smoothly to sudden changes, while still 
>> keeping the
>> +     * accuracy and making sure that over time the average follows 
>> poll busyness
>> +     * as measured just-in-time. therefore, we will calculate the 
>> average poll
>> +     * busyness using integer math, but shift the decimal point two 
>> places
>> +     * to the right, so that 100.0 becomes 10000. this allows us to 
>> report
>> +     * integer values (0..100) while still allowing ourselves to 
>> follow the
>> +     * just-in-time measurements when we calculate our averages.
>> +     */
>> +    const int max_raw_idle = LCORE_POLL_BUSYNESS_MAX * 100;
>> +
>
> Why not just store/manage the number of busy (or idle, or both) 
> cycles? Then the user can decide what time period to average over, to 
> what extent the lcore utilization from previous periods should be 
> factored in, etc.

There's an option 'RTE_LCORE_POLL_BUSYNESS_PERIOD_MS' added to 
rte_config.h which would allow the user to define the time period over 
which the utilization should be reported. We only do this calculation if 
that time interval has elapsed.


>
> In DSW, I initially presented only a load statistic (which averaged 
> over 250 us, with some contribution from previous period). I later 
> came to realize that just exposing the number of busy cycles left the 
> calling application much more options. For example, to present the 
> average load during 1 s, you needed to have some control thread 
> sampling the load statistic during that time period, as opposed to 
> when the busy cycles statistic was introduced, it just had to read 
> that value twice (at the beginning of the period, and at the end), and 
> compared it will the amount of wallclock time passed.
>
<snip>

  parent reply	other threads:[~2022-08-29 13:16 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15 13:12 [PATCH v1 1/2] eal: add lcore " Anatoly Burakov
2022-07-15 13:12 ` [PATCH v1 2/2] eal: add cpuset lcore telemetry entries Anatoly Burakov
2022-07-15 13:35 ` [PATCH v1 1/2] eal: add lcore busyness telemetry Burakov, Anatoly
2022-07-15 13:46 ` Jerin Jacob
2022-07-15 14:11   ` Bruce Richardson
2022-07-15 14:18   ` Burakov, Anatoly
2022-07-15 22:13 ` Morten Brørup
2022-07-16 14:38   ` Thomas Monjalon
2022-07-17  3:10   ` Honnappa Nagarahalli
2022-07-17  9:56     ` Morten Brørup
2022-07-18  9:43       ` Burakov, Anatoly
2022-07-18 10:59         ` Morten Brørup
2022-07-19 12:20           ` Thomas Monjalon
2022-07-18 15:46         ` Stephen Hemminger
2022-08-24 16:24 ` [PATCH v2 0/3] Add lcore poll " Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 1/3] eal: add " Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-08-25  7:47   ` [PATCH v2 0/3] Add lcore poll busyness telemetry Morten Brørup
2022-08-25 10:53     ` Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 " Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 1/3] eal: add " Kevin Laatz
2022-08-26  7:05     ` Jerin Jacob
2022-08-26  8:07       ` Bruce Richardson
2022-08-26  8:16         ` Jerin Jacob
2022-08-26  8:29           ` Morten Brørup
2022-08-26 15:27             ` Kevin Laatz
2022-08-26 15:46               ` Morten Brørup
2022-08-29 10:41                 ` Bruce Richardson
2022-08-29 10:53                   ` Thomas Monjalon
2022-08-29 12:36                     ` Kevin Laatz
2022-08-29 12:49                       ` Morten Brørup
2022-08-29 13:37                         ` Kevin Laatz
2022-08-29 13:44                           ` Morten Brørup
2022-08-29 14:21                             ` Kevin Laatz
2022-08-29 11:22                   ` Morten Brørup
2022-08-26 22:06     ` Mattias Rönnblom
2022-08-29  8:23       ` Bruce Richardson
2022-08-29 13:16       ` Kevin Laatz [this message]
2022-08-30 10:26       ` Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 1/3] eal: add " Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 1/3] eal: add " Kevin Laatz
2022-09-03 13:33     ` Jerin Jacob
2022-09-06  9:37       ` Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 1/4] eal: add " Kevin Laatz
2022-09-13 13:48     ` Morten Brørup
2022-09-13 13:19   ` [PATCH v6 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 4/4] doc: add howto guide " Kevin Laatz
2022-09-14  9:29 ` [PATCH v7 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-14  9:29   ` [PATCH v7 1/4] eal: add " Kevin Laatz
2022-09-14 14:30     ` Stephen Hemminger
2022-09-16 12:35       ` Kevin Laatz
2022-09-19 10:19     ` Konstantin Ananyev
2022-09-22 17:14       ` Kevin Laatz
2022-09-26  9:37         ` Konstantin Ananyev
2022-09-29 12:41           ` Kevin Laatz
2022-09-30 12:32             ` Jerin Jacob
2022-10-01 14:17             ` Konstantin Ananyev
2022-10-03 20:02               ` Mattias Rönnblom
2022-10-04  9:15                 ` Morten Brørup
2022-10-04 11:57                   ` Bruce Richardson
2022-10-04 14:26                     ` Mattias Rönnblom
2022-10-04 23:30                     ` Konstantin Ananyev
2022-09-30 22:13     ` Mattias Rönnblom
2022-09-14  9:29   ` [PATCH v7 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-14  9:29   ` [PATCH v7 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-30 22:20     ` Mattias Rönnblom
2022-09-14  9:29   ` [PATCH v7 4/4] doc: add howto guide " Kevin Laatz
2022-09-14 14:33   ` [PATCH v7 0/4] Add lcore poll busyness telemetry Stephen Hemminger
2022-09-16 12:35     ` Kevin Laatz
2022-09-16 14:10       ` Kevin Laatz
2022-10-05 13:44   ` Kevin Laatz
2022-10-06 13:25     ` Morten Brørup
2022-10-06 15:26       ` Mattias Rönnblom
2022-10-10 15:22         ` Morten Brørup
2022-10-10 17:38           ` Mattias Rönnblom
2022-10-12 12:25             ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0813cdfe-47ac-45ce-8f33-2ed0d4f19d45@intel.com \
    --to=kevin.laatz@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=ashish.gupta@marvell.com \
    --cc=bruce.richardson@intel.com \
    --cc=conor.walsh@intel.com \
    --cc=david.hunt@intel.com \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=ferruh.yigit@xilinx.com \
    --cc=gakhil@marvell.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=hofors@lysator.liu.se \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=jerinj@marvell.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=mdr@ashroe.eu \
    --cc=nicolas.chautru@intel.com \
    --cc=orika@nvidia.com \
    --cc=roy.fan.zhang@intel.com \
    --cc=sachin.saxena@oss.nxp.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).