DPDK patches and discussions
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
To: Kevin Laatz <kevin.laatz@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
	Conor Walsh <conor.walsh@intel.com>,
	David Hunt <david.hunt@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Nicolas Chautru <nicolas.chautru@intel.com>,
	Fan Zhang <roy.fan.zhang@intel.com>,
	Ashish Gupta <ashish.gupta@marvell.com>,
	Akhil Goyal <gakhil@marvell.com>,
	Fengchengwen <fengchengwen@huawei.com>,
	Ray Kinsella <mdr@ashroe.eu>,
	Thomas Monjalon <thomas@monjalon.net>,
	Ferruh Yigit <ferruh.yigit@xilinx.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	Jerin Jacob <jerinj@marvell.com>,
	Sachin Saxena <sachin.saxena@oss.nxp.com>,
	Hemant Agrawal <hemant.agrawal@nxp.com>,
	Ori Kam <orika@nvidia.com>,
	Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Subject: Re: [PATCH v7 1/4] eal: add lcore poll busyness telemetry
Date: Sat, 1 Oct 2022 15:17:26 +0100	[thread overview]
Message-ID: <a14449f3-f679-706b-ce6e-c8b2efa1794b@yandex.ru> (raw)
In-Reply-To: <ba69465b-d878-548f-d3c0-fb1da8340d08@intel.com>


>> Hi Kevin,
>>
>>>>> Currently, there is no way to measure lcore poll busyness in a 
>>>>> passive way,
>>>>> without any modifications to the application. This patch adds a new 
>>>>> EAL API
>>>>> that will be able to passively track core polling busyness.
>>>>>
>>>>> The poll busyness is calculated by relying on the fact that most 
>>>>> DPDK API's
>>>>> will poll for work (packets, completions, eventdev events, etc). Empty
>>>>> polls can be counted as "idle", while non-empty polls can be 
>>>>> counted as
>>>>> busy. To measure lcore poll busyness, we simply call the telemetry
>>>>> timestamping function with the number of polls a particular code 
>>>>> section
>>>>> has processed, and count the number of cycles we've spent 
>>>>> processing empty
>>>>> bursts. The more empty bursts we encounter, the less cycles we 
>>>>> spend in
>>>>> "busy" state, and the less core poll busyness will be reported.
>>>>>
>>>>> In order for all of the above to work without modifications to the
>>>>> application, the library code needs to be instrumented with calls 
>>>>> to the
>>>>> lcore telemetry busyness timestamping function. The following parts 
>>>>> of DPDK
>>>>> are instrumented with lcore poll busyness timestamping calls:
>>>>>
>>>>> - All major driver API's:
>>>>>     - ethdev
>>>>>     - cryptodev
>>>>>     - compressdev
>>>>>     - regexdev
>>>>>     - bbdev
>>>>>     - rawdev
>>>>>     - eventdev
>>>>>     - dmadev
>>>>> - Some additional libraries:
>>>>>     - ring
>>>>>     - distributor
>>>>>
>>>>> To avoid performance impact from having lcore telemetry support, a 
>>>>> global
>>>>> variable is exported by EAL, and a call to timestamping function is 
>>>>> wrapped
>>>>> into a macro, so that whenever telemetry is disabled, it only takes 
>>>>> one
>>>>> additional branch and no function calls are performed. It is 
>>>>> disabled at
>>>>> compile time by default.
>>>>>
>>>>> This patch also adds a telemetry endpoint to report lcore poll 
>>>>> busyness, as
>>>>> well as telemetry endpoints to enable/disable lcore telemetry. A
>>>>> documentation entry has been added to the howto guides to explain 
>>>>> the usage
>>>>> of the new telemetry endpoints and API.
>>>> As was already mentioned  by other reviewers, it would be much better
>>>> to let application itself decide when it is idle and when it is busy.
>>>> With current approach even for constant polling run-to-completion 
>>>> model there
>>>> are plenty of opportunities to get things wrong and provide 
>>>> misleading statistics.
>>>> My special concern - inserting it into ring dequeue code.
>>>> Ring is used for various different things, not only pass packets 
>>>> between threads (mempool, etc.).
>>>> Blindly assuming that ring dequeue returns empty means idle cycles 
>>>> seams wrong to me.
>>>> Which make me wonder should we really hard-code these calls into 
>>>> DPDK core functions?
>>>> If you still like to introduce such stats, might be better to 
>>>> implement it via callback mechanism.
>>>> As I remember nearly all our drivers (net, crypto, etc.) do support it.
>>>> That way our generic code   will remain unaffected, plus user will 
>>>> have ability to enable/disable
>>>> it on a per device basis.
>>> Thanks for your feedback, Konstantin.
>>>
>>> You are right in saying that this approach won't be 100% suitable for
>>> all use-cases, but should be suitable for the majority of applications.
>> First of all - could you explain how did you measure what is the 
>> 'majority' of DPDK applications?
>> And how did you conclude that it definitely work for all the apps in 
>> that 'majority'?
>> Second what bother me with that approach - I don't see s clear and 
>> deterministic way
>> for the user to understand would that stats work properly for his app 
>> or not.
>> (except manually ananlyzing his app code).
> 
> All of the DPDK example applications we've tested with (l2fwd, l3fwd + 
> friends, testpmd, distributor, dmafwd) report lcore poll busyness and 
> respond to changing traffic rates etc. We've also compared the reported 
> busyness to similar metrics reported by other projects such as VPP and 
> OvS, and found the reported busyness matches with a difference of +/- 
> 1%. In addition to the DPDK example applications, we've have shared our 
> plans with end customers and they have confirmed that the design should 
> work with their applications.

I am sure l3fwd and testpmd should be ok, I am talking about
something more complicated/unusual.
Below are few examples on top of my head when I think your approach
will generate invalid stats, feel free to correct me, if I am wrong.

1) App doing some sort of bonding itslef, i.e:

struct rte_mbuf pkts[N*2];
k = rte_eth_rx_burst(p0, q0, pkts, N);
n = rte_eth_rx_burst(p1, q1, pkts + k, N);

/*process all packets from both ports at once */
if (n + k != 0)
    process_pkts(pkts, n + k);

Now, as I understand, if n==0, then all cycles spent
in process_pkts() will be accounted as idle.

2) App doing something similar to what pdump library does
(creates a copy of a packet and sends it somewhere).

n =rte_eth_rx_burst(p0, q0, &pkt, 1);
if (n != 0) {
   dup_pkt = rte_pktmbuf_copy(pkt, dup_mp, ...);
   if (dup_pkt != NULL)
      process_dup_pkt(dup_pkt);
   process_pkt(pkt);
}

that relates to ring discussion below:
if there are no mbufs in dup_mp, then ring_deque() will fail
and process_pkt() will be accounted as idle.

3) App dequeues from ring in a bit of unusual way:

/* idle spin loop */
while ((n = rte_ring_count(ring)) == 0)
   ret_pause();

n = rte_ring_dequeue_bulk(ring, pkts, n, NULL);
if (n != 0)
   process_pkts(pkts, n);

here, we can end-up accounting cycles spent in
idle spin loop as busy.


4) Any thread that generates TX traffic on it's own
(something like testpmd tx_only fwd mode)

5) Any thread that depends on both dequeue and enqueue:

n = rte_ring_dequeue_burst(in_ring, pkts, n, ..);
...

/* loop till all packets are sent out successfully */
while(rte_ring_enqueue_bulk(out_ring, pkts, n, NULL) == 0)
    rte_pause();

Now, if n > 0, all cycles spent in enqueue() will be accounted
as 'busy', though from my perspective they probably should
be considered as 'idle'.


Also I expect some problems when packet processing is done inside
rx callbacks, but that probably can be easily fixed.


> 
>>> It's worth keeping in mind that this feature is compile-time disabled by
>>> default, so there is no impact to any application/user that does not
>>> wish to use this, for example applications where this type of busyness
>>> is not useful, or for applications that already use other mechanisms to
>>> report similar telemetry.
>> Not sure that adding in new compile-time option disabled by default is 
>> a good thing...
>> For me it would be much more preferable if we'll go through a more 
>> 'standard' way here:
>> a) define clear API to enable/disable/collect/report such type of stats.
>> b) use some of our sample apps to demonstrate how to use it properly 
>> with user-specific code.
>> c) if needed implement some 'silent' stats collection for limited 
>> scope of apps via callbacks -
>> let say for run-to-completion apps that do use ether and crypto devs 
>> only.
> 
> With the compile-time option, its just one build flag for lots of 
> applications to silently benefit from this.

There could be a lot of useful and helpfull stats
that user would like to collect (idle/busy, processing latency, etc.).
But, if for each such case we will hard-code new stats collection
into our fast data-path code, then very soon it will become
completely bloated and unmaintainable.
I think we need some generic approach for such extra stats collection.
Callbacks could be one way, Jerin in another mail suggested using 
existing trace-point hooks, might be it worth to explore it further.

> 
>>   However, the upside for applications that do
>>> wish to use this is that there are no code changes required (for the
>>> most part), the feature simply needs to be enabled at compile-time via
>>> the meson option.
>>>
>>> In scenarios where contextual awareness of the application is needed in
>>> order to report more accurate "busyness", the
>>> "RTE_LCORE_POLL_BUSYNESS_TIMESTAMP(n)" macro can be used to mark
>>> sections of code as "busy" or "idle". This way, the application can
>>> assume control of determining the poll busyness of its lcores while
>>> leveraging the telemetry hooks adding in this patchset.
>>>
>>> We did initially consider implementing this via callbacks, however we
>>> found this approach to have 2 main drawbacks:
>>> 1. Application changes are required for all applications wanting to
>>> report this telemetry - rather than the majority getting it for free.
>> Didn't get it - why callbacks approach would require user-app changes?
>> In other situations - rte_power callbacks, pdump, etc. it works 
>> transparent to
>> user-leve code.
>> Why it can't be done here in a similar way?
> 
>  From my understanding, the callbacks would need to be registered by the 
> application at the very least (and the callback would have to be 
> registered per device/pmd/lib).

Callbacks can be registered by library itself.
AFAIK, latenc-ystats, power and pdump libraries - all use similar 
approach. user calls something like xxx_stats_enable() and then library 
can iterate over all available devices and setup necessary callbacks.
same for xxx_stats_disable().

>>
>>> 2. Ring does not have callback support, meaning pipelined applications
>>> could not report lcore poll busyness telemetry with this approach.
>> That's another big concern that I have:
>> Why you consider that all rings will be used for a pipilines between 
>> threads and should
>> always be accounted by your stats?
>> They could be used for dozens different purposes.
>> What if that ring is used for mempool, and ring_dequeue() just means 
>> we try to allocate
>> an object from the pool? In such case, why failing to allocate an 
>> object should mean
>> start of new 'idle cycle'?
> 
> Another approach could be taken here if the mempool interactions are of 
> concern.
> 
>  From our understanding, mempool operations use the "_bulk" APIs, 
> whereas polling operations use the "_burst" APIs. Would only 
> timestamping on the "_burst" APIs be better here? That way the mempool 
> interactions won't be counted towards the busyness.

Well, it would help to solve one particular case,
but in general I still think it is incomplete and error-prone.
What if pipeline app will use ring_count/ring_dequeue_bulk(),
or even ZC ring API?
What if app will use something different from rte_ring to pass
packets between threads/processes?
As I said before, without some clues from the app, it is probably
not possible to collect such stats in a proper way.


> Including support for pipelined applications using rings is key for a 
> number of usecases, this was highlighted as part of the customer 
> feedback when we shared the design.
> 
>>
>>> Eventdev is another driver which would be completely missed with this
>>> approach.
>> Ok, I see two ways here:
>> - implement CB support for eventdev.
>> -meanwhile clearly document that this stats are not supported for 
>> eventdev  scenarios (yet).


  parent reply	other threads:[~2022-10-01 14:17 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15 13:12 [PATCH v1 1/2] eal: add lcore " Anatoly Burakov
2022-07-15 13:12 ` [PATCH v1 2/2] eal: add cpuset lcore telemetry entries Anatoly Burakov
2022-07-15 13:35 ` [PATCH v1 1/2] eal: add lcore busyness telemetry Burakov, Anatoly
2022-07-15 13:46 ` Jerin Jacob
2022-07-15 14:11   ` Bruce Richardson
2022-07-15 14:18   ` Burakov, Anatoly
2022-07-15 22:13 ` Morten Brørup
2022-07-16 14:38   ` Thomas Monjalon
2022-07-17  3:10   ` Honnappa Nagarahalli
2022-07-17  9:56     ` Morten Brørup
2022-07-18  9:43       ` Burakov, Anatoly
2022-07-18 10:59         ` Morten Brørup
2022-07-19 12:20           ` Thomas Monjalon
2022-07-18 15:46         ` Stephen Hemminger
2022-08-24 16:24 ` [PATCH v2 0/3] Add lcore poll " Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 1/3] eal: add " Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-24 16:24   ` [PATCH v2 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-08-25  7:47   ` [PATCH v2 0/3] Add lcore poll busyness telemetry Morten Brørup
2022-08-25 10:53     ` Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 " Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 1/3] eal: add " Kevin Laatz
2022-08-26  7:05     ` Jerin Jacob
2022-08-26  8:07       ` Bruce Richardson
2022-08-26  8:16         ` Jerin Jacob
2022-08-26  8:29           ` Morten Brørup
2022-08-26 15:27             ` Kevin Laatz
2022-08-26 15:46               ` Morten Brørup
2022-08-29 10:41                 ` Bruce Richardson
2022-08-29 10:53                   ` Thomas Monjalon
2022-08-29 12:36                     ` Kevin Laatz
2022-08-29 12:49                       ` Morten Brørup
2022-08-29 13:37                         ` Kevin Laatz
2022-08-29 13:44                           ` Morten Brørup
2022-08-29 14:21                             ` Kevin Laatz
2022-08-29 11:22                   ` Morten Brørup
2022-08-26 22:06     ` Mattias Rönnblom
2022-08-29  8:23       ` Bruce Richardson
2022-08-29 13:16       ` Kevin Laatz
2022-08-30 10:26       ` Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-25 15:28   ` [PATCH v3 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 1/3] eal: add " Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-01 14:39   ` [PATCH v4 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 1/3] eal: add " Kevin Laatz
2022-09-03 13:33     ` Jerin Jacob
2022-09-06  9:37       ` Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-02 15:58   ` [PATCH v5 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 1/4] eal: add " Kevin Laatz
2022-09-13 13:48     ` Morten Brørup
2022-09-13 13:19   ` [PATCH v6 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-13 13:19   ` [PATCH v6 4/4] doc: add howto guide " Kevin Laatz
2022-09-14  9:29 ` [PATCH v7 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-14  9:29   ` [PATCH v7 1/4] eal: add " Kevin Laatz
2022-09-14 14:30     ` Stephen Hemminger
2022-09-16 12:35       ` Kevin Laatz
2022-09-19 10:19     ` Konstantin Ananyev
2022-09-22 17:14       ` Kevin Laatz
2022-09-26  9:37         ` Konstantin Ananyev
2022-09-29 12:41           ` Kevin Laatz
2022-09-30 12:32             ` Jerin Jacob
2022-10-01 14:17             ` Konstantin Ananyev [this message]
2022-10-03 20:02               ` Mattias Rönnblom
2022-10-04  9:15                 ` Morten Brørup
2022-10-04 11:57                   ` Bruce Richardson
2022-10-04 14:26                     ` Mattias Rönnblom
2022-10-04 23:30                     ` Konstantin Ananyev
2022-09-30 22:13     ` Mattias Rönnblom
2022-09-14  9:29   ` [PATCH v7 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-14  9:29   ` [PATCH v7 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-30 22:20     ` Mattias Rönnblom
2022-09-14  9:29   ` [PATCH v7 4/4] doc: add howto guide " Kevin Laatz
2022-09-14 14:33   ` [PATCH v7 0/4] Add lcore poll busyness telemetry Stephen Hemminger
2022-09-16 12:35     ` Kevin Laatz
2022-09-16 14:10       ` Kevin Laatz
2022-10-05 13:44   ` Kevin Laatz
2022-10-06 13:25     ` Morten Brørup
2022-10-06 15:26       ` Mattias Rönnblom
2022-10-10 15:22         ` Morten Brørup
2022-10-10 17:38           ` Mattias Rönnblom
2022-10-12 12:25             ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a14449f3-f679-706b-ce6e-c8b2efa1794b@yandex.ru \
    --to=konstantin.v.ananyev@yandex.ru \
    --cc=anatoly.burakov@intel.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=ashish.gupta@marvell.com \
    --cc=bruce.richardson@intel.com \
    --cc=conor.walsh@intel.com \
    --cc=david.hunt@intel.com \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=ferruh.yigit@xilinx.com \
    --cc=gakhil@marvell.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=jerinj@marvell.com \
    --cc=kevin.laatz@intel.com \
    --cc=konstantin.ananyev@huawei.com \
    --cc=mdr@ashroe.eu \
    --cc=nicolas.chautru@intel.com \
    --cc=orika@nvidia.com \
    --cc=roy.fan.zhang@intel.com \
    --cc=sachin.saxena@oss.nxp.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).