From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
To: Kevin Laatz <kevin.laatz@intel.com>,
Konstantin Ananyev <konstantin.ananyev@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
Conor Walsh <conor.walsh@intel.com>,
David Hunt <david.hunt@intel.com>,
Bruce Richardson <bruce.richardson@intel.com>,
Nicolas Chautru <nicolas.chautru@intel.com>,
Fan Zhang <roy.fan.zhang@intel.com>,
Ashish Gupta <ashish.gupta@marvell.com>,
Akhil Goyal <gakhil@marvell.com>,
Fengchengwen <fengchengwen@huawei.com>,
Ray Kinsella <mdr@ashroe.eu>,
Thomas Monjalon <thomas@monjalon.net>,
Ferruh Yigit <ferruh.yigit@xilinx.com>,
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
Jerin Jacob <jerinj@marvell.com>,
Sachin Saxena <sachin.saxena@oss.nxp.com>,
Hemant Agrawal <hemant.agrawal@nxp.com>,
Ori Kam <orika@nvidia.com>,
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Subject: Re: [PATCH v7 1/4] eal: add lcore poll busyness telemetry
Date: Sat, 1 Oct 2022 15:17:26 +0100 [thread overview]
Message-ID: <a14449f3-f679-706b-ce6e-c8b2efa1794b@yandex.ru> (raw)
In-Reply-To: <ba69465b-d878-548f-d3c0-fb1da8340d08@intel.com>
>> Hi Kevin,
>>
>>>>> Currently, there is no way to measure lcore poll busyness in a
>>>>> passive way,
>>>>> without any modifications to the application. This patch adds a new
>>>>> EAL API
>>>>> that will be able to passively track core polling busyness.
>>>>>
>>>>> The poll busyness is calculated by relying on the fact that most
>>>>> DPDK API's
>>>>> will poll for work (packets, completions, eventdev events, etc). Empty
>>>>> polls can be counted as "idle", while non-empty polls can be
>>>>> counted as
>>>>> busy. To measure lcore poll busyness, we simply call the telemetry
>>>>> timestamping function with the number of polls a particular code
>>>>> section
>>>>> has processed, and count the number of cycles we've spent
>>>>> processing empty
>>>>> bursts. The more empty bursts we encounter, the less cycles we
>>>>> spend in
>>>>> "busy" state, and the less core poll busyness will be reported.
>>>>>
>>>>> In order for all of the above to work without modifications to the
>>>>> application, the library code needs to be instrumented with calls
>>>>> to the
>>>>> lcore telemetry busyness timestamping function. The following parts
>>>>> of DPDK
>>>>> are instrumented with lcore poll busyness timestamping calls:
>>>>>
>>>>> - All major driver API's:
>>>>> - ethdev
>>>>> - cryptodev
>>>>> - compressdev
>>>>> - regexdev
>>>>> - bbdev
>>>>> - rawdev
>>>>> - eventdev
>>>>> - dmadev
>>>>> - Some additional libraries:
>>>>> - ring
>>>>> - distributor
>>>>>
>>>>> To avoid performance impact from having lcore telemetry support, a
>>>>> global
>>>>> variable is exported by EAL, and a call to timestamping function is
>>>>> wrapped
>>>>> into a macro, so that whenever telemetry is disabled, it only takes
>>>>> one
>>>>> additional branch and no function calls are performed. It is
>>>>> disabled at
>>>>> compile time by default.
>>>>>
>>>>> This patch also adds a telemetry endpoint to report lcore poll
>>>>> busyness, as
>>>>> well as telemetry endpoints to enable/disable lcore telemetry. A
>>>>> documentation entry has been added to the howto guides to explain
>>>>> the usage
>>>>> of the new telemetry endpoints and API.
>>>> As was already mentioned by other reviewers, it would be much better
>>>> to let application itself decide when it is idle and when it is busy.
>>>> With current approach even for constant polling run-to-completion
>>>> model there
>>>> are plenty of opportunities to get things wrong and provide
>>>> misleading statistics.
>>>> My special concern - inserting it into ring dequeue code.
>>>> Ring is used for various different things, not only pass packets
>>>> between threads (mempool, etc.).
>>>> Blindly assuming that ring dequeue returns empty means idle cycles
>>>> seams wrong to me.
>>>> Which make me wonder should we really hard-code these calls into
>>>> DPDK core functions?
>>>> If you still like to introduce such stats, might be better to
>>>> implement it via callback mechanism.
>>>> As I remember nearly all our drivers (net, crypto, etc.) do support it.
>>>> That way our generic code will remain unaffected, plus user will
>>>> have ability to enable/disable
>>>> it on a per device basis.
>>> Thanks for your feedback, Konstantin.
>>>
>>> You are right in saying that this approach won't be 100% suitable for
>>> all use-cases, but should be suitable for the majority of applications.
>> First of all - could you explain how did you measure what is the
>> 'majority' of DPDK applications?
>> And how did you conclude that it definitely work for all the apps in
>> that 'majority'?
>> Second what bother me with that approach - I don't see s clear and
>> deterministic way
>> for the user to understand would that stats work properly for his app
>> or not.
>> (except manually ananlyzing his app code).
>
> All of the DPDK example applications we've tested with (l2fwd, l3fwd +
> friends, testpmd, distributor, dmafwd) report lcore poll busyness and
> respond to changing traffic rates etc. We've also compared the reported
> busyness to similar metrics reported by other projects such as VPP and
> OvS, and found the reported busyness matches with a difference of +/-
> 1%. In addition to the DPDK example applications, we've have shared our
> plans with end customers and they have confirmed that the design should
> work with their applications.
I am sure l3fwd and testpmd should be ok, I am talking about
something more complicated/unusual.
Below are few examples on top of my head when I think your approach
will generate invalid stats, feel free to correct me, if I am wrong.
1) App doing some sort of bonding itslef, i.e:
struct rte_mbuf pkts[N*2];
k = rte_eth_rx_burst(p0, q0, pkts, N);
n = rte_eth_rx_burst(p1, q1, pkts + k, N);
/*process all packets from both ports at once */
if (n + k != 0)
process_pkts(pkts, n + k);
Now, as I understand, if n==0, then all cycles spent
in process_pkts() will be accounted as idle.
2) App doing something similar to what pdump library does
(creates a copy of a packet and sends it somewhere).
n =rte_eth_rx_burst(p0, q0, &pkt, 1);
if (n != 0) {
dup_pkt = rte_pktmbuf_copy(pkt, dup_mp, ...);
if (dup_pkt != NULL)
process_dup_pkt(dup_pkt);
process_pkt(pkt);
}
that relates to ring discussion below:
if there are no mbufs in dup_mp, then ring_deque() will fail
and process_pkt() will be accounted as idle.
3) App dequeues from ring in a bit of unusual way:
/* idle spin loop */
while ((n = rte_ring_count(ring)) == 0)
ret_pause();
n = rte_ring_dequeue_bulk(ring, pkts, n, NULL);
if (n != 0)
process_pkts(pkts, n);
here, we can end-up accounting cycles spent in
idle spin loop as busy.
4) Any thread that generates TX traffic on it's own
(something like testpmd tx_only fwd mode)
5) Any thread that depends on both dequeue and enqueue:
n = rte_ring_dequeue_burst(in_ring, pkts, n, ..);
...
/* loop till all packets are sent out successfully */
while(rte_ring_enqueue_bulk(out_ring, pkts, n, NULL) == 0)
rte_pause();
Now, if n > 0, all cycles spent in enqueue() will be accounted
as 'busy', though from my perspective they probably should
be considered as 'idle'.
Also I expect some problems when packet processing is done inside
rx callbacks, but that probably can be easily fixed.
>
>>> It's worth keeping in mind that this feature is compile-time disabled by
>>> default, so there is no impact to any application/user that does not
>>> wish to use this, for example applications where this type of busyness
>>> is not useful, or for applications that already use other mechanisms to
>>> report similar telemetry.
>> Not sure that adding in new compile-time option disabled by default is
>> a good thing...
>> For me it would be much more preferable if we'll go through a more
>> 'standard' way here:
>> a) define clear API to enable/disable/collect/report such type of stats.
>> b) use some of our sample apps to demonstrate how to use it properly
>> with user-specific code.
>> c) if needed implement some 'silent' stats collection for limited
>> scope of apps via callbacks -
>> let say for run-to-completion apps that do use ether and crypto devs
>> only.
>
> With the compile-time option, its just one build flag for lots of
> applications to silently benefit from this.
There could be a lot of useful and helpfull stats
that user would like to collect (idle/busy, processing latency, etc.).
But, if for each such case we will hard-code new stats collection
into our fast data-path code, then very soon it will become
completely bloated and unmaintainable.
I think we need some generic approach for such extra stats collection.
Callbacks could be one way, Jerin in another mail suggested using
existing trace-point hooks, might be it worth to explore it further.
>
>> However, the upside for applications that do
>>> wish to use this is that there are no code changes required (for the
>>> most part), the feature simply needs to be enabled at compile-time via
>>> the meson option.
>>>
>>> In scenarios where contextual awareness of the application is needed in
>>> order to report more accurate "busyness", the
>>> "RTE_LCORE_POLL_BUSYNESS_TIMESTAMP(n)" macro can be used to mark
>>> sections of code as "busy" or "idle". This way, the application can
>>> assume control of determining the poll busyness of its lcores while
>>> leveraging the telemetry hooks adding in this patchset.
>>>
>>> We did initially consider implementing this via callbacks, however we
>>> found this approach to have 2 main drawbacks:
>>> 1. Application changes are required for all applications wanting to
>>> report this telemetry - rather than the majority getting it for free.
>> Didn't get it - why callbacks approach would require user-app changes?
>> In other situations - rte_power callbacks, pdump, etc. it works
>> transparent to
>> user-leve code.
>> Why it can't be done here in a similar way?
>
> From my understanding, the callbacks would need to be registered by the
> application at the very least (and the callback would have to be
> registered per device/pmd/lib).
Callbacks can be registered by library itself.
AFAIK, latenc-ystats, power and pdump libraries - all use similar
approach. user calls something like xxx_stats_enable() and then library
can iterate over all available devices and setup necessary callbacks.
same for xxx_stats_disable().
>>
>>> 2. Ring does not have callback support, meaning pipelined applications
>>> could not report lcore poll busyness telemetry with this approach.
>> That's another big concern that I have:
>> Why you consider that all rings will be used for a pipilines between
>> threads and should
>> always be accounted by your stats?
>> They could be used for dozens different purposes.
>> What if that ring is used for mempool, and ring_dequeue() just means
>> we try to allocate
>> an object from the pool? In such case, why failing to allocate an
>> object should mean
>> start of new 'idle cycle'?
>
> Another approach could be taken here if the mempool interactions are of
> concern.
>
> From our understanding, mempool operations use the "_bulk" APIs,
> whereas polling operations use the "_burst" APIs. Would only
> timestamping on the "_burst" APIs be better here? That way the mempool
> interactions won't be counted towards the busyness.
Well, it would help to solve one particular case,
but in general I still think it is incomplete and error-prone.
What if pipeline app will use ring_count/ring_dequeue_bulk(),
or even ZC ring API?
What if app will use something different from rte_ring to pass
packets between threads/processes?
As I said before, without some clues from the app, it is probably
not possible to collect such stats in a proper way.
> Including support for pipelined applications using rings is key for a
> number of usecases, this was highlighted as part of the customer
> feedback when we shared the design.
>
>>
>>> Eventdev is another driver which would be completely missed with this
>>> approach.
>> Ok, I see two ways here:
>> - implement CB support for eventdev.
>> -meanwhile clearly document that this stats are not supported for
>> eventdev scenarios (yet).
next prev parent reply other threads:[~2022-10-01 14:17 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-15 13:12 [PATCH v1 1/2] eal: add lcore " Anatoly Burakov
2022-07-15 13:12 ` [PATCH v1 2/2] eal: add cpuset lcore telemetry entries Anatoly Burakov
2022-07-15 13:35 ` [PATCH v1 1/2] eal: add lcore busyness telemetry Burakov, Anatoly
2022-07-15 13:46 ` Jerin Jacob
2022-07-15 14:11 ` Bruce Richardson
2022-07-15 14:18 ` Burakov, Anatoly
2022-07-15 22:13 ` Morten Brørup
2022-07-16 14:38 ` Thomas Monjalon
2022-07-17 3:10 ` Honnappa Nagarahalli
2022-07-17 9:56 ` Morten Brørup
2022-07-18 9:43 ` Burakov, Anatoly
2022-07-18 10:59 ` Morten Brørup
2022-07-19 12:20 ` Thomas Monjalon
2022-07-18 15:46 ` Stephen Hemminger
2022-08-24 16:24 ` [PATCH v2 0/3] Add lcore poll " Kevin Laatz
2022-08-24 16:24 ` [PATCH v2 1/3] eal: add " Kevin Laatz
2022-08-24 16:24 ` [PATCH v2 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-24 16:24 ` [PATCH v2 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-08-25 7:47 ` [PATCH v2 0/3] Add lcore poll busyness telemetry Morten Brørup
2022-08-25 10:53 ` Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 " Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 1/3] eal: add " Kevin Laatz
2022-08-26 7:05 ` Jerin Jacob
2022-08-26 8:07 ` Bruce Richardson
2022-08-26 8:16 ` Jerin Jacob
2022-08-26 8:29 ` Morten Brørup
2022-08-26 15:27 ` Kevin Laatz
2022-08-26 15:46 ` Morten Brørup
2022-08-29 10:41 ` Bruce Richardson
2022-08-29 10:53 ` Thomas Monjalon
2022-08-29 12:36 ` Kevin Laatz
2022-08-29 12:49 ` Morten Brørup
2022-08-29 13:37 ` Kevin Laatz
2022-08-29 13:44 ` Morten Brørup
2022-08-29 14:21 ` Kevin Laatz
2022-08-29 11:22 ` Morten Brørup
2022-08-26 22:06 ` Mattias Rönnblom
2022-08-29 8:23 ` Bruce Richardson
2022-08-29 13:16 ` Kevin Laatz
2022-08-30 10:26 ` Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-08-25 15:28 ` [PATCH v3 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 1/3] eal: add " Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-01 14:39 ` [PATCH v4 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 0/3] Add lcore poll busyness telemetry Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 1/3] eal: add " Kevin Laatz
2022-09-03 13:33 ` Jerin Jacob
2022-09-06 9:37 ` Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 2/3] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-02 15:58 ` [PATCH v5 3/3] doc: add howto guide for lcore poll busyness Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 1/4] eal: add " Kevin Laatz
2022-09-13 13:48 ` Morten Brørup
2022-09-13 13:19 ` [PATCH v6 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-13 13:19 ` [PATCH v6 4/4] doc: add howto guide " Kevin Laatz
2022-09-14 9:29 ` [PATCH v7 0/4] Add lcore poll busyness telemetry Kevin Laatz
2022-09-14 9:29 ` [PATCH v7 1/4] eal: add " Kevin Laatz
2022-09-14 14:30 ` Stephen Hemminger
2022-09-16 12:35 ` Kevin Laatz
2022-09-19 10:19 ` Konstantin Ananyev
2022-09-22 17:14 ` Kevin Laatz
2022-09-26 9:37 ` Konstantin Ananyev
2022-09-29 12:41 ` Kevin Laatz
2022-09-30 12:32 ` Jerin Jacob
2022-10-01 14:17 ` Konstantin Ananyev [this message]
2022-10-03 20:02 ` Mattias Rönnblom
2022-10-04 9:15 ` Morten Brørup
2022-10-04 11:57 ` Bruce Richardson
2022-10-04 14:26 ` Mattias Rönnblom
2022-10-04 23:30 ` Konstantin Ananyev
2022-09-30 22:13 ` Mattias Rönnblom
2022-09-14 9:29 ` [PATCH v7 2/4] eal: add cpuset lcore telemetry entries Kevin Laatz
2022-09-14 9:29 ` [PATCH v7 3/4] app/test: add unit tests for lcore poll busyness Kevin Laatz
2022-09-30 22:20 ` Mattias Rönnblom
2022-09-14 9:29 ` [PATCH v7 4/4] doc: add howto guide " Kevin Laatz
2022-09-14 14:33 ` [PATCH v7 0/4] Add lcore poll busyness telemetry Stephen Hemminger
2022-09-16 12:35 ` Kevin Laatz
2022-09-16 14:10 ` Kevin Laatz
2022-10-05 13:44 ` Kevin Laatz
2022-10-06 13:25 ` Morten Brørup
2022-10-06 15:26 ` Mattias Rönnblom
2022-10-10 15:22 ` Morten Brørup
2022-10-10 17:38 ` Mattias Rönnblom
2022-10-12 12:25 ` Morten Brørup
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a14449f3-f679-706b-ce6e-c8b2efa1794b@yandex.ru \
--to=konstantin.v.ananyev@yandex.ru \
--cc=anatoly.burakov@intel.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=ashish.gupta@marvell.com \
--cc=bruce.richardson@intel.com \
--cc=conor.walsh@intel.com \
--cc=david.hunt@intel.com \
--cc=dev@dpdk.org \
--cc=fengchengwen@huawei.com \
--cc=ferruh.yigit@xilinx.com \
--cc=gakhil@marvell.com \
--cc=hemant.agrawal@nxp.com \
--cc=honnappa.nagarahalli@arm.com \
--cc=jerinj@marvell.com \
--cc=kevin.laatz@intel.com \
--cc=konstantin.ananyev@huawei.com \
--cc=mdr@ashroe.eu \
--cc=nicolas.chautru@intel.com \
--cc=orika@nvidia.com \
--cc=roy.fan.zhang@intel.com \
--cc=sachin.saxena@oss.nxp.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).