From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>
Cc: "Thomas Monjalon" <thomas@monjalon.net>, <dev@dpdk.org>,
"David Marchand" <david.marchand@redhat.com>, <stable@dpdk.org>,
"Anatoly Burakov" <anatoly.burakov@intel.com>,
"Dmitry Kozlyuk" <dmitry.kozliuk@gmail.com>,
"Narcisa Ana Maria Vasile" <navasile@linux.microsoft.com>,
"Dmitry Malloy" <dmitrym@microsoft.com>,
"Pallavi Kadam" <pallavi.kadam@intel.com>,
"Tyler Retzlaff" <roretzla@linux.microsoft.com>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>
Subject: RE: [PATCH v2] eal/unix: allow creating thread with real-time priority
Date: Thu, 26 Oct 2023 09:33:42 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9EF91@smartserver.smartshare.dk>
In-Reply-To: <20231025143318.3be26bb3@hermes.local>
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 25 October 2023 23.33
>
> On Wed, 25 Oct 2023 19:54:06 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > I agree with Thomas on this.
> >
> > If you want the log message, please degrade it to INFO or DEBUG level. It is
> only relevant when chasing problems, not for normal production - and thus
> NOTICE is too high.
>
> I don't want the message to be hidden.
> If we get any bug reports want to be able to say "read the log, don't do
> that".
Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.
It's a tradeoff: noise for carefully designed systems vs. important bug-hunting information for systems under development (or casually developed systems).
As Stephen points out, the log is a good first thing to check when a bug report may be related to this. And I suppose the experienced users who really understand it will not be seriously confused by such a NOTICE message in the log.
>
> > Someone might build a kernel with options to keep non-dataplane threads off
> some dedicated CPU cores, so they can be used for guaranteed low-latency
> dataplane threads. We do. We don't use real-time priority, though.
>
> This is really, hard to do.
As my kids would say: This is really, really, really, really, really hard to do!
We have not been able to find an authoritative source of documentation describing how to do it. :-(
And our experiment shows that we did not succeed 100 %, but we got close enough for our purposes: outliers of at most 9,000 CPU cycles on a 3+ GHz CPU correspond to at most 3 microseconds of added worst-case latency.
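As a rough sanity check of that arithmetic, a cycle count converts to time by dividing by the clock frequency. The helper below is only an illustrative sketch: the name cycles_to_ns is hypothetical, and the exact 3 GHz figure is an assumption for the example (the CPU is only described as 3+ GHz).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds for a given clock frequency.
 * Hypothetical helper for illustration only.
 * The multiplication overflows if cycles exceeds roughly 1.8e10. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    return cycles * UINT64_C(1000000000) / hz;
}
```

With 9,000 cycles at an assumed 3 GHz this yields 3,000 ns, i.e. 3 microseconds; a faster clock only lowers the figure.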
It would be great for latency-sensitive applications if the DPDK documentation went into more detail on this topic. However, if DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if it runs on top of a custom-built Linux kernel, it essentially depends on the kernel, and should be documented there. In other words: such information should be contributed there, and not to the DPDK documentation. ;-)
> Isolated CPU's are not isolated from interrupts
> and other sources which end up scheduling work as kernel threads. Plus there
> is the behavior where kernel decides to turn a soft irq into a kernel thread,
> then starve itself.
We have configured the kernel to put all of this on CPU 0. (Details further below.)
> Under starvation, disk corruption is likely if interrupts never get
> processed :-(
>
> > For reference, we did some experiments (using this custom built kernel) with
> a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> registering the delta. Although the overwhelming majority is ca. 80 CPU
> cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> threads steal some cycles from this thread, regardless of our customizations.
> We haven't bothered analyzing and optimizing it further.
>
> Was this on isolated CPU?
Yes. We isolate all CPUs but CPU 0.
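The kind of measurement described in the quoted text can be sketched roughly as follows. This is not the code used in the experiment: it uses clock_gettime(CLOCK_MONOTONIC) as a portable stand-in for rte_rdtsc_precise(), so deltas are in nanoseconds rather than cycles, and the names jitter_stats, now_ns and measure_jitter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Summary of one measurement run (hypothetical structure). */
struct jitter_stats {
    uint64_t samples;      /* number of deltas recorded */
    uint64_t outliers;     /* deltas above the threshold */
    uint64_t max_delta_ns; /* largest delta seen */
};

/* Read the monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Spin in a tight loop, timestamping every iteration and recording the
 * gap between consecutive timestamps. A large gap suggests that
 * something else ran on this CPU in between. */
static struct jitter_stats measure_jitter(uint64_t iterations,
                                          uint64_t threshold_ns)
{
    assert(iterations > 0);
    struct jitter_stats st = { 0, 0, 0 };
    uint64_t prev = now_ns();

    for (uint64_t i = 0; i < iterations; i++) {
        uint64_t cur = now_ns();
        uint64_t delta = cur - prev;
        prev = cur;

        st.samples++;
        if (delta > threshold_ns)
            st.outliers++;
        if (delta > st.max_delta_ns)
            st.max_delta_ns = delta;
    }
    return st;
}
```

To be meaningful, such a loop would have to run pinned to one of the isolated CPUs, and the threshold would have to be chosen well above the typical delta.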
> Did you check that that CPU was excluded from the smp_affinity mask on all
> devices?
Not sure how to do that?
NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?
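One possible way to check this: Linux exposes each interrupt's allowed CPUs in /proc/irq/<N>/smp_affinity as a hexadecimal bitmask, written as comma-separated 32-bit groups with the most significant group first. The sketch below only tests whether a given CPU is set in such a mask string; the function name cpu_in_affinity_mask is hypothetical, and reading the files for every IRQ is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if `cpu` is set in a mask string such as "00000001,00000000",
 * 0 if it is not, and -1 if the string contains an unexpected character.
 * The string is scanned from its last character, which holds CPUs 0-3. */
static int cpu_in_affinity_mask(const char *mask, unsigned int cpu)
{
    assert(mask != NULL);
    size_t len = strlen(mask);
    unsigned int bit = 0; /* lowest CPU covered by the current hex digit */

    while (len > 0) {
        char c = mask[--len];

        if (c == ',' || c == '\n')
            continue; /* group separator or trailing newline */
        if (!isxdigit((unsigned char)c))
            return -1;

        unsigned int v = isdigit((unsigned char)c)
            ? (unsigned int)(c - '0')
            : (unsigned int)(tolower((unsigned char)c) - 'a' + 10);

        if (cpu >= bit && cpu < bit + 4)
            return (int)((v >> (cpu - bit)) & 1u);
        bit += 4;
    }
    return 0; /* CPU lies beyond the width of the mask */
}
```

Applied to the mask of every IRQ, a result of 1 for an isolated CPU would indicate that the interrupt may still be delivered there.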
> Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?
Yes:
# Timers subsystem
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"
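To verify what a running kernel actually applied, reasonably recent kernels report the effective CPU lists in /sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full, using the same list syntax as the isolcpus= parameter above. The sketch below checks membership in such a list; the function name cpu_in_list is hypothetical, and only plain numbers and ranges are handled (not the stride syntax).

```c
#include <assert.h>
#include <stdlib.h>

/* Return 1 if `cpu` is covered by a CPU list such as "1-32" or
 * "1-3,8,10-12", 0 if it is not, and -1 on a parse error. */
static int cpu_in_list(const char *list, unsigned int cpu)
{
    assert(list != NULL);
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long lo = strtoul(p, &end, 10);
        if (end == p)
            return -1; /* no number where one was expected */
        unsigned long hi = lo;
        p = end;

        if (*p == '-') { /* range "lo-hi" */
            p++;
            hi = strtoul(p, &end, 10);
            if (end == p)
                return -1;
            p = end;
        }

        if (cpu >= lo && cpu <= hi)
            return 1;

        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return 0;
}
```

For the command line shown above, CPU 0 should not be in the list and CPUs 1 through 32 should be.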
> Same thing for RCU, need to adjust parameters?
Yes:
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
>
> Also, on many systems there can be SMI BIOS hidden execution that will cause
> big outliers.
Yes, this comes as a big surprise to many people when it happens. Our hardware doesn't suffer from that.
>
> Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> places.
Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.
>
> > I think our experiment supports the need to allow kernel threads to run,
> e.g. by calling sleep() or similar, when an EAL thread has real-time priority.
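The idea in the quoted text can be illustrated with a polling loop that blocks briefly at regular intervals, so that lower-priority kernel threads on the same CPU get a chance to run. This is only a sketch under assumptions: the function name poll_with_breaks, the interval and the sleep length are all hypothetical, and it is not taken from the patch under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Run `iterations` rounds of (placeholder) work and sleep for `sleep_ns`
 * after every `yield_every` rounds. Returns the number of sleeps taken. */
static uint64_t poll_with_breaks(uint64_t iterations, uint64_t yield_every,
                                 long sleep_ns)
{
    assert(yield_every > 0);
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    struct timespec pause = { .tv_sec = 0, .tv_nsec = sleep_ns };
    uint64_t sleeps = 0;

    for (uint64_t i = 1; i <= iterations; i++) {
        /* ... one burst of dataplane work would go here ... */

        if (i % yield_every == 0) {
            /* Blocking, unlike busy-waiting, takes the thread off the
             * CPU so that other runnable tasks can be scheduled. */
            nanosleep(&pause, NULL);
            sleeps++;
        }
    }
    return sleeps;
}
```

How often to sleep, and for how long, is a tradeoff between added latency and the risk of starving kernel work; the values would need tuning per system.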