From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>
Cc: "Thomas Monjalon" <thomas@monjalon.net>, <dev@dpdk.org>,
	"David Marchand" <david.marchand@redhat.com>, <stable@dpdk.org>,
	"Anatoly Burakov" <anatoly.burakov@intel.com>,
	"Dmitry Kozlyuk" <dmitry.kozliuk@gmail.com>,
	"Narcisa Ana Maria Vasile" <navasile@linux.microsoft.com>,
	"Dmitry Malloy" <dmitrym@microsoft.com>,
	"Pallavi Kadam" <pallavi.kadam@intel.com>,
	"Tyler Retzlaff" <roretzla@linux.microsoft.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>
Subject: RE: [PATCH v2] eal/unix: allow creating thread with real-time priority
Date: Thu, 26 Oct 2023 09:33:42 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9EF91@smartserver.smartshare.dk>
In-Reply-To: <20231025143318.3be26bb3@hermes.local>

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 25 October 2023 23.33
> 
> On Wed, 25 Oct 2023 19:54:06 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > I agree with Thomas on this.
> >
> > If you want the log message, please degrade it to INFO or DEBUG level. It is
> > only relevant when chasing problems, not for normal production - and thus
> > NOTICE is too high.
> 
> I don't want the message to be hidden.
> If we get any bug reports, we want to be able to say "read the log, don't do
> that".

Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.

It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
As Stephen points out, the log is a good starting point when checking bug reports possibly related to this. And I suppose the experienced users who really understand it will not be seriously confused by such a NOTICE message in the log.

> 
> > Someone might build a kernel with options to keep non-dataplane threads off
> > some dedicated CPU cores, so they can be used for guaranteed low-latency
> > dataplane threads. We do. We don't use real-time priority, though.
> 
> This is really hard to do.

As my kids would say: This is really, really, really, really, really hard to do!

We have not been able to find an authoritative source of documentation describing how to do it. :-(

And our experiment shows that we did not fully succeed in doing it. But we got close enough for our purposes: outliers of at most 9,000 CPU cycles on a 3+ GHz CPU correspond to at most 3 microseconds of added worst-case latency.
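
For reference, the arithmetic behind that figure can be sketched in C as below. This is only an illustration: cycles_to_ns() is a hypothetical helper name, and the TSC frequency is passed in as an assumed input (in a DPDK application it would typically come from rte_get_tsc_hz()).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC cycle count to nanoseconds (rounded down).
 * tsc_hz is an assumed input; it must be non-zero and small enough that
 * (tsc_hz - 1) * 1e9 fits in 64 bits, which holds for any real CPU.
 */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
    assert(tsc_hz > 0);
    /* Handle whole seconds and the remainder separately to avoid overflow. */
    return (cycles / tsc_hz) * 1000000000ULL +
           (cycles % tsc_hz) * 1000000000ULL / tsc_hz;
}
```

With a 3 GHz TSC, cycles_to_ns(9000, 3000000000ULL) gives 3000 ns, i.e. 3 microseconds, and the typical 80-cycle delta is about 26 ns.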

It would be great for latency-sensitive applications if the DPDK documentation went into more detail on this topic. However, if DPDK runs on top of a Linux distro, the setup essentially depends on the distro, and should be documented there. And if DPDK runs on top of a custom-built Linux kernel, the setup essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)

> Isolated CPUs are not isolated from interrupts
> and other sources which end up scheduling work as kernel threads. Plus there
> is the behavior where kernel decides to turn a soft irq into a kernel thread,
> then starve itself.

We have configured the kernel to put all of this on CPU 0. (Details further below.)

> Under starvation, disk corruption is likely if interrupts never get
> processed :-(
> 
> > For reference, we did some experiments (using this custom built kernel) with
> > a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> > registering the delta. Although the overwhelming majority is ca. 80 CPU
> > cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> > magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> > threads steal some cycles from this thread, regardless of our customizations.
> > We haven't bothered analyzing and optimizing it further.
> 
> Was this on isolated CPU?

Yes. We isolate all CPUs but CPU 0.
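
A rough sketch of that kind of measurement loop, for anyone who wants to reproduce it. This is not our actual test code: to keep it buildable without DPDK, read_timestamp() is a hypothetical stand-in for rte_rdtsc_precise() based on clock_gettime(), so the deltas are in nanoseconds rather than CPU cycles, and struct jitter_stats and measure_jitter() are names made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rte_rdtsc_precise(): monotonic time in ns. */
uint64_t read_timestamp(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

struct jitter_stats {
    uint64_t samples;   /* number of deltas recorded */
    uint64_t outliers;  /* deltas above the threshold */
    uint64_t max_delta; /* largest delta seen */
};

/* Spin reading the timestamp back to back and record the gaps. */
struct jitter_stats measure_jitter(uint64_t iterations, uint64_t threshold)
{
    struct jitter_stats st = { 0, 0, 0 };
    uint64_t prev;

    assert(iterations > 0);
    prev = read_timestamp();
    for (uint64_t i = 0; i < iterations; i++) {
        uint64_t now = read_timestamp();
        uint64_t delta = now - prev;

        prev = now;
        st.samples++;
        if (delta > threshold)
            st.outliers++;
        if (delta > st.max_delta)
            st.max_delta = delta;
    }
    return st;
}
```

The thread running measure_jitter() would be pinned to one of the isolated CPUs; any delta far above the typical value means something else ran on that CPU in between two reads.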

> Did you check that that CPU was excluded from the smp_affinity mask on all
> devices?

Not sure how to do that?

NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?
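
As far as I understand it, Linux exposes each interrupt's mask as a hexadecimal string in /proc/irq/<N>/smp_affinity, written as comma-separated 32-bit groups with the most significant group first. Under that assumption, a check for one CPU in such a string could be sketched like this (mask_has_cpu() is a hypothetical helper, not an existing API):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if bit `cpu` is set in a hex CPU mask string such as
 * "00000001,fffffffe", else 0. Commas and whitespace are skipped;
 * a malformed string yields 0.
 */
int mask_has_cpu(const char *mask, unsigned int cpu)
{
    size_t len;
    unsigned int digit_index = 0; /* counted from the least significant digit */

    assert(mask != NULL);
    len = strlen(mask);
    while (len > 0) {
        unsigned char c = (unsigned char)mask[--len];

        if (c == ',' || isspace(c))
            continue;
        if (!isxdigit(c))
            return 0;
        if (digit_index == cpu / 4) {
            /* Each hex digit covers four CPUs. */
            unsigned int v = isdigit(c) ? (unsigned int)(c - '0')
                                        : (unsigned int)(tolower(c) - 'a' + 10);
            return (int)((v >> (cpu % 4)) & 1u);
        }
        digit_index++;
    }
    return 0; /* mask is shorter than the requested CPU */
}
```

Reading every /proc/irq/*/smp_affinity file and calling this for each isolated CPU would show which interrupts are still allowed to land there.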

> Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?

Yes:
# Timers subsystem
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y

CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"

> Same thing for RCU, need to adjust parameters?

Yes:
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

> 
> Also, on many systems there can be SMI BIOS hidden execution that will cause
> big outliers.

Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.

> 
> Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> places.

Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.

> 
> > I think our experiment supports the need to allow kernel threads to run,
> > e.g. by calling sleep() or similar, when an EAL thread has real-time priority.
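
To illustrate what I mean, here is a minimal sketch of a polling loop that sleeps briefly at regular intervals. The names poll_with_yield() and do_poll_work() are hypothetical, and the interval and sleep length are arbitrary assumptions, not recommendations. Note that under SCHED_FIFO, sched_yield() only hands the CPU to threads of the same priority, so an actual sleep is what gives lower-priority kernel threads a chance to run.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical placeholder for one iteration of dataplane work. */
void do_poll_work(void)
{
}

/*
 * Run `iterations` polling rounds and sleep for `sleep_ns` nanoseconds
 * after every `yield_every` rounds. Returns the number of sleeps taken.
 */
uint64_t poll_with_yield(uint64_t iterations, uint64_t yield_every,
                         long sleep_ns)
{
    struct timespec pause;
    uint64_t sleeps = 0;

    assert(yield_every > 0);
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);
    pause.tv_sec = 0;
    pause.tv_nsec = sleep_ns;

    for (uint64_t i = 1; i <= iterations; i++) {
        do_poll_work();
        if (i % yield_every == 0) {
            /* Block briefly so that pending kernel threads can be scheduled. */
            nanosleep(&pause, NULL);
            sleeps++;
        }
    }
    return sleeps;
}
```

The cost is the added latency of each sleep, so the interval has to be chosen against the application's latency budget.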


