DPDK patches and discussions
From: "Dmitry Malloy (MESHCHANINOV)" <dmitrym@microsoft.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Tal Shnaiderman <talshn@nvidia.com>
Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>,
	Narcisa Ana Maria Vasile <Narcisa.Vasile@microsoft.com>,
	Eilon Greenstein <eilong@nvidia.com>,
	Omar Cardona <ocardona@microsoft.com>,
	Rani Sharoni <ranish@nvidia.com>, Odi Assli <odia@nvidia.com>,
	Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>,
	thomas <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [EXTERNAL] Re: Windows DPDK real-time priority threads causing thread starvation
Date: Wed, 9 Dec 2020 16:12:10 +0000
Message-ID: <DM5PR21MB18167C9CA7B39F83D9C6A016B3CC1@DM5PR21MB1816.namprd21.prod.outlook.com>
In-Reply-To: <20201209080858.168e4c52@hermes.local>

There is a configuration in Windows similar to Linux isolcpus, where the scheduler tries not to run anything on such cores, and the implementation is being enhanced for the next Windows release with user-mode vmswitch feedback.

I'll dig out the details.

Dmitry

-----Original Message-----
From: Stephen Hemminger <stephen@networkplumber.org> 
Sent: Wednesday, December 9, 2020 8:09 AM
To: Tal Shnaiderman <talshn@nvidia.com>
Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>; Dmitry Malloy (MESHCHANINOV) <dmitrym@microsoft.com>; Narcisa Ana Maria Vasile <Narcisa.Vasile@microsoft.com>; Eilon Greenstein <eilong@nvidia.com>; Omar Cardona <ocardona@microsoft.com>; Rani Sharoni <ranish@nvidia.com>; Odi Assli <odia@nvidia.com>; Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>; thomas <thomas@monjalon.net>; dev@dpdk.org
Subject: [EXTERNAL] Re: [dpdk-dev] Windows DPDK real-time priority threads causing thread starvation

On Wed, 9 Dec 2020 14:15:30 +0000
Tal Shnaiderman <talshn@nvidia.com> wrote:

> Hi,
> 
> During our verification tests on Windows DPDK we've noticed that DPDK polling threads, which run in REALTIME_PRIORITY_CLASS, are starving other OS threads that need to change affinity and run at a lower priority.
> 
> While running an application for a while, we see an OS thread wait for 2:30 minutes and then raise a bugcheck. See the example of such a flow below:
> 
> 1) A DPDK thread runs on core-0 in polling mode at real-time high priority (24).
> 2) That thread blocks the system function NtSetSystemInformation (ExpUpdateTimerConfiguration), running in another thread, from switching to core-0 via KeSetSystemGroupAffinityThread, since the calling thread is priority 15.
> 3) NtSetSystemInformation has exclusively acquired a system-wide lock (ExpTimeRefreshLock), hence it blocks other threads (e.g. those calling NtQuerySystemInformation).
> 
> We've seen this behavior only while running on Windows 2019 VMs; maybe on native machines the OS schedules such a flow differently?
> 
> Below is the usage explanation from the documentation of SetPriorityClass [1]:
> 
> - REALTIME_PRIORITY_CLASS
> Process that has the highest possible priority. The threads of the process preempt the threads of all other processes, including operating system processes performing important tasks. For example, a real-time process that executes for more than a very brief interval can cause disk caches not to flush or cause the mouse to be unresponsive. 
> 
> So I assume using this kind of thread for a long period as we do can cause unstable behavior.
> 
> How do you think we can resolve this? Are there such cases in Linux?
> 
> [1] - https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setpriorityclass
> 
> Thanks,
> 
> Tal.

This is not unique to Windows; Linux has the same thing when using SCHED_FIFO.
Setting REALTIME is not a magic "go fast" flag; it tells the scheduler to "run this thread at higher priority than kernel". Setting real time is not compatible with applications doing 100% polling.

If you have to use REALTIME, then the application must change to a sleep/wakeup type architecture, not pure polling.

A typical DPDK-style application is incompatible with SCHED_FIFO/SCHED_RR.
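
A minimal sketch of what moving from pure polling toward a sleep/wakeup style could look like, assuming a POSIX system for the sleep call. The names poll_fn, poll_stats, poll_with_backoff and fake_poll are hypothetical and exist only for this illustration; they are not part of DPDK or of any patch in this thread. The loop polls as usual, but after a run of empty polls it sleeps briefly, which gives lower-priority threads (such as the priority-15 system thread described above) a chance to be scheduled on the core.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical burst-poll callback: returns the number of packets handled. */
typedef unsigned (*poll_fn)(void *ctx);

struct poll_stats {
    unsigned long packets;  /* total packets reported by the callback */
    unsigned long sleeps;   /* number of times the loop gave up the CPU */
};

/*
 * Call poll() `iterations` times. After `idle_limit` consecutive empty
 * polls, sleep for `sleep_ns` nanoseconds so other threads can run.
 */
static struct poll_stats
poll_with_backoff(poll_fn poll, void *ctx, unsigned long iterations,
                  unsigned idle_limit, long sleep_ns)
{
    struct poll_stats st = {0, 0};
    unsigned idle = 0;

    assert(poll != NULL && idle_limit > 0);
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    for (unsigned long i = 0; i < iterations; i++) {
        unsigned n = poll(ctx);

        if (n > 0) {
            /* Traffic seen: stay in the hot loop. */
            st.packets += n;
            idle = 0;
            continue;
        }
        if (++idle >= idle_limit) {
            /* Idle for a while: block briefly instead of spinning. */
            struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
            nanosleep(&ts, NULL);
            st.sleeps++;
            idle = 0;
        }
    }
    return st;
}

/* Stand-in traffic source for illustration: one packet on every 8th call. */
static unsigned fake_poll(void *ctx)
{
    unsigned long *calls = ctx;
    return (++*calls % 8 == 0) ? 1u : 0u;
}
```

In a real application the callback would presumably wrap a receive call such as rte_eth_rx_burst(), and on Windows the nanosleep() call would have to be replaced by Sleep() or a wait on an event. The idle limit and sleep length trade latency against how often other threads get to run, so both would need tuning for the workload.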
