[dpdk-dev] Windows DPDK real-time priority threads causing thread starvation
Date: Wed, 9 Dec 2020 14:15:30 +0000
During our verification tests on Windows DPDK we've noticed that DPDK polling threads, which run in REALTIME_PRIORITY_CLASS, starve other OS threads that run at lower priority and need to change their affinity.

After an application has been running for a while, we see an OS thread wait for 2 minutes 30 seconds, after which the system raises a bugcheck. Below is an example of such a flow:

1) A DPDK thread runs on core-0 in polling mode at real-time priority (24).
2) This thread prevents another thread, which is executing the system function
   NtSetSystemInformation (ExpUpdateTimerConfiguration), from switching to core-0
   via KeSetSystemGroupAffinityThread, because the calling thread runs at priority 15.
3) NtSetSystemInformation has exclusively acquired a system-wide lock (ExpTimeRefreshLock),
   so it in turn blocks other threads (e.g. those calling NtQuerySystemInformation).
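
For reference, the priority numbers in the flow above (24 and 15) match the base-priority table in Microsoft's "Scheduling Priorities" documentation. Below is a minimal sketch of that mapping in portable C, written without windows.h. The enum and function names are hypothetical, and is_starved_on_core() is a deliberately simplified model of strict-priority preemption on a single core, not the actual Windows scheduler:

```c
#include <assert.h>

/* Hypothetical stand-ins for the *_PRIORITY_CLASS constants in <windows.h>. */
enum prio_class {
    PC_IDLE, PC_BELOW_NORMAL, PC_NORMAL, PC_ABOVE_NORMAL, PC_HIGH, PC_REALTIME
};

/* Hypothetical stand-ins for the THREAD_PRIORITY_* levels. */
enum thread_level {
    TL_IDLE, TL_LOWEST, TL_BELOW_NORMAL, TL_NORMAL,
    TL_ABOVE_NORMAL, TL_HIGHEST, TL_TIME_CRITICAL
};

/* Base priority (1..31) for a given process class and thread level,
 * following the documented "Scheduling Priorities" table. */
static int base_priority(enum prio_class pc, enum thread_level tl)
{
    /* Priority of a TL_NORMAL thread in each class. */
    static const int class_normal[] = { 4, 6, 8, 10, 13, 24 };
    int realtime = (pc == PC_REALTIME);

    assert(pc >= PC_IDLE && pc <= PC_REALTIME);

    /* The two extreme levels saturate instead of being relative offsets. */
    if (tl == TL_IDLE)
        return realtime ? 16 : 1;
    if (tl == TL_TIME_CRITICAL)
        return realtime ? 31 : 15;

    /* Remaining levels are offsets -2..+2 around the class's normal value. */
    return class_normal[pc] + ((int)tl - (int)TL_NORMAL);
}

/* Simplified assumption: on one core, a waiter never runs while a
 * strictly higher-priority thread polls without ever blocking. */
static int is_starved_on_core(int waiter_prio, int poller_prio)
{
    return poller_prio > waiter_prio;
}
```

In this model, base_priority(PC_REALTIME, TL_NORMAL) gives 24 and base_priority(PC_HIGH, TL_TIME_CRITICAL) gives 15. Priority 15 is the top of the dynamic range and sits below every real-time level (16-31), which is consistent with the priority-15 thread in step 2 never getting onto core-0.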

We've seen this behavior only on Windows Server 2019 VMs. Perhaps on native machines the OS schedules such a flow differently?

Below is the description of REALTIME_PRIORITY_CLASS from the SetPriorityClass documentation [1]:

Process that has the highest possible priority. The threads of the process preempt the threads of all other processes, including operating system processes performing important tasks. For example, a real-time process that executes for more than a very brief interval can cause disk caches not to flush or cause the mouse to be unresponsive. 

So I assume that running this kind of thread for long periods, as we do, can cause unstable behavior.

How do you think we can resolve this? Are there such cases in Linux?

[1] - https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setpriorityclass



