From: "terry.montague.1980@btinternet.com" <terry.montague.1980@btinternet.com>
To: Richard.Nutman@s-a-m.com, users@dpdk.org
Subject: Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
Date: Fri, 20 Apr 2018 12:32:36 +0100 (BST)
Message-ID: <1228190.16734.1524223956865.JavaMail.defaultUser@defaultHost>
In-Reply-To: <733AB18813E3864094592CC5191B172A235BD476@EX-UKHA-01.ad.s-a-m.com>
Hi there everyone - thank you for the responses.
As I'm using the standard Ubuntu kernel, nohz isn't an option - at least not nohz_full mode. I can't recompile the kernel because I need a proprietary binary kernel driver. I don't need extremely low latency - a worst case of 10 µs would be absolutely fine, as there are 4096 buffers on the receive side of the X550 card and this is a receive process.
I've rebalanced all other IRQs away from the isolated cores. nowatchdog and nosoftlockup are set, and RCU callbacks are disabled for the isolated cores.
C-states deeper than C1 are disabled, and all the cores run at 2.6 GHz continuously (as observed in i7z).
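For reference, those settings combined on the kernel boot line look roughly like this (illustrative - the core list depends on your topology, and there is no nohz_full here since this is the stock kernel):

    isolcpus=2 rcu_nocbs=2 nowatchdog nosoftlockup intel_idle.max_cstate=1 processor.max_cstate=1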
What is puzzling me is the 'extreme' state of idleness the isolated core enters - from running SCHED_FIFO at priority 98 to nothing, for ages. P-states are disabled and SpeedStep is disabled, so if this is something to do with the AVX2 activity then I'm stumped as to the mechanism, because the effect only manifests when the rest of the software is more active - using memory, transitioning between processing stages, and so on.
The isolated core actually sits in the C1 state for significant time while this effect occurs - I pick this up in i7z, and I see the non-voluntary context switches away from the isolated core's thread by logging activity in 'perf'.
I'm struggling with why the effect is so extreme. The core disappears for up to 20,000,000 TSC clocks at 2.6 GHz - roughly 7.7 ms, which is outrageously long - then wakes up again and the thread continues. The median time for my AVX2 processing on a packet is around 0.3 µs, so it's not the code itself that's causing this. It appears to just go idle in the middle of the AVX2 work.
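The measurement itself is nothing exotic - a sketch of the kind of TSC check that catches these gaps (the function names and the threshold are illustrative, not the actual application code):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    /* Timestamp the per-packet AVX2 work and flag implausibly long gaps.
     * The median work is ~0.3 us (~780 cycles at 2.6 GHz), so anything
     * over 1,000,000 cycles (~385 us) means the thread lost the core. */
    static inline void process_with_stall_check(void (*work)(void *), void *pkt)
    {
        uint64_t t0 = __rdtsc();
        work(pkt);
        uint64_t t1 = __rdtsc();
        if (t1 - t0 > 1000000ULL)
            fprintf(stderr, "stall: %llu cycles\n",
                    (unsigned long long)(t1 - t0));
    }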
It's a single thread within a process which has many other threads on other cores, all at default priority.
Can some process-scope effect come in and just throw the SCHED_FIFO priority thread off the core for ages? Seems unlikely.
Any help gratefully received.
Terry.
----Original message----
From : Richard.Nutman@s-a-m.com
Date : 20/04/18 - 11:14 (BST)
To : users@dpdk.org, terry.montague.1980@btinternet.com
Subject : RE: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
Hi Terry,
Following on from what Stephen mentioned, when you hit an AVX2 instruction there is a warm-up latency while the CPU powers on the upper half of the 256-bit lanes.
It's normally around 10 µs, so it possibly doesn't account for everything you're seeing:
https://software.intel.com/en-us/forums/intel-isa-extensions/topic/710248
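If that warm-up is a factor, a dummy burst of 256-bit operations ahead of the latency-critical section should mask it. A minimal sketch (the loop count is a guess):

    #include <immintrin.h>

    static volatile __m256d g_sink;   /* stops the loop being optimised away */

    /* Issue a burst of 256-bit ops so the upper lanes are powered up
     * before the real AVX2 work starts. The lanes power down again
     * after a period of disuse (on the order of a millisecond), so
     * this may need repeating between bursts of traffic. */
    static void warm_up_avx2(void)
    {
        __m256d v = _mm256_set1_pd(1.0);
        for (int i = 0; i < 100000; i++)
            v = _mm256_add_pd(v, v);
        g_sink = v;
    }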
Also, with RT threads that never yield, you should add nosoftlockup to your boot line to prevent the kernel from assuming your thread has locked up.
Some things to look into:
1. Are you using nohz mode on the kernel boot line?
2. Have you disabled RCU callbacks for your CPUs with rcu_nocbs on the kernel boot line?
3. Have you manually rebalanced IRQs to move them off your isolated CPUs? (See the example just below.)
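For item 3, something along these lines per IRQ (the IRQ number is illustrative, and the affinity mask must exclude the isolated CPUs; stop the irqbalance daemon first so it doesn't move them back):

    systemctl stop irqbalance
    echo 0xb > /proc/irq/123/smp_affinity   # mask 0b1011 = CPUs 0,1,3 - excludes core 2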
The clear_page_erms suggests it could be memory housekeeping such as zone reclaim or transparent hugepages - have you disabled these?
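Both are easy to check and switch off from the standard sysfs/sysctl locations:

    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    sysctl vm.zone_reclaim_mode             # 0 means zone reclaim is off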
-Richard.
> -----Original Message-----
> From: Tim Shearer [mailto:TShearer@advaoptical.com]
> Sent: 20 April 2018 03:00
> To: users@dpdk.org; terry.montague.1980@btinternet.com
> Subject: Re: [dpdk-users] Linux forcibly descheduling isolated thread
> on isolated cpu running DPDK rx under load
>
> Hi Terry,
>
> Without digging into this too much, it looks like the kernel is context
> switching out to do a clear_page call, so I wonder if one of your other
> threads is doing something memory related that's triggering this
> behaviour.
>
> Tim
> ________________________________
> From: users <users-bounces@dpdk.org> on behalf of
> terry.montague.1980@btinternet.com <terry.montague.1980@btinternet.com>
> Sent: Thursday, April 19, 2018 11:43:32 AM
> To: users@dpdk.org
> Subject: [dpdk-users] Linux forcibly descheduling isolated thread on
> isolated cpu running DPDK rx under load
>
> Hi there,
> I wondered if anyone had come across this particular problem regarding
> linux scheduling, or rather what appears to be a forced descheduling
> effect.
> I'm running on standard vanilla Ubuntu 17.10 using kernel
> 4.13.0-36-generic.
> Local Timer interrupts are therefore enabled....
> I'm running a dual-CPU Xeon E5-2623v4 system. I have core 2 on the
> first NUMA node (node 0) isolated for DPDK receive, and an Intel X550
> card attached to NUMA node 0.
> What I'm doing is running my DPDK receive thread on the isolated core
> (2) and changing the scheduling for this thread to SCHED_FIFO and
> priority 98.
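>
> A minimal sketch of that setup (the core number and priority are as
> described above; the helper name and error handling are illustrative):
>
>     #define _GNU_SOURCE
>     #include <pthread.h>
>     #include <sched.h>
>     #include <stdio.h>
>     #include <string.h>
>
>     /* Pin the calling (rx) thread to isolated core 2 and switch it
>      * to SCHED_FIFO at priority 98. */
>     static void pin_and_prioritise(void)
>     {
>         cpu_set_t set;
>         CPU_ZERO(&set);
>         CPU_SET(2, &set);
>         pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>
>         struct sched_param sp = { .sched_priority = 98 };
>         int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
>         if (rc != 0)
>             fprintf(stderr, "pthread_setschedparam: %s\n", strerror(rc));
>     }
>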
> Most of the time this works really well. However, I'm running this DPDK
> thread inside a larger application - there are probably 40 threads
> inside this process at default priority.
> What I'm seeing is that, when the application is under load, the DPDK
> receive thread is forcibly descheduled (observed with pidstat -p <PID>
> -w, where the non-voluntary context switch counts spike) and the core
> appears to go idle, sometimes for up to 1400 µs.
> This is obviously a problem....
> Running "perf" to sample activity on this isolated core only, I see the
> following entries.
> 0.90% swapper [kernel.kallsyms] [k] cpu_idle_poll
> 0.60% lcore-slave-2 [kernel.kallsyms] [k] clear_page_erms
> i.e. the core has gone idle, and 1.5% of the processing time has gone
> elsewhere - which ties in pretty well with my ~1400 µs deschedule
> observation.
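>
> For reference, a per-core profile like the one above can be captured
> with something along these lines (an illustrative invocation, not
> necessarily the exact one used):
>
>     perf record -a -C 2 -- sleep 10
>     perf report
>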
> In normal operation I do not see this effect.
> I've checked the code - it appears to go idle in the middle of some
> AVX2 data processing code. No system calls are made; it just goes
> idle.
> Does anyone have any ideas ?
> Many thanks
> Terry