DPDK usage discussions
* [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
@ 2018-04-19 15:43 terry.montague.1980
  2018-04-19 20:01 ` terry.montague.1980
  2018-04-20  1:59 ` Tim Shearer
  0 siblings, 2 replies; 6+ messages in thread
From: terry.montague.1980 @ 2018-04-19 15:43 UTC (permalink / raw)
  To: users

Hi there,
I wondered if anyone had come across this particular problem with Linux scheduling, or rather what appears to be a forced descheduling effect.
I'm running standard vanilla Ubuntu 17.10 with kernel 4.13.0-36-generic.
Local timer interrupts are therefore enabled.
I'm running a dual-CPU Xeon E5-2623 v4 system. I have core 2 on the first NUMA node (CPU 0) isolated for DPDK receive, and an Intel X550 card attached to NUMA node 0.
I'm running my DPDK receive thread on the isolated core (2) and changing the scheduling for this thread to SCHED_FIFO at priority 98.
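
A minimal sketch of the setup described above, assuming a Linux host and sufficient privileges (root or CAP_SYS_NICE). A DPDK application would normally do this in C; Python's `os` wrappers are used here only for illustration, and the function names `valid_fifo_priority` and `pin_and_promote` are hypothetical.

```python
import os


def valid_fifo_priority(priority: int) -> bool:
    """Return True if `priority` lies within the kernel's SCHED_FIFO range."""
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    return lowest <= priority <= highest


def pin_and_promote(cpu: int = 2, priority: int = 98) -> None:
    """Pin the caller to `cpu` and switch it to SCHED_FIFO.

    Raises PermissionError when run without root or CAP_SYS_NICE.
    """
    if not valid_fifo_priority(priority):
        raise ValueError(f"priority {priority} is outside the SCHED_FIFO range")
    os.sched_setaffinity(0, {cpu})  # 0 selects the caller
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
```

Pinning only restricts where this thread may run; it does not by itself keep other tasks or kernel work off the core.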
Most of the time this works really well. However, I'm running this DPDK thread inside a larger application; there are probably 40 threads inside this process at default priority.
What I'm seeing is that, when the application is under load, the DPDK receive thread is forcibly descheduled (observed with pidstat -p <PID> -w, where the non-voluntary context switch counts spike) and the core appears to go idle, sometimes for up to 1400 µs.
This is obviously a problem.
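
The same counters can be read per thread straight from /proc, which makes it easier to log them alongside application events. The sketch below assumes the standard `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields of /proc/<pid>/task/<tid>/status; the function names are hypothetical.

```python
def parse_ctxt_switches(status_text: str) -> dict:
    """Extract the context-switch counters from a /proc status file."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def read_thread_ctxt_switches(pid: int, tid: int) -> dict:
    """Read the counters for one thread (tid) of a process (pid)."""
    with open(f"/proc/{pid}/task/{tid}/status") as f:
        return parse_ctxt_switches(f.read())
```

Sampling `read_thread_ctxt_switches` before and after a load burst shows whether the involuntary count for the receive thread itself is the one that moves.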
Running "perf" to sample activity on this isolated core only, I see the following entries.
   0.90%  swapper        [kernel.kallsyms]    [k] cpu_idle_poll
   0.60%  lcore-slave-2  [kernel.kallsyms]    [k] clear_page_erms
i.e. it has gone idle and 1.5% of the processing time has gone elsewhere, which ties in pretty well with my ~1400 µs deschedule observation.
In normal operation I do not see this effect.
I've checked the code: it appears to go idle in the middle of some AVX2 data processing code. There are no system calls taken; it just goes idle.
Does anyone have any ideas?
Many thanks
Terry


* Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
  2018-04-19 15:43 [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load terry.montague.1980
@ 2018-04-19 20:01 ` terry.montague.1980
  2018-04-20  1:44   ` Stephen Hemminger
  2018-04-20  1:59 ` Tim Shearer
  1 sibling, 1 reply; 6+ messages in thread
From: terry.montague.1980 @ 2018-04-19 20:01 UTC (permalink / raw)
  To: users


I should also say: I've disabled the kernel's real-time throttling (the 5% of CPU time withheld from SCHED_FIFO tasks) by setting /proc/sys/kernel/sched_rt_runtime_us to -1.
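
As a rough illustration of where the 5% figure comes from: with the default values of sched_rt_runtime_us (950000) and sched_rt_period_us (1000000), real-time tasks may use 95% of each period. The helper names below are hypothetical.

```python
def rt_reserved_fraction(runtime_us: int, period_us: int) -> float:
    """Fraction of each period withheld from real-time tasks.

    A runtime of -1 disables throttling, so nothing is withheld.
    """
    if runtime_us < 0:
        return 0.0
    return 1.0 - runtime_us / period_us


def read_rt_reserved_fraction() -> float:
    """Compute the withheld fraction from the live sysctl values."""
    def read_int(path: str) -> int:
        with open(path) as f:
            return int(f.read())

    return rt_reserved_fraction(
        read_int("/proc/sys/kernel/sched_rt_runtime_us"),
        read_int("/proc/sys/kernel/sched_rt_period_us"),
    )
```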

Any help with this problem gratefully appreciated.

Many thanks

Terry.



* Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
  2018-04-19 20:01 ` terry.montague.1980
@ 2018-04-20  1:44   ` Stephen Hemminger
  0 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2018-04-20  1:44 UTC (permalink / raw)
  To: terry.montague.1980; +Cc: users

On Thu, 19 Apr 2018 21:01:15 +0100 (BST)
"terry.montague.1980@btinternet.com" <terry.montague.1980@btinternet.com> wrote:

> I should also say - I've disabled the kernel's 5% time reservation for SCHED_FIFO through setting /proc/sys/kernel/sched_rt_runtime_us to -1.
> 
> Any help with this problem gratefully appreciated.
> 
> Many thanks
> 
> Terry.
> 
> 

AVX2 has the issue that it uses more CPU power, and the CPU will sometimes go into a power management (self-preservation) state.
At my previous employer, we experimented with AVX2 for firewall matching and discovered that under benchmark load
the overall performance was worse.

Also, unless you isolate CPUs from the scheduler via the kernel cmdline or offline/online them with sysfs,
you can run into SCHED_FIFO processes that never yield,
starving ksoftirqd. That means that if a softirq ever happens on that CPU, it will never get serviced, leading
to hangs etc.
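
One way to confirm that a core really is isolated from the scheduler is to read the kernel's own view. The sketch below assumes the sysfs file /sys/devices/system/cpu/isolated, which lists CPUs isolated via isolcpus= in the usual "2,4-7" list format; the function names are hypothetical.

```python
def parse_cpu_list(text: str) -> set:
    """Parse a kernel CPU list such as '2,4-6' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(path: str = "/sys/devices/system/cpu/isolated") -> set:
    """Return the set of CPUs the kernel reports as isolated."""
    with open(path) as f:
        return parse_cpu_list(f.read())
```

If core 2 is missing from `isolated_cpus()`, the scheduler can still place other tasks on it.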


* Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
  2018-04-19 15:43 [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load terry.montague.1980
  2018-04-19 20:01 ` terry.montague.1980
@ 2018-04-20  1:59 ` Tim Shearer
  2018-04-20 10:14   ` Richard Nutman
  1 sibling, 1 reply; 6+ messages in thread
From: Tim Shearer @ 2018-04-20  1:59 UTC (permalink / raw)
  To: users, terry.montague.1980

Hi Terry,

Without digging into this too much, it looks like the kernel is context switching out to do a clear_page call, so I wonder if one of your other threads is doing something memory related that's triggering this behaviour.
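
One hedged way to test this idea is to watch the receive thread's own page-fault counters: if they climb during the stalls, the thread itself is faulting in (and zeroing) pages. The sketch assumes the field layout of /proc/<pid>/task/<tid>/stat documented in proc(5), where minflt is field 10 and majflt is field 12; the function names are hypothetical.

```python
def parse_fault_counts(stat_line: str) -> tuple:
    """Return (minor_faults, major_faults) from a /proc stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')' before tokenising the rest.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3), so field N sits at index N - 3.
    return int(fields[7]), int(fields[9])


def read_thread_faults(pid: int, tid: int) -> tuple:
    """Read the fault counters for one thread of a process."""
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        return parse_fault_counts(f.read())
```

If the counters stay flat while the stalls occur, the page clearing is more likely being done on behalf of something else.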

Tim

* Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
  2018-04-20  1:59 ` Tim Shearer
@ 2018-04-20 10:14   ` Richard Nutman
  2018-04-20 11:32     ` terry.montague.1980
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Nutman @ 2018-04-20 10:14 UTC (permalink / raw)
  To: users, terry.montague.1980

Hi Terry,

Following on from what Stephen mentioned, when you hit an AVX2 instruction there is a warm-up latency while the CPU powers on the upper half of the 256-bit lanes.
It's normally around 10 µs, so it possibly doesn't account for everything you're seeing:

https://software.intel.com/en-us/forums/intel-isa-extensions/topic/710248

Also, with RT threads that never yield, you should add nosoftlockup to your boot line to prevent the kernel from assuming your thread has locked up.

Some things to look into:
1. Are you using nohz_full mode on the kernel boot line?
2. Have you disabled RCU callbacks on your CPUs with rcu_nocbs on the kernel boot line?
3. Have you manually balanced IRQs to move them off your isolated CPUs?
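
The boot-line items above can be checked against /proc/cmdline. This is a minimal sketch; the `WANTED` tuple is an assumed checklist drawn from this thread, and the function names are hypothetical. Note that nohz_full= only takes effect on kernels built with CONFIG_NO_HZ_FULL.

```python
WANTED = ("isolcpus", "nohz_full", "rcu_nocbs", "nosoftlockup")


def parse_cmdline(cmdline: str) -> dict:
    """Map each boot parameter to its value (True for bare flags)."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def missing_params(cmdline: str, wanted: tuple = WANTED) -> list:
    """List the wanted parameters that are absent from the boot line."""
    present = parse_cmdline(cmdline)
    return [name for name in wanted if name not in present]


def check_running_kernel() -> list:
    """Apply the check to the boot line of the running kernel."""
    with open("/proc/cmdline") as f:
        return missing_params(f.read())
```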

The clear_page_erms suggests it could be memory housekeeping like zone reclaim or transparent hugepages; have you disabled these?
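
Both settings can be inspected at runtime. The sketch below assumes the usual locations, /sys/kernel/mm/transparent_hugepage/enabled (where the active mode is shown in square brackets) and /proc/sys/vm/zone_reclaim_mode; the function names are hypothetical.

```python
def active_thp_mode(text: str) -> str:
    """Return the bracketed mode, e.g. 'always [madvise] never' -> 'madvise'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no active mode marked in input")


def read_memory_housekeeping() -> dict:
    """Report the current THP mode and zone reclaim setting."""
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
        thp = active_thp_mode(f.read())
    with open("/proc/sys/vm/zone_reclaim_mode") as f:
        zone_reclaim = int(f.read())
    return {"thp": thp, "zone_reclaim_mode": zone_reclaim}
```

A THP mode of "never" and a zone_reclaim_mode of 0 would mean both mechanisms are switched off.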

-Richard.


* Re: [dpdk-users] Linux forcibly descheduling isolated thread on isolated cpu running DPDK rx under load
  2018-04-20 10:14   ` Richard Nutman
@ 2018-04-20 11:32     ` terry.montague.1980
  0 siblings, 0 replies; 6+ messages in thread
From: terry.montague.1980 @ 2018-04-20 11:32 UTC (permalink / raw)
  To: Richard.Nutman, users

Hi there everyone - thank you for the responses.
As I'm using the standard Ubuntu kernel, nohz isn't an option, at least not nohz_full mode. I can't recompile the kernel as I need to make use of a proprietary binary kernel driver. I don't need extremely low latency: 10 µs as a worst case would be absolutely fine, as there are 4096 buffers on the receive side of the X550 card and this is a receive process.
I've rebalanced all other IRQs away from the isolated cores. nowatchdog and nosoftlockup are set. RCU callbacks are disabled for the isolated cores.
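
A quick way to double-check the IRQ rebalancing is to list every IRQ whose affinity still includes the isolated core. The sketch assumes the per-IRQ files /proc/irq/<n>/smp_affinity_list, which use the kernel's "0-1,3" CPU list format; the function names are hypothetical.

```python
import os


def cpus_in(cpu_list: str) -> set:
    """Expand a kernel CPU list such as '0,2-3' into a set of ints."""
    cpus = set()
    for part in filter(None, cpu_list.strip().split(",")):
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def irqs_on_cpu(affinities: dict, cpu: int) -> list:
    """Return the IRQ numbers whose affinity list includes `cpu`."""
    return sorted(irq for irq, cpu_list in affinities.items()
                  if cpu in cpus_in(cpu_list))


def read_irq_affinities(root: str = "/proc/irq") -> dict:
    """Collect {irq_number: affinity_list} for all IRQs on this host."""
    result = {}
    for entry in os.listdir(root):
        path = os.path.join(root, entry, "smp_affinity_list")
        if entry.isdigit() and os.path.exists(path):
            with open(path) as f:
                result[int(entry)] = f.read().strip()
    return result
```

For example, `irqs_on_cpu(read_irq_affinities(), 2)` lists the IRQs that may still be delivered to core 2.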
C-states > 1 are disabled. All the cores are running at 2.6 GHz continually (as observed in i7z).
What is puzzling me is the 'extreme' state of idleness that the isolated core enters, from running SCHED_FIFO at priority 98 to nothing, for ages. P-states are disabled and SpeedStep is disabled, so if this is something to do with the AVX2 activity then I'm stumped as to what the mechanism is, as the effect only manifests itself when the rest of the software is more active (using memory, transitioning in processing, etc.).
The isolated core is actually going into the C1 state for a significant time whilst this effect occurs: I pick this up in i7z, and I see the non-voluntary context switches away from the isolated core's thread by logging activity in 'perf'.
I'm struggling with why the effect is so extreme. The processor is disappearing for up to 20,000,000 TSC clocks at 2.6 GHz, which is just outrageously long; then the core wakes up again and the thread continues. The median time for my AVX2 processing on a packet is something like 0.3 µs, so it's not the code itself that's causing this. It appears to just go idle in the middle of AVX2.
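
To put those numbers on one scale: 20,000,000 cycles at 2.6 GHz is about 7.7 ms, several times the ~1400 µs mentioned in the first message. The sketch below does the conversions, assuming the TSC ticks at the stated 2.6 GHz and that the 4096 receive buffers are not drained at all during a stall; the function names are hypothetical.

```python
TSC_HZ = 2.6e9  # assumed TSC frequency, taken from the figures above


def cycles_to_us(cycles: float, hz: float = TSC_HZ) -> float:
    """Convert a TSC cycle count to microseconds."""
    return cycles / hz * 1e6


def us_to_cycles(us: float, hz: float = TSC_HZ) -> float:
    """Convert microseconds to TSC cycles."""
    return us * 1e-6 * hz


def max_rate_without_drop(ring_size: int, stall_us: float) -> float:
    """Highest packet rate (packets/s) a ring can absorb during a stall."""
    return ring_size / (stall_us * 1e-6)


stall_us = cycles_to_us(20_000_000)           # worst observed stall
per_packet_cycles = us_to_cycles(0.3)         # median AVX2 cost per packet
safe_pps = max_rate_without_drop(4096, stall_us)
```

Under these assumptions, a stall of that length would overflow the receive ring at anything much above half a million packets per second.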
It's a single thread within a process, which has many other threads on other cores, all at default priority.
Can some process-scope effect come in and just throw the SCHED_FIFO priority thread off the core for ages? Seems unlikely.
Any help gratefully received.
Terry.
