DPDK patches and discussions
* Thread priority for background tasks
@ 2025-04-30 18:40 Morten Brørup
  2025-04-30 19:45 ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Morten Brørup @ 2025-04-30 18:40 UTC
  To: dev

There are only two thread priorities in the enum rte_thread_priority: Normal and Real-time Critical.

I would like to poll ethdev counters, collect garbage and perform other jitter-insensitive tasks in a control thread with lower priority than my ordinary control threads, so that it is preempted by any work ready for my ordinary control threads.

Which DPDK API am I supposed to use to assign this below-normal priority to my "background" control thread?

Or: Aren't we missing a priority like Linux' SCHED_BATCH?


Med venlig hilsen / Kind regards,
-Morten Brørup



* Re: Thread priority for background tasks
  2025-04-30 18:40 Thread priority for background tasks Morten Brørup
@ 2025-04-30 19:45 ` Stephen Hemminger
  2025-05-01  7:05   ` Morten Brørup
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2025-04-30 19:45 UTC
  To: Morten Brørup; +Cc: dev

On Wed, 30 Apr 2025 20:40:52 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> There are only two thread priorities in the enum rte_thread_priority: Normal and Real-time Critical.
> 
> I would like to poll ethdev counters, collect garbage and perform other jitter non-sensitive tasks in a control thread with lower priority than my ordinary control threads, so it will be preempted by any work ready for my ordinary control threads.
> 
> Which DPDK API am I supposed to use to assign this below-normal priority to my "background" control thread?
> 
> Or: Aren't we missing a priority like Linux' SCHED_BATCH?
> 
> 
> Med venlig hilsen / Kind regards,
> -Morten Brørup
> 

Short answer: if your application is running on Linux, only ever use Normal.
DPDK applications usually never sleep, so a real-time priority thread will starve
the OS and cause instability.

Long answer:
Real-time critical was added for Windows, which needs it and does not provide the
same true RT behavior. It might work on FreeBSD; I am not sure how its scheduler works.

On Linux, it is possible to use, but only if the application is not 100% polling
(i.e., a real-time application, not a polling application).
It needs to sleep regularly to allow non-RT kernel threads to run; otherwise things
like disk writes that are pending on that CPU may not happen, leading to filesystem
corruption or hangs.

Bottom line:
Real-time critical means this needs to run for a short time right now (e.g., a safety switch).
RT does not mean this is more important overall, or needs to run faster.


* RE: Thread priority for background tasks
  2025-04-30 19:45 ` Stephen Hemminger
@ 2025-05-01  7:05   ` Morten Brørup
  2025-05-01  7:47     ` Bruce Richardson
  0 siblings, 1 reply; 5+ messages in thread
From: Morten Brørup @ 2025-05-01  7:05 UTC
  To: Stephen Hemminger; +Cc: dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 30 April 2025 21.45
> 
> On Wed, 30 Apr 2025 20:40:52 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > There are only two thread priorities in the enum rte_thread_priority:
> Normal and Real-time Critical.
> >
> > I would like to poll ethdev counters, collect garbage and perform
> other jitter non-sensitive tasks in a control thread with lower
> priority than my ordinary control threads, so it will be preempted by
> any work ready for my ordinary control threads.
> >
> > Which DPDK API am I supposed to use to assign this below-normal
> priority to my "background" control thread?
> >
> > Or: Aren't we missing a priority like Linux' SCHED_BATCH?
> 
> Short answer: if your application is running on Linux, only ever use
> Normal.
> DPDK applications usually never sleep and this will starve the OS and
> cause instability.

I was asking for the opposite of Critical priority.

For the sake of discussion, imagine a (registered or unregistered) non-EAL thread doing something like this:
for (;;) {
	poll_counters();	/* about 1 ms execution time */
	usleep(99 * 1000);	/* sleep for 99 ms */
}

With normal scheduling priority, it will rack up a lot of scheduling credits during sleep(), so it might not be preempted by other threads while executing poll_counters().

But if some other thread (on the same CPU core) changes state from Sleeping to Runnable, I want it to preempt the counter polling thread.
This other thread could be a control plane application, e.g. a DNS server, which shouldn't suffer up to 1 ms of scheduling lag if it becomes Runnable the instant the counter polling thread starts executing poll_counters().

So I'm looking for a DPDK API to apply a "low priority" scheduling policy, like SCHED_BATCH, to the counter polling thread.
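
For reference, on Linux a thread can request SCHED_BATCH for itself through the POSIX thread API, without any DPDK involvement. The following is a minimal sketch under that assumption, not a DPDK API: the helper names background_thread_set_batch() and background_thread_get_policy() are hypothetical, and how strongly the kernel then disfavors the thread on wakeup depends on the kernel version and scheduler configuration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes SCHED_BATCH only with this defined */
#endif
#include <pthread.h>
#include <sched.h>
#include <string.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3 /* fallback: value used by the Linux UAPI headers */
#endif

/*
 * Hypothetical helper, not a DPDK API: move the calling thread to the
 * SCHED_BATCH policy. Returns 0 on success or a positive errno value.
 */
static int
background_thread_set_batch(void)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 0; /* the only valid value for SCHED_BATCH */

	return pthread_setschedparam(pthread_self(), SCHED_BATCH, &param);
}

/*
 * Hypothetical helper: report the calling thread's current scheduling
 * policy, or -1 if it cannot be read.
 */
static int
background_thread_get_policy(void)
{
	struct sched_param param;
	int policy;

	if (pthread_getschedparam(pthread_self(), &policy, &param) != 0)
		return -1;
	return policy;
}
```

The counter polling thread would call background_thread_set_batch() once at startup, before entering its loop, and could log the return value if the request is refused. Switching from the default policy to SCHED_BATCH normally needs no special privileges.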

> 
> Long answer:
> Realtime critical was added for Windows, which needs it and doesn't do
> the
> same real RT behavior. Might work on FreeBSD, not sure how the
> scheduler works.
> 
> On Linux, it is possible to use but only if the application is not a
> 100% polling.
> (i.e a real time application not a polling application).
> It needs to regularly sleep to allow non-RT kernel threads to run,
> otherwise things
> like disk writes that are pending on that CPU may not happen, leading
> to filesystem
> corruption. Or hangs.
> 
> Bottom line:
> Real time critical means this needs to run for a short time now (i.e
> safety switch).
> RT does not mean this is more important overall, or needs to run
> faster.


* Re: Thread priority for background tasks
  2025-05-01  7:05   ` Morten Brørup
@ 2025-05-01  7:47     ` Bruce Richardson
  2025-05-01  8:35       ` Morten Brørup
  0 siblings, 1 reply; 5+ messages in thread
From: Bruce Richardson @ 2025-05-01  7:47 UTC
  To: Morten Brørup; +Cc: Stephen Hemminger, dev

On Thu, May 01, 2025 at 09:05:32AM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 30 April 2025 21.45
> > 
> > On Wed, 30 Apr 2025 20:40:52 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> > 
> > > There are only two thread priorities in the enum rte_thread_priority:
> > Normal and Real-time Critical.
> > >
> > > I would like to poll ethdev counters, collect garbage and perform
> > other jitter non-sensitive tasks in a control thread with lower
> > priority than my ordinary control threads, so it will be preempted by
> > any work ready for my ordinary control threads.
> > >
> > > Which DPDK API am I supposed to use to assign this below-normal
> > priority to my "background" control thread?
> > >
> > > Or: Aren't we missing a priority like Linux' SCHED_BATCH?
> > 
> > Short answer: if your application is running on Linux, only ever use
> > Normal.
> > DPDK applications usually never sleep and this will starve the OS and
> > cause instability.
> 
> I was asking for the opposite of Critical priority.
> 
> For the sake of discussion, imagine a (registered or unregistered) non-EAL thread doing something like this:
> loop {
> 	poll_counters(); // 1 ms execution time
> 	sleep(99 ms);
> }
> 
> With normal scheduling priority, it will rack up a lot of scheduling credits during sleep(), so it might not be preempted by other threads while executing poll_counters().
> 
> But if some other thread (on the same CPU core) changes state from Sleeping to Runnable, I want it to preempt the counter polling thread.
> This other thread could be a control plane application, e.g. a DNS Server, which shouldn't suffer up to 1 ms scheduling lag if it becomes Runnable the instant the counter polling thread started executing poll_counters().
> 
> So I'm looking for a DPDK API to apply a "low priority" scheduling policy, like SCHED_BATCH, to the counter polling thread.
> 

Does this need to be done in DPDK? Unless you need to target Windows, would
using the standard Unix/Posix scheduling/pthread APIs directly not be best,
rather than having us try to wrap all such things inside DPDK APIs? I worry
about scope creep for such things, with us ending up wrapping a whole bunch
of scheduling stuff into DPDK that we should not need to do.

/Bruce


* RE: Thread priority for background tasks
  2025-05-01  7:47     ` Bruce Richardson
@ 2025-05-01  8:35       ` Morten Brørup
  0 siblings, 0 replies; 5+ messages in thread
From: Morten Brørup @ 2025-05-01  8:35 UTC
  To: Bruce Richardson; +Cc: Stephen Hemminger, dev, Harry van Haaren

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 1 May 2025 09.48
> 
> On Thu, May 01, 2025 at 09:05:32AM +0200, Morten Brørup wrote:
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Wednesday, 30 April 2025 21.45
> > >
> > > On Wed, 30 Apr 2025 20:40:52 +0200
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > There are only two thread priorities in the enum
> rte_thread_priority:
> > > Normal and Real-time Critical.
> > > >
> > > > I would like to poll ethdev counters, collect garbage and perform
> > > other jitter non-sensitive tasks in a control thread with lower
> > > priority than my ordinary control threads, so it will be preempted
> by
> > > any work ready for my ordinary control threads.
> > > >
> > > > Which DPDK API am I supposed to use to assign this below-normal
> > > priority to my "background" control thread?
> > > >
> > > > Or: Aren't we missing a priority like Linux' SCHED_BATCH?
> > >
> > > Short answer: if your application is running on Linux, only ever
> use
> > > Normal.
> > > DPDK applications usually never sleep and this will starve the OS
> and
> > > cause instability.
> >
> > I was asking for the opposite of Critical priority.
> >
> > For the sake of discussion, imagine a (registered or unregistered)
> non-EAL thread doing something like this:
> > loop {
> > 	poll_counters(); // 1 ms execution time
> > 	sleep(99 ms);
> > }
> >
> > With normal scheduling priority, it will rack up a lot of scheduling
> credits during sleep(), so it might not be preempted by other threads
> while executing poll_counters().
> >
> > But if some other thread (on the same CPU core) changes state from
> Sleeping to Runnable, I want it to preempt the counter polling thread.
> > This other thread could be a control plane application, e.g. a DNS
> Server, which shouldn't suffer up to 1 ms scheduling lag if it becomes
> Runnable the instant the counter polling thread started executing
> poll_counters().
> >
> > So I'm looking for a DPDK API to apply a "low priority" scheduling
> policy, like SCHED_BATCH, to the counter polling thread.
> >
> 
> Does this need to be done in DPDK?

No, not really.

> Unless you need to target Windows, would
> using the standard Unix/Posix scheduling/pthread APIs directly not be
> best,
> rather than having us try to wrap all such things inside DPDK APIs?

It probably would. That's how we do it today, anyway. :-)

> I worry
> about scope creep for such things, with us ending up wrapping a whole
> bunch
> of scheduling stuff into DPDK that we should not need to do.

I'm mainly asking for academic reasons.

I think the scope of my question was brought into DPDK when it introduced enum rte_thread_priority with RTE_THREAD_PRIORITY_NORMAL and RTE_THREAD_PRIORITY_REALTIME_CRITICAL.

I know this is mainly related to the control plane, and thus not the most relevant thing for DPDK.
But I think we need to offer something. Not only applications, but also drivers, might want to run separate low-priority threads for background tasks, such as garbage collection, counter polling, or a link state machine.

IMHO, the kernel scheduler is a much better choice than DPDK's non-preemptive "Service Cores" scheduler for many purposes.


