* rte_control event API?

From: Stephen Hemminger @ 2025-05-01 15:06 UTC
To: techboard; +Cc: dev

There were recent discussions about drivers creating control threads.
The list of drivers that use rte_thread_create_internal_control keeps growing,
but it got me looking at whether this could be done better.

Rather than having multiple control threads with potential conflicts, why not
add a new API that has one control thread and uses epoll? The current
multi-process control thread could use epoll as well. Epoll scales much better
and avoids any possibility of lock scheduling/priority problems.

Some ideas:
  - single control thread started (where the current MP thread is started)
  - have control_register_fd and control_unregister_fd
  - leave the rte_control_thread API for legacy uses

Model this after the well-used libevent library: https://libevent.org

Open questions:
  - names are hard; using "event" as the name leads to possible confusion
    with eventdev
  - do we need to support:
      - multiple control threads doing epoll?
      - priorities?
      - timers?
      - signals?
      - manual activation?
      - one-off events?
      - could the alarm thread just be a control event?
  - should also have stats and info calls
  - it would be good to NOT support as many features as libevent, since
    so many options lead to bugs.
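To make the proposal concrete, the following is a rough sketch of what a
single epoll-based control thread with register/unregister calls could look
like. It is purely illustrative: the names follow the suggestions above,
nothing here exists in DPDK today, and error handling, locking around the
registration table, and fd ownership are glossed over.

/* Hypothetical sketch only -- not existing DPDK code. */
#include <stdlib.h>
#include <sys/epoll.h>

typedef void (*control_cb_t)(int fd, void *arg);

struct control_event {
        int fd;
        control_cb_t cb;
        void *arg;
};

static int epfd;        /* created once at startup with epoll_create1(0) */

/* Add fd to the control thread's epoll set; cb runs in that thread. */
int
control_register_fd(int fd, control_cb_t cb, void *arg)
{
        struct control_event *ev = malloc(sizeof(*ev));
        struct epoll_event eev;

        if (ev == NULL)
                return -1;
        ev->fd = fd;
        ev->cb = cb;
        ev->arg = arg;
        eev.events = EPOLLIN;
        eev.data.ptr = ev;
        return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &eev);
}

/* Stop watching fd (a real implementation would also free its entry). */
int
control_unregister_fd(int fd)
{
        return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}

/* Body of the single control thread: one epoll_wait() loop dispatching
 * the callback of every registered fd. */
static void *
control_thread_main(void *arg)
{
        struct epoll_event evs[32];
        int i, n;

        (void)arg;
        for (;;) {
                n = epoll_wait(epfd, evs, 32, -1);
                for (i = 0; i < n; i++) {
                        struct control_event *ev = evs[i].data.ptr;

                        ev->cb(ev->fd, ev->arg);
                }
        }
        return NULL;
}

The existing multi-process socket and the fds that drivers currently watch
from their private control threads could then be registered with
control_register_fd() instead of each driver spawning its own thread, and a
timerfd registered the same way would cover the alarm/timer question.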
* RE: rte_control event API?

From: Morten Brørup @ 2025-05-02 8:56 UTC
To: Stephen Hemminger, techboard; +Cc: dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, 1 May 2025 17.07
>
> [...]

I think we need both:

1. Multi threading.
Multiple control threads are required for preemptive scheduling between
latency-sensitive tasks and long-running tasks (which violate the latency
requirements of the former).
For improved support of multi threading between driver control threads and
other threads (DPDK control threads and other, non-DPDK processes on the same
host), we should expand the current control thread APIs, e.g. by extending the
DPDK threads API with more than the current two priorities ("Normal" and
"Real-Time Critical").
E.g. if polling ethdev counters takes 1 ms, I don't want to add 1 ms of jitter
to my other control plane tasks just because they all have to share a single
control thread.
I want the O/S scheduler to handle that for me. And yes, it means that I need
to consider locking, critical sections, and all the other potential problems
that come with multithreading.

2. Event passing.
Some threads rely on epoll as their dispatcher; other threads use different
designs.
Dataplane threads normally use polling (or eventdev, or Service Cores, or ...),
i.e. non-preemptive scheduling of tiny processing tasks, but may switch to
epoll for power saving during low traffic.
In low traffic periods, drivers may raise an RX interrupt to wake up a sleeping
application to start polling. DPDK currently uses an epoll-based design for
passing this "wakeup" event (and other events, e.g. "link status change").

(Disclaimer: Decades have passed since I wrote Windows applications using the
Win32 API, so the following might be complete nonsense...)
If the "epoll" design pattern is not popular on Windows, we should not force it
upon Windows developers. We should instead offer something compatible with the
Windows "message pump" standard design pattern.
I think it would be better to adapt some DPDK APIs to the host O/S than to
force the APIs of one O/S onto another O/S where they don't fit.
Here's an idea related to "epoll": We could expose DPDK's internal file
descriptors so the application developer can use her own preferred event
library, e.g. libevent, rather than requiring the use of some crippled DPDK
epoll library.

At a high level: the application developer should be free to use whatever
design pattern she prefers. We should not require epoll as the application's
main dispatcher and thereby prevent application developers from using other
design patterns.

Remember: DPDK is only a library (with a lot of features). It is not a
complete framework requiring a specific application design. Let's keep it
that way.

PS: I strongly prefer "epoll" events over "signals" for passing events to the
application. Thanks to whoever made that decision. ;-)
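As a concrete illustration of this idea, DPDK already exposes one class of
such file descriptors: the per-queue Rx interrupt fds used for the "wakeup"
event mentioned above. A minimal sketch of feeding one into an
application-owned libevent loop might look as follows. The libevent calls and
rte_eth_dev_rx_intr_enable() are real APIs; the fd is assumed to have been
obtained from ethdev (e.g. via rte_eth_dev_rx_intr_ctl_q_get_fd(), where
available), and resume_polling() stands in for application logic.

/* Illustrative sketch: wiring a DPDK-provided fd into libevent. */
#include <event2/event.h>
#include <rte_ethdev.h>

struct rx_wakeup_ctx {
        uint16_t port_id;
        uint16_t queue_id;
};

/* Application-defined: leave sleep mode and resume polling this queue. */
static void resume_polling(uint16_t port_id, uint16_t queue_id);

static void
rx_wakeup_cb(evutil_socket_t fd, short what, void *arg)
{
        struct rx_wakeup_ctx *ctx = arg;

        (void)fd;
        (void)what;
        /* Traffic arrived while the lcore was sleeping. */
        resume_polling(ctx->port_id, ctx->queue_id);
}

static int
register_rx_wakeup(struct event_base *base, int rx_wakeup_fd,
                   struct rx_wakeup_ctx *ctx)
{
        struct event *ev;

        ev = event_new(base, rx_wakeup_fd, EV_READ | EV_PERSIST,
                       rx_wakeup_cb, ctx);
        if (ev == NULL || event_add(ev, NULL) < 0)
                return -1;

        /* Arm the Rx interrupt before the dataplane thread goes to sleep. */
        return rte_eth_dev_rx_intr_enable(ctx->port_id, ctx->queue_id);
}

The application then runs event_base_dispatch(base) as its own main loop, and
the same loop can multiplex non-DPDK fds, timers and signals -- exactly the
flexibility a DPDK-internal dispatcher could not offer without growing into
another libevent.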
* Re: rte_control event API?

From: Bruce Richardson @ 2025-05-02 9:08 UTC
To: Morten Brørup; +Cc: Stephen Hemminger, techboard, dev

On Fri, May 02, 2025 at 10:56:58AM +0200, Morten Brørup wrote:
> [...]
>
> Here's an idea related to "epoll": We could expose DPDK's internal file
> descriptors for the application developer to use her own preferred epoll
> library, e.g. libevent. Rather this than requiring using some crippled
> DPDK epoll library.
>
+1 for this suggestion. Let's just provide the low-level info needed to
allow the app to work its own solution.

/Bruce
* RE: rte_control event API?

From: Morten Brørup @ 2025-05-02 10:36 UTC
To: Bruce Richardson; +Cc: Stephen Hemminger, techboard, dev

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 2 May 2025 11.09
>
> On Fri, May 02, 2025 at 10:56:58AM +0200, Morten Brørup wrote:
> > [...]
> >
> > Here's an idea related to "epoll": We could expose DPDK's internal
> > file descriptors for the application developer to use her own preferred
> > epoll library, e.g. libevent. Rather this than requiring using some
> > crippled DPDK epoll library.
>
> +1 for this suggestion. Let's just provide the low-level info needed to
> allow the app to work its own solution.

For threading and CPU affinity, DPDK provides APIs to wrap O/S differences and
hide the underlying (O/S specific) thread id. If we insist on hiding the
underlying thread id, we need to expand these thread management APIs to support
more of the features required by application developers, including thread
prioritization.

Alternatively - expanding on the idea of exposing internal file descriptors for
epoll - we could expose a few O/S specific APIs for getting the underlying
thread id, thereby giving the application developer the flexibility to manage
thread priority, CPU affinity etc. using their preferred thread management
library. E.g. in /lib/eal/include/rte_thread.h:

#ifdef RTE_EXEC_ENV_WINDOWS
DWORD rte_os_thread_id(const rte_thread_t thread_id);
// With { return (DWORD)thread_id.opaque_id; } in a C file.
#else
pthread_t rte_os_thread_id(const rte_thread_t thread_id);
// With { return (pthread_t)thread_id.opaque_id; } in a C file.
#endif

If we do this, we should consider that the current implementation of threading
in DPDK must still work, even though its threads might also be managed by other
libraries. I.e. DPDK should not cache information such as the CPU sets of
threads, because those CPU sets may be modified by non-DPDK functions, making
the cached information invalid.
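To illustrate what such an accessor would enable, here is a sketch of an
application giving one of its DPDK control threads its own scheduling
priority and CPU affinity on Linux. rte_os_thread_id() is only the accessor
proposed above and does not exist in DPDK; pthread_setschedparam() and
pthread_setaffinity_np() are standard glibc/pthread calls.

/* Sketch only, assuming the proposed rte_os_thread_id() accessor. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <rte_thread.h>

/* Give a control thread a priority level of its own and pin it to a CPU,
 * instead of being limited to DPDK's two built-in priority levels. */
static int
tune_control_thread(rte_thread_t tid, int priority, unsigned int cpu)
{
        pthread_t os_tid = rte_os_thread_id(tid);       /* proposed accessor */
        struct sched_param sp = { .sched_priority = priority };
        cpu_set_t cpus;

        if (pthread_setschedparam(os_tid, SCHED_RR, &sp) != 0)
                return -1;

        CPU_ZERO(&cpus);
        CPU_SET(cpu, &cpus);
        return pthread_setaffinity_np(os_tid, sizeof(cpus), &cpus);
}

As noted above, once an application starts adjusting priorities and CPU sets
this way, DPDK must not rely on any cached per-thread CPU-set information.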
Thread overview: 4+ messages
  2025-05-01 15:06 rte_control event API?  Stephen Hemminger
  2025-05-02  8:56 ` Morten Brørup
  2025-05-02  9:08   ` Bruce Richardson
  2025-05-02 10:36     ` Morten Brørup