From: Stephen Hemminger <stephen@networkplumber.org>
Sent: 08 September 2025 23:08
To: Serhii Iliushyk <sil-plv@napatech.com>
Cc: dev@dpdk.org <dev@dpdk.org>; Mykola Kostenok <mko-plv@napatech.com>; Christian Koue Muf <ckm@napatech.com>
Subject: Re: [PATCH v1 0/7] migrate threads to DPDK service framework
 
On Mon,  8 Sep 2025 13:04:38 +0200
Serhii Iliushyk <sil-plv@napatech.com> wrote:

> This modification provides better resource (CPU) management for NTNIC PMD.
>
> The following threads are migrated:
>         * FLM update thread
>         * Statistic thread
>         * Port event thread
>         * Adapter monitoring thread
> Additionally, a warning is added to inform users about the importance of
> dedicating lcores to the DPDK service framework when using the NTNIC PMD.
> The code is also cleaned up to use pthreads and rte_thread APIs.
>
> After this patch series, each application using the NTNIC PMD should
> dedicate at least five (5) cores to the DPDK service framework to ensure
> proper operation of the NTNIC PMD.

I was concerned about excessive control thread usage before, and this
seems to be worse, not better.

There are conflicting use cases here:
  1. The original DPDK goal was to make effective use of multiple cores
     with no locking. Intel customers often had idle lcores, and some CPUs
     had lots of inactive lcores that could be used to get more work done.
     Dedicating some of them to service tasks etc. was a natural outcome.

  2. DPDK applications (OVS, Grout, VPP) usually want to know about all
     the lcores in use, at least in their documentation and examples.
     They don't cover the case of service lcores.

  3. Dedicated low core count smart NICs using DPDK. In this case it
     makes sense to be frugal with lcores since the point of the smart NIC
     is to be able to run other control services. For example, the MS
     NIC had a hard limit on the DPDK part (via cgroups) of only 4 + main
     lcores.

Granted, NTNIC is likely only being used for a specific application on
a specific set of hardware.

The ideal would be to have better control event management in EAL,
something like a "libevent"-style API. This would reduce the need for
control cores and avoid potential resource conflicts between control
threads.

Hi Stephen!

Thanks for the detailed feedback.

The migration to rte_service actually improves the performance of the NTNIC PMD. That is an expected result, since with the default (1-to-1) mapping an entire core is dedicated to a single service.
Similar performance and stability can be achieved with threads and mutexes (rte_spinlocks introduce instability when used with threads).

The ntnic service API provided with this patch series enables users to map NTNIC services to lcores as necessary.
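
For illustration, below is a rough sketch (not code taken from this patch series) of how an application could pin one of the NTNIC services to a dedicated service lcore using the generic rte_service API. The service name passed in and the helper itself are placeholders; the real names are whatever the PMD registers in this series.

#include <errno.h>

#include <rte_service.h>

/*
 * Illustrative sketch only: pin one NTNIC service to a dedicated
 * service lcore. "name" is a placeholder for whatever name the PMD
 * registers its service under; lcore_id must be an otherwise unused
 * EAL worker lcore.
 */
static int
pin_ntnic_service(const char *name, uint32_t lcore_id)
{
        uint32_t sid;
        int ret;

        /* Look up the service registered by the PMD. */
        ret = rte_service_get_by_name(name, &sid);
        if (ret != 0)
                return ret;

        /* Turn the lcore into a service lcore (ignore "already added"). */
        ret = rte_service_lcore_add(lcore_id);
        if (ret != 0 && ret != -EALREADY)
                return ret;

        /* Map the service to that lcore and allow it to run. */
        ret = rte_service_map_lcore_set(sid, lcore_id, 1);
        if (ret != 0)
                return ret;

        ret = rte_service_runstate_set(sid, 1);
        if (ret != 0)
                return ret;

        /* Start the service lcore's polling loop. */
        return rte_service_lcore_start(lcore_id);
}

The default (1-to-1) mapping does essentially this for every registered service; the sketch only shows how a user can choose the lcores explicitly.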

All NTNIC services can be mapped to separate lcores or all to a single lcore; however, the single-lcore mapping has a significant performance impact.

Mapping all services to a single lcore is similar to a "libevent"-style API. However, there are no events; instead, the services are called continuously in a predefined order.
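
Roughly, the single-lcore case degenerates into a loop like the sketch below (illustrative only; the service IDs would have been resolved with rte_service_get_by_name() as above, and the order of the array defines the calling order):

#include <rte_common.h>
#include <rte_service.h>

/*
 * Illustrative sketch only: one lcore driving all services itself,
 * calling them continuously in a fixed order.
 */
static __rte_noreturn void
poll_services_in_order(const uint32_t *ids, unsigned int n)
{
        for (;;) {
                for (unsigned int i = 0; i < n; i++)
                        /* 1 = serialize services that are not MT safe. */
                        rte_service_run_iter_on_app_lcore(ids[i], 1);
        }
}

This keeps everything on one core, at the performance cost described below.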

A single core cannot process all the services without a negative performance impact. With an event-driven approach, we would have to spawn a thread (or find a free lcore to start the service on) every time an event occurs. If you have another vision for this, please share your opinion.

For complex DPDK-based applications (OVS, Grout, VPP), there is still the option to use the (1-to-1) mapping by passing the raw EAL options (-s SERVICE COREMASK or -S SERVICE CORELIST).

Thanks, 
Serhii