On Thu, Oct 6, 2022 at 5:49 AM Morten Brørup <mb@smartsharesystems.com> wrote:
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Thursday, 6 October 2022 10.59
>
> On 2022-10-06 10:18, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Thursday, 6 October 2022 09.51
> >>
> >> On 2022-10-06 08:53, Morten Brørup wrote:
> >
> > [...]
> >
> >>> I have been wondering how accurate the tests really are. Where can I
> >>> see what is being done to ensure that the EAL worker threads are fully
> >>> isolated, and never interrupted by the O/S scheduler or similar?
> >>>
> >>
> >> There are kernel-level counters for how many times a thread has been
> >> involuntarily interrupted,
> >
> > Thanks, Mattias. I will look into that.
> >
> > Old kernels (2.4 and 2.6) ascribed the time spent in interrupt
> > handlers to the CPU usage of the running process, instead of counting
> > the time spent in interrupt handlers separately. Does anyone know if
> > this has been fixed?
> >
>
> If you mean top half interrupt handler, my guess would be it does not
> matter, except in some strange corner cases or faulty hardware. An ISR
> should have very short run time, and not being run *that* often (after
> NAPI). With isolated cores, it should be even less of a problem, but
> then you may not have that.
>

Many years ago, we used a NIC that didn't have DMA, and only 4 RX descriptors, so it had to be serviced in the top half.

> Bottom halves are not attributed to the process, I believe.

This is an improvement.

> (In old
> kernels, the time spent in soft IRQs was not attributed to anything,
> which could create situations where the system was very busy indeed
> [e.g., with network stack bottom halves doing IP forwarding], but
> looking idle in 'top'.)

We also experienced that. The kernel's scheduling information was completely useless, so eventually we removed the CPU Utilization information from our GUI. ;-)

And IIRC, it wasn't fixed in kernel 2.6.

>
> >> and also, if I recall correctly, the amount
> >> of wall-time the thread has been runnable, but not running (i.e.,
> >> waiting to be scheduled). The latter may require some scheduler debug
> >> kernel option being enabled on the kernel build.
> >
> >
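
For reference, the two counters Mattias describes can be read from procfs on Linux. Below is a minimal sketch in C, not anything taken from the DPDK test suite. It assumes a kernel that provides /proc/thread-self (3.17 or later) and, for the second value, one built with scheduler statistics so that the schedstat file exists. The helper names (slurp, parse_nonvoluntary_switches, parse_run_delay_ns, report_thread_interference) are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read up to size-1 bytes of a (small) procfs file into buf.
 * Returns 0 on success, -1 if the file cannot be opened. */
static int slurp(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 0;
}

/* Extract the "nonvoluntary_ctxt_switches:" value from the text of a
 * /proc/.../status file. Returns -1 if the field is not present. */
static long long parse_nonvoluntary_switches(const char *status_text)
{
    static const char key[] = "nonvoluntary_ctxt_switches:";
    const char *p = strstr(status_text, key);
    long long count;

    if (p == NULL || sscanf(p + sizeof(key) - 1, "%lld", &count) != 1)
        return -1;
    return count;
}

/* Extract the second field of a /proc/.../schedstat line: the time in
 * nanoseconds the thread spent waiting on a run queue (runnable, but
 * not running). Returns -1 if the text cannot be parsed. */
static long long parse_run_delay_ns(const char *schedstat_text)
{
    unsigned long long on_cpu_ns, run_delay_ns;

    if (sscanf(schedstat_text, "%llu %llu", &on_cpu_ns, &run_delay_ns) != 2)
        return -1;
    return (long long)run_delay_ns;
}

/* Print both counters for the calling thread. Files that are missing
 * (for example schedstat on a kernel built without scheduler
 * statistics) are silently skipped. */
static void report_thread_interference(void)
{
    char buf[8192];

    if (slurp("/proc/thread-self/status", buf, sizeof(buf)) == 0)
        printf("involuntary context switches: %lld\n",
               parse_nonvoluntary_switches(buf));
    if (slurp("/proc/thread-self/schedstat", buf, sizeof(buf)) == 0)
        printf("runnable but not running: %lld ns\n",
               parse_run_delay_ns(buf));
}
```

A test could call report_thread_interference() from the worker thread before and after the measured section and compare the values; a non-zero delta in either counter suggests the thread was not fully isolated during the run.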

Back to the topic of unit testing: I think we need to consider the purpose of the unit tests and where we expect them to run. Unit tests run in automated environments across multiple CI systems, e.g. the UNH-IOL Community Lab, GitHub, etc. Those environments are typically virtualized, and I don't think the unit tests should require tuning down to the level of CPU clock ticks. Tests that need that precision are likely better suited to dedicated performance environments, where the complete host is tightly controlled, so that results on things like packet throughput are repeatable and deterministic.

Cheers,
Lincoln


--
Lincoln Lavoie
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
+1-603-674-2755 (m)