From: Lincoln Lavoie
Date: Thu, 6 Oct 2022 07:07:27 -0400
Subject: Re: [dpdklab] RE: rte_service unit test failing randomly
To: Morten Brørup
Cc: Mattias Rönnblom, Thomas Monjalon, Van Haaren Harry, David Marchand,
    dpdklab, ci@dpdk.org, Honnappa Nagarahalli, Aaron Conole, dev

On Thu, Oct 6, 2022 at 5:49 AM Morten Brørup wrote:

> > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > Sent: Thursday, 6 October 2022 10.59
> >
> > On 2022-10-06 10:18, Morten Brørup wrote:
> > >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > >> Sent: Thursday, 6 October 2022 09.51
> > >>
> > >> On 2022-10-06 08:53, Morten Brørup wrote:
> > >
> > > [...]
> > >
> > >>> I have been wondering how accurate the tests really are. Where
> > >>> can I see what is being done to ensure that the EAL worker
> > >>> threads are fully isolated, and never interrupted by the O/S
> > >>> scheduler or similar?
> > >>>
> > >>
> > >> There are kernel-level counters for how many times a thread has
> > >> been involuntarily interrupted,
> > >
> > > Thanks, Mattias. I will look into that.
> > >
> > > Old kernels (2.4 and 2.6) ascribed the time spent in interrupt
> > > handlers to the CPU usage of the running process, instead of
> > > counting the time spent in interrupt handlers separately. Does
> > > anyone know if this has been fixed?
> > >
> >
> > If you mean top half interrupt handler, my guess would be it does
> > not matter, except in some strange corner cases or faulty hardware.
> > An ISR should have very short run time, and not be run *that* often
> > (after NAPI). With isolated cores, it should be even less of a
> > problem, but then you may not have that.
> >
>
> Many years ago, we used a NIC that didn't have DMA, and only 4 RX
> descriptors, so it had to be serviced in the top half.
>
> > Bottom halves are not attributed to the process, I believe.
>
> This is an improvement.
>
> > (In old kernels, the time spent in soft IRQs was not attributed to
> > anything, which could create situations where the system was very
> > busy indeed [e.g., with network stack bottom halves doing IP
> > forwarding], but looking idle in 'top'.)
>
> We also experienced that. The kernel's scheduling information was
> completely useless, so eventually we removed the CPU Utilization
> information from our GUI. ;-)
>
> And IIRC, it wasn't fixed in kernel 2.6.
>
> >
> > >> and also, if I recall correctly, the amount of wall-time the
> > >> thread has been runnable, but not running (i.e., waiting to be
> > >> scheduled). The latter may require some scheduler debug kernel
> > >> option being enabled on the kernel build.
> > >
> > >
>

Back to the topic of unit testing, I think we need to consider their
purpose and where we expect them to run. Unit tests are run in automated
environments, across multiple CI systems, such as the UNH-IOL Community
Lab and GitHub. Those environments are typically virtualized, and I
don't think the unit tests should require tuning down to the level of
CPU clock ticks. Tests with that level of timing sensitivity are likely
better suited to dedicated performance environments, where the complete
host is tightly controlled, for the purpose of repeatable and
deterministic results on things like packet throughput, etc.

Cheers,
Lincoln

-- 
Lincoln Lavoie
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)
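As an illustration of the kernel-level counters mentioned in the quoted discussion, here is a minimal sketch (not DPDK code) that reads the involuntary context switch count around a piece of work. It assumes Linux and uses Python's standard `resource` module; the helper names `involuntary_switches` and `measure` are hypothetical and chosen only for this example.

```python
import resource


def involuntary_switches():
    """Return the involuntary context switch count reported by getrusage()."""
    # RUSAGE_THREAD is Linux-specific; fall back to the process-wide
    # counter on platforms where it is not available.
    who = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)
    return resource.getrusage(who).ru_nivcsw


def measure(fn):
    """Run fn() and return (result, involuntary switches seen meanwhile)."""
    before = involuntary_switches()
    result = fn()
    return result, involuntary_switches() - before


if __name__ == "__main__":
    # Hypothetical workload standing in for a timing-sensitive test body.
    value, preemptions = measure(lambda: sum(range(1_000_000)))
    print(f"result={value} involuntary_switches={preemptions}")
```

A nonzero delta only shows that the thread was preempted during the measured interval; it does not say for how long, so in a virtualized CI environment it is better treated as a hint that a timing result may be unreliable than as a pass/fail criterion.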

