From: Lincoln Lavoie
Date: Thu, 6 Oct 2022 07:07:27 -0400
Subject: Re: [dpdklab] RE: rte_service unit test failing randomly
To: Morten Brørup
Cc: Mattias Rönnblom, Thomas Monjalon, Van Haaren Harry, David Marchand, dpdklab, ci@dpdk.org, Honnappa Nagarahalli, Aaron Conole, dev
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D8739D@smartserver.smartshare.dk>
References: <739ee0ca-ccbe-5918-c2af-18e77327a898@ericsson.com> <3000673.mvXUDI8C0e@thomas> <98CBD80474FA8B44BF855DF32C47DC35D87399@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D8739B@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D8739D@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions

On Thu, Oct 6, 2022 at 5:49 AM Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > Sent: Thursday, 6 October 2022 10.59
> >
> > On 2022-10-06 10:18, Morten Brørup wrote:
> > >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > >> Sent: Thursday, 6 October 2022 09.51
> > >>
> > >> On 2022-10-06 08:53, Morten Brørup wrote:
> > >
> > > [...]
> > >
> > >>> I have been wondering how accurate the tests really are. Where can I
> > >>> see what is being done to ensure that the EAL worker threads are fully
> > >>> isolated, and never interrupted by the O/S scheduler or similar?
> > >>>
> > >>
> > >> There are kernel-level counters for how many times a thread have been
> > >> involuntarily interrupted,
> > >
> > > Thanks, Mattias. I will look into that.
> > >
> > > Old kernels (2.4 and 2.6) ascribed the time spent in interrupt
> > > handlers to the CPU usage of the running process, instead of counting
> > > the time spent in interrupt handlers separately. Does anyone know it
> > > this has been fixed?
> > >
> >
> > If you mean top half interrupt handler, my guess would be it does not
> > matter, except in some strange corner cases or faulty hardware. An ISR
> > should have very short run time, and not being run *that* often (after
> > NAPI). With isolated cores, it should be even less of a problem, but
> > then you may not have that.
> >
>
> Many years ago, we used a NIC that didn't have DMA, and only 4 RX
> descriptors, so it had to be serviced in the top half.
>
> > Bottom halves are not attributed to the process, I believe.
>
> This is an improvement.
>
> > (In old
> > kernels, the time spent in soft IRQs were not attributed to anything,
> > which could create situations where the system was very busy indeed
> > [e.g., with network stack bottom halves doing IP forwarding], but
> > looking idle in 'top'.)
>
> We also experienced that. The kernel's scheduling information was
> completely useless, so eventually we removed the CPU Utilization
> information from our GUI. ;-)
>
> And IIRC, it wasn't fixed in kernel 2.6.
>
> >
> > >> and also, if I recall correctly, the amount
> > >> of wall-time the thread have been runnable, but not running (i.e.,
> > >> waiting to be scheduled). The latter may require some scheduler debug
> > >> kernel option being enabled on the kernel build.
> > >
> > >
> >
>

Back to the topic of unit testing, I think we need to consider their purpose
and where we expect them to run. Unit tests are run in automated environments,
across multiple CI systems, e.g., UNH-IOL Community Lab, GitHub, etc.
Those environments are typically virtualized, and I don't think the unit tests
should require tuning down to the level of CPU clock ticks. Timing-sensitive
tests are likely better suited to dedicated performance environments, where
the complete host is tightly controlled, for the purpose of repeatable and
deterministic results on things like packet throughput, etc.

Cheers,
Lincoln

--
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)


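For reference, the two kernel counters mentioned in the quoted discussion (involuntary context switches, and time spent runnable but not running) can be read from user space on Linux. The following is only a minimal sketch, not something the DPDK unit tests are known to do: the helper names (`involuntary_switches`, `parse_schedstat`, `run_delay_ns`, `measure`) are hypothetical, it assumes a single-threaded Linux process, and it assumes the second field of /proc/self/schedstat is the run-queue wait time in nanoseconds, which is only available when the kernel was built with scheduler statistics support.

```python
import resource
from pathlib import Path


def involuntary_switches():
    """Involuntary context switches charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_nivcsw


def parse_schedstat(text):
    """Split a schedstat line into its three integer fields.

    Assumed layout: time on CPU (ns), time waiting on a run queue (ns),
    number of timeslices run.
    """
    on_cpu_ns, run_delay_ns, timeslices = (int(field) for field in text.split())
    return {
        "on_cpu_ns": on_cpu_ns,
        "run_delay_ns": run_delay_ns,
        "timeslices": timeslices,
    }


def run_delay_ns(path="/proc/self/schedstat"):
    """Nanoseconds spent runnable but not running, or None if unavailable."""
    try:
        return parse_schedstat(Path(path).read_text())["run_delay_ns"]
    except (OSError, ValueError):
        # The file is missing or malformed, e.g. the kernel lacks the option.
        return None


def measure(fn):
    """Run fn() and report how much the scheduler interfered while it ran."""
    switches_before = involuntary_switches()
    delay_before = run_delay_ns()
    result = fn()
    switches = involuntary_switches() - switches_before
    delay_after = run_delay_ns()
    if delay_before is None or delay_after is None:
        delay = None
    else:
        delay = delay_after - delay_before
    return result, switches, delay


if __name__ == "__main__":
    result, switches, delay = measure(lambda: sum(range(1_000_000)))
    print(f"involuntary switches: {switches}, run-queue delay (ns): {delay}")
```

A timing-sensitive test could use numbers like these to flag a run as inconclusive when it was preempted, rather than reporting a failure. Whether that is appropriate in a virtualized CI environment is exactly the question raised above.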