Date: Thu, 26 Oct 2023 09:32:13 -0700
From: Stephen Hemminger
To: Morten Brørup
Cc: "Thomas Monjalon", "David Marchand", "Anatoly Burakov", "Dmitry Kozlyuk", "Narcisa Ana Maria Vasile", "Dmitry Malloy", "Pallavi Kadam", "Tyler Retzlaff", "Andrew Rybchenko", "Konstantin Ananyev"
Subject: Re: [PATCH v2] eal/unix: allow creating thread with real-time priority
Message-ID: <20231026093213.186b12d4@hermes.local>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9EF91@smartserver.smartshare.dk>

On Thu, 26 Oct 2023 09:33:42 +0200
Morten Brørup wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 25 October 2023 23.33
> >
> > On Wed, 25 Oct 2023 19:54:06 +0200
> > Morten Brørup wrote:
> >
> > > I agree with Thomas on this.
> > >
> > > If you want the log message, please degrade it to INFO or DEBUG level. It is
> > > only relevant when chasing problems, not for normal production - and thus
> > > NOTICE is too high.
> >
> > I don't want the message to be hidden.
> > If we get any bug reports want to be able to say "read the log, don't do
> > that".
>
> Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.
>
> It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems).
> As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understands it will not be seriously confused by such a NOTICE message in the log.
>
> >
> > > Someone might build a kernel with options to keep non-dataplane threads off
> > > some dedicated CPU cores, so they can be used for guaranteed low-latency
> > > dataplane threads. We do. We don't use real-time priority, though.
> >
> > This is really, hard to do.
>
> As my kids would say: This is really, really, really, really, really hard to do!
>
> We have not been able to find an authoritative source of documentation describing how to do it. :-(
>
> And our experiment shows that we didn't 100 % succeed doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU corresponds to max 3 microseconds of added worst-case latency.
>
> It would be great for latency-sensitive applications if the DPDK documentation went more into detail on this topic.
> However, if the DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux Kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)
>
> > Isolated CPU's are not isolated from interrupts
> > and other sources which end up scheduling work as kernel threads. Plus there
> > is the behavior where kernel decides to turn a soft irq into a kernel thread,
> > then starve itself.
>
> We have configured the kernel to put all of this on CPU 0. (Details further below.)
>
> > Under starvation, disk corruption is likely if interrupts never get
> > processed :-(
> >
> > > For reference, we did some experiments (using this custom built kernel) with
> > > a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> > > registering the delta. Although the overwhelming majority is ca. 80 CPU
> > > cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> > > magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> > > threads steal some cycles from this thread, regardless of our customizations.
> > > We haven't bothered analyzing and optimizing it further.
> >
> > Was this on isolated CPU?
>
> Yes. We isolate all CPUs but CPU 0.
>
> > Did you check that that CPU was excluded from the smp_affinity mask on all
> > devices?
>
> Not sure how to do that?
>
> NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?
>
> > Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?
>
> Yes:
> # Timers subsystem
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_FULL=y
> CONFIG_NO_HZ_FULL_ALL=y
>
> CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"
>
> > Same thing for RCU, need to adjust parameters?
>
> Yes:
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> CONFIG_SRCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
>
> >
> > Also, on many systems there can be SMI BIOS hidden execution that will cause
> > big outliers.
>
> Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.
>
> >
> > Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> > places.
>
> Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.
>
> >
> > > I think our experiment supports the need to allow kernel threads to run,
> > > e.g. by calling sleep() or similar, when an EAL thread has real-time priority.
>

One benefit of using a real-time thread is that the kernel will be more
precise in any calls to sleep. If you do a small sleep in a normal thread,
the kernel will round up the timer to try to avoid reprogramming the timer
chip and to save power (fewer wakeups from idle). With an RT thread it will
do "you wanted 21us, ok, for you I will do 21us".

The project that was originally Vyatta has a script that tries to isolate
interrupts etc. I started it, but they have worked on it since then.

https://github.com/danos/vyatta-cpu-shield

It adjusts kernel workers, softirq, cgroups etc.
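
The point about sleep precision under real-time scheduling can be checked
with a small measurement. What follows is a minimal sketch for Linux, not
DPDK code: the helper names ts_to_ns(), try_set_fifo() and
sleep_overshoot_ns() are hypothetical, and only the 21us request comes from
the message above. It times a relative clock_nanosleep() against
CLOCK_MONOTONIC and reports how much longer than requested the call took.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Try to move the calling thread to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value (typically -EPERM when
 * the process lacks CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO).
 */
int try_set_fifo(int priority)
{
	struct sched_param sp = { .sched_priority = priority };

	return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : -errno;
}

/*
 * Sleep for request_ns and return how many nanoseconds longer than
 * requested the sleep took, or -1 on error.
 */
int64_t sleep_overshoot_ns(int64_t request_ns)
{
	struct timespec req, rem, before, after;
	int rc;

	assert(request_ns >= 0);
	req.tv_sec = request_ns / NSEC_PER_SEC;
	req.tv_nsec = request_ns % NSEC_PER_SEC;

	if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
		return -1;

	/* clock_nanosleep() returns the error number directly; on EINTR,
	 * continue with the remaining time. */
	while ((rc = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
		req = rem;
	if (rc != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
		return -1;

	return ts_to_ns(&after) - ts_to_ns(&before) - request_ns;
}
```

One way to use it: call sleep_overshoot_ns(21000) a few thousand times and
record the worst case, then repeat after try_set_fifo(1) has returned 0, and
compare the two distributions. The results depend on the kernel
configuration and hardware, so no numbers are suggested here. As discussed
earlier in the thread, a SCHED_FIFO thread that never sleeps can starve
kernel threads on its CPU, so the measurement loop should be kept short.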