Date: Wed, 9 Dec 2020 08:08:58 -0800
From: Stephen Hemminger
To: Tal Shnaiderman
Cc: Dmitry Kozlyuk, "Dmitry Malloy (MESHCHANINOV)", Narcisa Ana Maria Vasile, Eilon Greenstein, Omar Cardona, Rani Sharoni, Odi Assli, Harini Ramakrishnan, NBU-Contact-Thomas Monjalon, "dev@dpdk.org"
Message-ID: <20201209080858.168e4c52@hermes.local>
Subject: Re: [dpdk-dev] Windows DPDK real-time priority threads causing thread starvation
List-Id: DPDK patches and discussions

On Wed, 9 Dec 2020 14:15:30 +0000
Tal Shnaiderman wrote:

> Hi,
>
> During our verification tests on Windows DPDK we've noticed that DPDK polling threads, which run in REALTIME_PRIORITY_CLASS are causing starvation to other threads from the OS which need to change affinity and run in lower priority.
>
> While running an application for a while we see the OS thread waits for 2:30 minutes and raises a bugcheck, see below example of such flow:
>
> 1) DPDK thread running on core-0 in real-time high priority(24) polling mode.
> 2) The thread is blocking the system function NtSetSystemInformation (ExpUpdateTimerConfiguration) in another thread from
>    switching to core-0 via KeSetSystemGroupAffinityThread since the calling thread is priority 15.
> 3) NtSetSystemInformation exclusively acquired system-wide lock (ExpTimeRefreshLock) hence
>    it blocks other threads (e.g. calling NtQuerySystemInformation).
>
> We've seen this behavior only while running on Windows 2019 VMs, maybe on native machines OS scheduling of such flow is done differently?
>
> Below is usage explanation from the documentation of SetPriorityClass [1]:
>
> - REALTIME_PRIORITY_CLASS
> Process that has the highest possible priority. The threads of the process preempt the threads of all other processes, including operating system processes performing important tasks. For example, a real-time process that executes for more than a very brief interval can cause disk caches not to flush or cause the mouse to be unresponsive.
>
> So I assume using this kind of thread for a long period as we do can cause unstable behavior.
>
> How do you think we can resolve this? Are there such cases in Linux?
>
> [1] - https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setpriorityclass
>
> Thanks,
>
> Tal.

This is not unique to Windows; Linux has the same problem when using SCHED_FIFO.
Setting REALTIME is not a magic "go fast" flag; it tells the scheduler to
"run this thread at higher priority than the kernel". Real-time priority is
not compatible with applications doing 100% polling.

If you have to use REALTIME, the application must change to a sleep/wakeup
architecture instead of pure polling. A typical DPDK-style application is
incompatible with SCHED_FIFO/SCHED_RR.
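
To make the sleep/wakeup suggestion concrete, here is a minimal sketch in plain POSIX C, not DPDK code. It assumes the receive path can be wrapped in a callback that returns the number of packets handled. The names poll_fn, backoff_cfg, poll_with_backoff, always_empty and always_busy are hypothetical and exist only for this illustration. The loop spins while traffic arrives and calls nanosleep() once the source has been idle for a configurable number of polls, so lower-priority threads get a chance to run on the core.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns how many packets one poll handled. */
typedef unsigned (*poll_fn)(void *ctx);

/* Hypothetical tuning knobs for this sketch. */
struct backoff_cfg {
    unsigned idle_spins; /* consecutive empty polls tolerated before sleeping */
    long sleep_ns;       /* sleep per idle poll, must be below 1000000000 */
};

/*
 * Poll `iterations` times. After cfg->idle_spins consecutive empty polls,
 * every further empty poll yields the CPU with nanosleep() instead of
 * spinning. Any non-empty poll resets the idle counter.
 * Returns the number of sleeps taken.
 */
static unsigned poll_with_backoff(poll_fn poll, void *ctx,
                                  const struct backoff_cfg *cfg,
                                  unsigned iterations)
{
    assert(poll != NULL && cfg != NULL);

    unsigned idle = 0;
    unsigned sleeps = 0;

    for (unsigned i = 0; i < iterations; i++) {
        if (poll(ctx) > 0) {
            idle = 0;            /* traffic seen: stay in busy-poll mode */
            continue;
        }
        if (idle < cfg->idle_spins) {
            idle++;              /* still within the spin budget */
            continue;
        }
        struct timespec ts = { .tv_sec = 0, .tv_nsec = cfg->sleep_ns };
        nanosleep(&ts, NULL);    /* let lower-priority threads run */
        sleeps++;
    }
    return sleeps;
}

/* Stand-in packet sources for trying the loop out. */
static unsigned always_empty(void *ctx) { (void)ctx; return 0; }
static unsigned always_busy(void *ctx)  { (void)ctx; return 1; }
```

With idle_spins set to 3 and an always-empty source, the first three polls spin and every later poll sleeps. A fixed sleep adds wakeup latency, so a real application would more likely block on an interrupt or event; the fixed sleep is only the simplest way to show the idea.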