From: Tim Shearer
To: users@dpdk.org
Date: Wed, 16 May 2018 16:39:21 +0000
Subject: [dpdk-users] Older DPDK guest performance with 4.14+ host kernels

All,

I have found that running older DPDK applications (prior to approximately 16.11) as a KVM guest on systems with a 4.14 or later host kernel may result in a significant performance penalty. For example, the l2fwd application is unable to reliably pass traffic at any rate, despite running on a dedicated, pinned VCPU that is isolated from Linux tasks with isolcpus on both the guest and the host.

Using perf on the host, it became apparent that KVM was regularly exiting into userspace. These context switches are expensive and resulted in drops:

    CPU-8929 [005] 5403.733805: kvm_exit:           reason IO_INSTRUCTION rip 0x483215 info 800040 0
    CPU-8929 [005] 5403.733806: kvm_pio:            pio_write at 0x80 size 1 count 1 val 0x0
    CPU-8929 [005] 5403.733808: kvm_userspace_exit: reason KVM_EXIT_IO (2)

I traced the root cause to a kernel patch submitted a few months ago, which disables VMX handling of I/O port 0x80 writes and instead forces them to be emulated by QEMU: https://patchwork.kernel.org/patch/10087713/. In short, the DPDK guest application generates a lot of port 0x80 writes, and KVM no longer handles them.

The source of the 0x80 writes is glibc's outw_p function, which the virtio_pci driver called extensively prior to this change: http://www.dpdk.org/ml/archives/dev/2016-February/032782.html
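For illustration, here is a minimal sketch of the mechanism (assuming x86 and GCC-style inline assembly; outw_p_sketch and VIRTIO_WRITE_REG_2_SKETCH are illustrative names, not the actual glibc or DPDK definitions). The "_p" port-I/O helpers follow the real write with a dummy byte write to diagnostic port 0x80 as a short delay, and that second write is what now traps to QEMU:

    /* Sketch only: not the actual glibc source. x86 + GCC inline asm assumed. */
    #include <stdint.h>

    static inline void outw_p_sketch(uint16_t value, uint16_t port)
    {
            /* The real register write... */
            __asm__ __volatile__ ("outw %w0, %w1" : : "a" (value), "Nd" (port));

            /* ...followed by a dummy byte write to diagnostic port 0x80.
             * With the patch above applied on the host, VMX no longer
             * handles port 0x80, so each of these forces a KVM exit into
             * QEMU -- the KVM_EXIT_IO events in the trace above. */
            __asm__ __volatile__ ("outb %%al, $0x80" : : "a" ((uint8_t) 0));
    }

    /* Pre-16.11 DPDK virtio_pci accessed device registers through macros
     * roughly like this (illustrative), so every virtio register write
     * paid the port 0x80 penalty: */
    #define VIRTIO_WRITE_REG_2_SKETCH(addr, value) \
            outw_p_sketch((uint16_t) (value), (uint16_t) (addr))

The kvm_pio line in the trace above (pio_write at 0x80 size 1 count 1) is exactly that second instruction being emulated.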
I've attempted to push for a kernel fix to the KVM/VMX module that would allow port 0x80 to be handled in hardware, perhaps as a configurable parameter passed in by QEMU. This isn't getting any traction, hence this email. Basically, if you're using a 4.14 or later host kernel, you'll need to use relatively modern VNFs (based on DPDK 16.11 or later), or alternatively, revert the kernel patch linked above. Be advised that the patch addresses a potential DoS issue, so reverting it is only advisable for trusted guests.

Feel free to message me if you want further details.

Thanks,
Tim Shearer