From: Paul Emmerich
To: dev@dpdk.org
Date: Sun, 26 Apr 2015 20:50:01 +0200
Subject: [dpdk-dev] Performance regression in DPDK 1.8/2.0

Hi,

I'm working on a DPDK-based packet generator [1] and I recently tried to upgrade from DPDK 1.7.1 to 2.0.0. However, I noticed that DPDK 1.7.1 is about 25% faster than 2.0.0 for my use case, so I ran some basic performance tests on the l2fwd example with DPDK 1.7.1, 1.8.0, and 2.0.0.

I used an Intel Xeon E5-2620 v3 CPU clocked down to 1.2 GHz to ensure that the CPU, and not the network bandwidth, is the bottleneck. I configured l2fwd to forward between two interfaces of an X540 NIC using only a single CPU core (-q2) and measured the following throughput under full bidirectional load:

  Version   TP [Mpps]   Cycles/Pkt
  1.7.1     18.84        84.9
  1.8.0     16.78        95.4
  2.0.0     16.40        97.6

DPDK 1.7.1 is about 15% faster in this scenario. The obvious suspect is the new mbuf structure introduced in DPDK 1.8, so I profiled L1 cache misses:

  Version   L1 miss ratio
  1.7.1      6.5%
  1.8.0     13.8%
  2.0.0     13.4%

FWIW, the performance results with my packet generator on the same 1.2 GHz CPU core are:

  Version   TP [Mpps]   L1 cache miss ratio
  1.7       11.77       4.3%
  2.0        9.5        8.4%

The discussion of the original patch [2] that introduced the new mbuf structure addresses this potential performance degradation and mentions that it is somehow mitigated; it even claims a 20% *increase* in performance in a specific scenario. However, that does not seem to be the case for either l2fwd or my packet generator.

Any ideas how to fix this? A 25% loss in throughput prevents me from upgrading to DPDK 2.0.0, but I need the new lcore features and the 40 GBit driver updates, so I can't stay on 1.7.1 forever.

Paul

[1] https://github.com/emmericp/MoonGen
[2] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/5155
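
For reference, the l2fwd setup described above corresponds roughly to an invocation like the one below. The binary path, coremask, and memory-channel count are only illustrative; the parts that matter for this test are the two-port mask and -q 2, so that a single lcore polls both ports:

  ./build/l2fwd -c 0x2 -n 4 -- -p 0x3 -q 2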
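
The L1 miss ratios come from the CPU's performance counters; a comparable number can be collected on the forwarding core (core 1 with the coremask above) with perf stat, for example:

  perf stat -C 1 -e L1-dcache-loads,L1-dcache-load-misses -- sleep 10

The ratio is L1-dcache-load-misses divided by L1-dcache-loads. Exact event names depend on the kernel and CPU, and this is only a sketch of how to obtain an equivalent measurement, not necessarily how the figures above were produced.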