From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
    by dpdk.space (Postfix) with ESMTP id 6F54CA0096
    for ; Wed, 5 Jun 2019 14:52:45 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
    by dpdk.org (Postfix) with ESMTP id 8CD701BB06;
    Wed, 5 Jun 2019 14:52:44 +0200 (CEST)
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
    by dpdk.org (Postfix) with ESMTP id BB34E1B9FC
    for ; Wed, 5 Jun 2019 14:52:42 +0200 (CEST)
X-Amp-Result: UNSCANNABLE
X-Amp-File-Uploaded: False
Received: from fmsmga005.fm.intel.com ([10.253.24.32])
    by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
    05 Jun 2019 05:52:41 -0700
X-ExtLoop1: 1
Received: from bricha3-mobl.ger.corp.intel.com ([10.237.221.51])
    by fmsmga005.fm.intel.com with SMTP; 05 Jun 2019 05:52:38 -0700
Received: by (sSMTP sendmail emulation); Wed, 05 Jun 2019 13:52:38 +0100
Date: Wed, 5 Jun 2019 13:52:37 +0100
From: Bruce Richardson
To: Maxime Coquelin
Cc: dev@dpdk.org, tiwei.bie@intel.com, david.marchand@redhat.com,
    jfreimann@redhat.com, zhihong.wang@intel.com,
    konstantin.ananyev@intel.com, mattias.ronnblom@ericsson.com
Message-ID: <20190605125237.GE1550@bricha3-MOBL.ger.corp.intel.com>
References: <20190529130420.6428-1-maxime.coquelin@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.11.4 (2019-03-13)
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: I-cache pressure optimizations
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On Wed, Jun 05, 2019 at 02:32:27PM +0200, Maxime Coquelin wrote:
>
>
> On 5/29/19 3:04 PM, Maxime Coquelin wrote:
> > Some OVS-DPDK PVP benchmarks show a performance drop
> > when switching from DPDK v17.11 to v18.11.
> >
> > With the addition of packed ring layout support,
> > rte_vhost_enqueue_burst and rte_vhost_dequeue_burst
> > became very large, and only a part of the instructions
> > is executed (either the packed or the split ring is used).
> >
> > This series aims at reducing the I-cache pressure,
> > first by un-inlining split and packed rings, but
> > also by moving parts considered as cold into dedicated
> > functions (dirty page logging, fragmented descriptors
> > buffer management added for CVE-2018-1059).
> >
> > With the series applied, the size of the enqueue and
> > dequeue split paths is reduced significantly:
> >
> > +---------+--------------------+---------------------+
> > | Version | Enqueue split path | Dequeue split path  |
> > +---------+--------------------+---------------------+
> > | v19.05  | 16461B             | 25521B              |
> > | +series | 7286B              | 11285B              |
> > +---------+--------------------+---------------------+
> >
> > Using the perf tool to monitor the iTLB-load-misses event
> > while doing a PVP benchmark with testpmd as vswitch,
> > we can see the number of iTLB misses being reduced:
> >
> > - v19.05:
> > # perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10
> >
> >  Performance counter stats for 'CPU(s) 2,3' (10 runs):
> >
> >              2,438      iTLB-load-miss      ( +- 13.43% )
> >
> >        10.00058928 +- 0.00000336 seconds time elapsed  ( +- 0.00% )
> >
> > - +series:
> > # perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10
> >
> >  Performance counter stats for 'CPU(s) 2,3' (10 runs):
> >
> >                 55      iTLB-load-miss      ( +- 10.08% )
> >
> >        10.00059466 +- 0.00000283 seconds time elapsed  ( +- 0.00% )
> >
> > The series also forces the inlining of some rte_memcpy
> > helpers: after packed ring support was added, some of them
> > were no longer inlined but embedded as functions in
> > the virtio_net object file, which was not expected.
> >
> > Finally, the series simplifies the descriptors buffers
> > prefetching, by doing it in the recently introduced
> > descriptor buffer mapping function.
> >
> > v3:
> > ===
> > - Prefix alloc_copy_ind_table with vhost_ (Mattias)
> > - Remove double new line (Tiwei)
> > - Fix grammar error in patch 3's commit message (Jens)
> > - Force noinline for header copy functions (Mattias)
> > - Fix dst assignment in copy_hdr_from_desc (Tiwei)
> >
> > v2:
> > ===
> > - Fix checkpatch issue
> > - Reset author for patch 5 (David)
> > - Force non-inlining in patch 2 (David)
> > - Fix typo in patch 3 commit message (David)
> >
> > Maxime Coquelin (5):
> >   vhost: un-inline dirty pages logging functions
> >   vhost: do not inline packed and split functions
> >   vhost: do not inline unlikely fragmented buffers code
> >   vhost: simplify descriptor's buffer prefetching
> >   eal/x86: force inlining of all memcpy and mov helpers
> >
> >  .../common/include/arch/x86/rte_memcpy.h      |  18 +-
> >  lib/librte_vhost/vdpa.c                       |   2 +-
> >  lib/librte_vhost/vhost.c                      | 164 +++++++++++++++++
> >  lib/librte_vhost/vhost.h                      | 165 ++----------------
> >  lib/librte_vhost/virtio_net.c                 | 140 +++++++--------
> >  5 files changed, 251 insertions(+), 238 deletions(-)
> >
>
>
> Applied patches 1 to 4 to dpdk-next-virtio/master.
>
> Bruce, I'm assigning patch 5 to you in Patchwork, as this is not
> vhost/virtio specific.
>
Patch looks ok to me, but I'm not the one to apply it.

/Bruce
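
For anyone reading this thread in the archive, the hot/cold split that
the cover letter describes can be sketched roughly as below. This is
only an illustration under assumptions, not the vhost code: every
sketch_* name and the segment structure are hypothetical, and the GCC/Clang
attributes are spelled out directly instead of going through DPDK's own
macros. The rarely taken fragmented-buffer branch is kept out of line so
that its instructions stay out of the caller's hot loop, while the small
contiguous copy is forced inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macros, assuming a GCC- or Clang-compatible compiler. */
#define SKETCH_NOINLINE      __attribute__((noinline))
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define SKETCH_UNLIKELY(x)   __builtin_expect(!!(x), 0)

/* Hypothetical descriptor segment: one contiguous piece of a buffer. */
struct sketch_seg {
	const uint8_t *addr;
	size_t len;
};

/*
 * Cold path: the buffer is fragmented across several segments.
 * Marked noinline so its body is emitted once, away from the caller.
 */
static SKETCH_NOINLINE size_t
sketch_copy_fragmented(uint8_t *dst, size_t dst_len,
		const struct sketch_seg *segs, size_t nr_segs)
{
	size_t off = 0;

	for (size_t i = 0; i < nr_segs; i++) {
		size_t n = segs[i].len;

		/* Clamp to the space left in dst. */
		if (n > dst_len - off)
			n = dst_len - off;
		memcpy(dst + off, segs[i].addr, n);
		off += n;
	}
	return off;
}

/*
 * Hot path: a single contiguous segment. Forced inline; the fragmented
 * case is only a call behind a branch hinted as unlikely.
 */
static SKETCH_ALWAYS_INLINE size_t
sketch_copy_buf(uint8_t *dst, size_t dst_len,
		const struct sketch_seg *segs, size_t nr_segs)
{
	assert(dst != NULL && segs != NULL && nr_segs > 0);

	if (SKETCH_UNLIKELY(nr_segs > 1))
		return sketch_copy_fragmented(dst, dst_len, segs, nr_segs);

	size_t n = segs[0].len < dst_len ? segs[0].len : dst_len;

	memcpy(dst, segs[0].addr, n);
	return n;
}
```

Whether such a split pays off depends on how rarely the cold branch is
taken, so it is worth checking the object size and the iTLB/I-cache
counters before and after, as done in the cover letter.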