From: Maxime Coquelin
To: David Marchand
Cc: dev, Tiwei Bie, Jens Freimann, Zhihong Wang, Bruce Richardson, "Ananyev, Konstantin"
Date: Fri, 17 May 2019 16:42:45 +0200
Subject: Re: [dpdk-dev] [PATCH 0/5] vhost: I-cache pressure optimizations
References: <20190517122220.31283-1-maxime.coquelin@redhat.com>
List-Id: DPDK patches and discussions

On 5/17/19 3:04 PM, David Marchand wrote:
>
>
> On Fri, May 17, 2019 at 2:23 PM
> Maxime Coquelin wrote:
>
> Some OVS-DPDK PVP benchmarks show a performance drop
> when switching from DPDK v17.11 to v18.11.
>
> With the addition of packed ring layout support,
> rte_vhost_enqueue_burst and rte_vhost_dequeue_burst
> became very large, and only part of their instructions
> is executed (either the packed or the split ring path is used).
>
> This series aims at reducing I-cache pressure,
> first by un-inlining the split and packed ring functions, but
> also by moving parts considered cold into dedicated
> functions (dirty page logging, fragmented descriptors
> buffer management added for CVE-2018-1059).
>
> With the series applied, the size of the enqueue and
> dequeue split paths is reduced significantly:
>
> +---------+--------------------+---------------------+
> | Version | Enqueue split path | Dequeue split path  |
> +---------+--------------------+---------------------+
> | v19.05  | 16461B             | 25521B              |
> | +series | 7286B              | 11285B              |
> +---------+--------------------+---------------------+
>
> Using the perf tool to monitor the iTLB-load-misses event
> while running a PVP benchmark with testpmd as vswitch,
> we can see the number of iTLB misses being reduced:
>
> - v19.05:
> # perf stat --repeat 10  -C 2,3  -e iTLB-load-miss -- sleep 10
>
>  Performance counter stats for 'CPU(s) 2,3' (10 runs):
>
>              2,438      iTLB-load-miss                ( +- 13.43% )
>
>        10.00058928 +- 0.00000336 seconds time elapsed  ( +-  0.00% )
>
> - +series:
> # perf stat --repeat 10  -C 2,3  -e iTLB-load-miss -- sleep 10
>
>  Performance counter stats for 'CPU(s) 2,3' (10 runs):
>
>                 55      iTLB-load-miss                ( +- 10.08% )
>
>        10.00059466 +- 0.00000283 seconds time elapsed  ( +-  0.00% )
>
> The series also forces the inlining of some rte_memcpy
> helpers because, once packed ring support was added, some
> of them were no longer inlined but emitted as functions in
> the virtio_net object file, which was
> not expected.
>
> Finally, the series simplifies descriptor buffer
> prefetching, by doing it in the recently introduced
> descriptor buffer mapping function.
>
> Maxime Coquelin (4):
>   vhost: un-inline dirty pages logging functions
>   vhost: do not inline packed and split functions
>   vhost: do not inline unlikely fragmented buffers code
>   vhost: simplify descriptor's buffer prefetching
>
> root (1):
>   eal/x86: force inlining of all memcpy and mov helpers
>
>
> root ? "oops" :-)

Indeed... Oops!

>
>
> --
> David Marchand
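
For illustration, the cold-path split described in the cover letter can be
sketched in plain C as follows. This is a minimal sketch under stated
assumptions, not code from the series: every name prefixed with sketch_ or
SKETCH_ is hypothetical, the descriptor layout is invented for the example,
and the attributes are the generic GCC/Clang ones rather than DPDK's own
wrapper macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macros built on GCC/Clang attributes and builtins. */
#define SKETCH_NOINLINE    __attribute__((noinline))
#define SKETCH_COLD        __attribute__((cold))
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Invented descriptor layout: one buffer, optionally chained to more. */
struct sketch_desc {
	const uint8_t *addr;            /* start of this buffer */
	size_t len;                     /* number of bytes in this buffer */
	const struct sketch_desc *next; /* non-NULL only when fragmented */
};

/*
 * Rare case: the payload is spread over several descriptors.
 * noinline keeps this loop out of the caller's body, and cold hints
 * the compiler to place it away from frequently executed code.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
static SKETCH_NOINLINE SKETCH_COLD size_t
sketch_copy_fragmented(uint8_t *dst, size_t cap, const struct sketch_desc *d)
{
	size_t off = 0;

	for (; d != NULL; d = d->next) {
		if (d->len > cap - off)
			return 0;
		memcpy(dst + off, d->addr, d->len);
		off += d->len;
	}
	return off;
}

/*
 * Common case: a single contiguous buffer. Only a branch and a call
 * to the cold helper remain in the hot path for the fragmented case.
 */
static inline size_t
sketch_copy(uint8_t *dst, size_t cap, const struct sketch_desc *d)
{
	assert(dst != NULL && d != NULL);

	if (SKETCH_UNLIKELY(d->next != NULL))
		return sketch_copy_fragmented(dst, cap, d);

	if (d->len > cap)
		return 0;
	memcpy(dst, d->addr, d->len);
	return d->len;
}
```

The single-buffer case stays small enough to inline into its caller, while
the fragmented case costs one extra call but no longer occupies I-cache
lines on the hot path. Whether such a split helps in practice should be
checked with measurements like the perf runs quoted above.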