From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Wang, Zhihong", "dev@dpdk.org", "Bie, Tiwei"
Date: Thu, 31 May 2018 14:52:33 +0200
Subject: Re: [dpdk-dev] [PATCH v2 0/2] Vhost: unify receive paths
In-Reply-To: <8F6C2BD409508844A0EFC19955BE09415151C529@SHSMSX103.ccr.corp.intel.com>
References: <20180529094514.23835-1-maxime.coquelin@redhat.com>
 <8F6C2BD409508844A0EFC19955BE09415151C529@SHSMSX103.ccr.corp.intel.com>
List-Id: DPDK patches and discussions

On 05/31/2018 11:55 AM,
Wang, Zhihong wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Tuesday, May 29, 2018 5:45 PM
>> To: dev@dpdk.org; Bie, Tiwei; Wang, Zhihong
>> Cc: Maxime Coquelin
>> Subject: [PATCH v2 0/2] Vhost: unify receive paths
>>
>> Hi,
>>
>> This second version fixes the feature bit check in
>> rxvq_is_mergeable() and removes "mergeable" from the Rx function
>> names. No difference is seen in the benchmarks.
>>
>> This series is preliminary work to ease the integration of
>> packed ring layout support. But even without packed ring
>> layout, the result is positive.
>>
>> The first patch unifies both paths, and the second one is a small
>> optimization to avoid copying the batch_copy_nb_elems VQ field
>> to/from the stack.
>>
>> With the series applied, I get a modest performance gain for
>> both mergeable and non-mergeable cases, and the removal of
>> about 300 lines of code is a non-negligible gain maintenance-wise.
>>
>> Rx-mrg=off benchmarks:
>>
>> +------------+-------+-------------+-------------+----------+
>> | Run        | PVP   | Guest->Host | Host->Guest | Loopback |
>> +------------+-------+-------------+-------------+----------+
>> | v18.05-rc5 | 14.47 | 16.64       | 17.57       | 13.15    |
>> | + series   | 14.87 | 16.86       | 17.70       | 13.30    |
>> +------------+-------+-------------+-------------+----------+
>>
>> Rx-mrg=on benchmarks:
>>
>> +------------+------+-------------+-------------+----------+
>> | Run        | PVP  | Guest->Host | Host->Guest | Loopback |
>> +------------+------+-------------+-------------+----------+
>> | v18.05-rc5 | 9.38 | 13.78       | 16.70       | 12.79    |
>> | + series   | 9.38 | 13.80       | 17.49       | 13.36    |
>> +------------+------+-------------+-------------+----------+
>>
>> Note: Even without my series, the guest->host benchmark with
>> mergeable buffers enabled looks suspicious, as it should in
>> theory be almost identical to when Rx mergeable buffers are
>> disabled. To be investigated...
>>
>> Maxime Coquelin (2):
>>   vhost: unify Rx mergeable and non-mergeable paths
>>   vhost: improve batched copies performance
>>
>>  lib/librte_vhost/virtio_net.c | 376 +++++-------------------------------------
>>  1 file changed, 37 insertions(+), 339 deletions(-)
>>
>
> Acked-by: Zhihong Wang
>
> Thanks Maxime! This is really great to see. ;) We probably need the
> same improvement for Virtio-pmd.

Yes, probably. I'll have a look at it, or if you have time to look at
it, I won't blame you! :)

> One comment on Virtio/Vhost performance analysis: No matter what type
> of traffic is used (PVP, Txonly-Rxonly, Loopback, ...), we need to be
> clear on which side we are testing, and give the other side excessive
> CPU resources; otherwise we end up testing whichever side is slower.
>
> Since this patch is for Vhost, I suggest running N (e.g. N = 4) Virtio
> threads on N cores, and the corresponding N Vhost threads on a single
> core, to do the performance comparison. Do you think this makes sense?

That's a valid point. I'll try this to find the bottleneck.

I'm in the process of setting up an automated test bench; it will help
with running more and more test cases.

> For Guest -> Host, in my test I see Rx-mrg=on has a negative impact on
> the Virtio side, probably because Virtio touches something that is not
> touched when Rx-mrg=off.

I get it now. When mrg=off, we use the simple_tx version, whereas we use
the full one when mrg is on:

static int
virtio_dev_configure(struct rte_eth_dev *dev)
{
	...
	hw->use_simple_rx = 1;
	hw->use_simple_tx = 1;

#if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM
	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}
#endif
	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}

	if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
			   DEV_RX_OFFLOAD_TCP_CKSUM))
		hw->use_simple_rx = 0;

	return 0;
}

I see two problems here:
1. There should be no reason not to use simple_tx if mrg is on.
2. We should add a check on whether Rx and Tx offloads have been
   negotiated, and not use the simple versions if they have been.

Do you agree with these proposed changes? I'll post an RFC for this.

Thanks,
Maxime

> Thanks
> -Zhihong
>