From: Maxime Coquelin
To: "Wang, Zhihong", Yuanhan Liu
Cc: "stephen@networkplumber.org", "Pierre Pfister (ppfister)", "Xie, Huawei", "dev@dpdk.org", "vkaplans@redhat.com", "mst@redhat.com"
Date: Fri, 4 Nov 2016 08:59:57 +0100
Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
Message-ID: <7e1c8953-db15-f377-cece-85cb7169bb17@redhat.com>
In-Reply-To: <17d285a9-818c-b060-8969-daccb052dc1f@redhat.com>
List-Id: patches and discussions about DPDK

On 11/04/2016 08:57 AM, Maxime Coquelin wrote:
> Hi Zhihong,
>
> On 11/04/2016 08:20 AM, Wang, Zhihong wrote:
>>
>>> -----Original Message-----
>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>> Sent: Thursday, November 3, 2016 4:11 PM
>>> To: Wang, Zhihong; Yuanhan Liu
>>> Cc: stephen@networkplumber.org; Pierre Pfister (ppfister);
>>> Xie, Huawei; dev@dpdk.org; vkaplans@redhat.com; mst@redhat.com
>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
>>> support to the TX path
>>>
>>> On 11/02/2016 11:51 AM, Maxime Coquelin wrote:
>>>>
>>>> On 10/31/2016 11:01 AM, Wang, Zhihong wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>>> Sent: Friday, October 28, 2016 3:42 PM
>>>>>> To: Wang, Zhihong; Yuanhan Liu
>>>>>> Cc: stephen@networkplumber.org; Pierre Pfister (ppfister);
>>>>>> Xie, Huawei; dev@dpdk.org; vkaplans@redhat.com; mst@redhat.com
>>>>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
>>>>>> support to the TX path
>>>>>>
>>>>>> On 10/28/2016 02:49 AM, Wang, Zhihong wrote:
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
>>>>>>>>> Sent: Thursday, October 27, 2016 6:46 PM
>>>>>>>>> To: Maxime Coquelin
>>>>>>>>> Cc: Wang, Zhihong;
>>>>>>>>> stephen@networkplumber.org; Pierre Pfister (ppfister);
>>>>>>>>> Xie, Huawei; dev@dpdk.org; vkaplans@redhat.com; mst@redhat.com
>>>>>>>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
>>>>>>>>> support to the TX path
>>>>>>>>>
>>>>>>>>> On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
>>>>>>>>>>>>> On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
>>>>>>>>>>>>>>> Hi Zhihong,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
>>>>>>>>>>>>>>>>> Hi Maxime,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Seems indirect desc feature is causing serious performance
>>>>>>>>>>>>>>>>> degradation on Haswell platform, about 20% drop for both
>>>>>>>>>>>>>>>>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
>>>>>>>>>>>>>>>>> both iofwd and macfwd.
>>>>>>>>>>>>>>> I tested PVP (with macswap on guest) and Txonly/Rxonly on an
>>>>>>>>>>>>>>> Ivy Bridge platform, and didn't face such a drop.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was actually wondering that may be the cause. I tested it with
>>>>>>>>>>>>> my IvyBridge server as well, I saw no drop.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe you should find a similar platform (Haswell) and have a try?
>>>>>>>>>>> Yes, that's why I asked Zhihong whether he could test Txonly in
>>>>>>>>>>> guest to see if issue is reproducible like this.
>>>>>>>>>
>>>>>>>>> I have no Haswell box, otherwise I could do a quick test for you.
>>>>>>>>> IIRC, he tried to disable the indirect_desc feature, then the
>>>>>>>>> performance recovered. So, it's likely the indirect_desc is the
>>>>>>>>> culprit here.
>>>>>>>>>
>>>>>>>>>>> It will be easier for me to find a Haswell machine if it does not
>>>>>>>>>>> have to be connected back to back to an HW/SW packet generator.
>>>>>>> In fact simple loopback test will also do, without pktgen.
>>>>>>>
>>>>>>> Start testpmd in both host and guest, and do "start" in one
>>>>>>> and "start tx_first 32" in another.
>>>>>>>
>>>>>>> Perf drop is about 24% in my test.
>>>>>>>
>>>>>>
>>>>>> Thanks, I never tried this test.
>>>>>> I managed to find a Haswell platform (Intel(R) Xeon(R) CPU
>>>>>> E5-2699 v3 @ 2.30GHz), and can reproduce the problem with the loop
>>>>>> test you mention. I see a performance drop of about 10%
>>>>>> (8.94Mpps/8.08Mpps).
>>>>>> Out of curiosity, what are the numbers you get with your setup?
>>>>>
>>>>> Hi Maxime,
>>>>>
>>>>> Let's align our test case to RC2, mrg=on, loopback, on Haswell.
>>>>> My results below:
>>>>> 1. indirect=1: 5.26 Mpps
>>>>> 2. indirect=0: 6.54 Mpps
>>>>>
>>>>> It's about 24% drop.
>>>> OK, so on my side, same setup on Haswell:
>>>> 1. indirect=1: 7.44 Mpps
>>>> 2. indirect=0: 8.18 Mpps
>>>>
>>>> Still 10% drop in my case with mrg=on.
>>>>
>>>> The strange thing with both of our figures is that this is below
>>>> what I obtain with my SandyBridge machine. The SB cpu freq is 4%
>>>> higher, but that doesn't explain the gap between the measurements.
>>>>
>>>> I'm continuing the investigations on my side.
>>>> Maybe we should fix a deadline, and decide to disable indirect in
>>>> Virtio PMD if root cause not identified/fixed at some point?
>>>>
>>>> Yuanhan, what do you think?
>>>
>>> I have done some measurements using perf, and now understand better
>>> what happens.
>>>
>>> With indirect descriptors, I can see a cache miss when fetching the
>>> descriptors in the indirect table. Actually, this is expected, so
>>> we prefetch the first desc as soon as possible, but still not soon
>>> enough to make it transparent.
>>> In direct descriptors case, the desc in the virtqueue seems to
>>> remain in the cache from its previous use, so we have a hit.
>>>
>>> That said, in realistic use-case, I think we should not have a hit,
>>> even with direct descriptors.
>>> Indeed, the test case uses testpmd on guest side with the forwarding set
>>> in IO mode. It means the packet content is never accessed by the guest.
>>>
>>> In my experiments, I usually set the "macswap" forwarding mode, which
>>> swaps src and dest MAC addresses in the packet. I find it more
>>> realistic, because I don't see the point in sending packets to the guest
>>> if it is not accessed (not even its header).
>>>
>>> I tried again the test case, this time with setting the forwarding mode
>>> to macswap in the guest. This time, I get same performance with both
>>> direct and indirect (indirect even a little better with a small
>>> optimization, consisting in prefetching the 2 first descs
>>> systematically as we know they are contiguous).
>>
>>
>> Hi Maxime,
>>
>> I did a little more macswap test and found out more stuff here:
> Thanks for doing more tests.
>
>>
>> 1. I did loopback test on another HSW machine with the same H/W,
>> and indirect_desc on and off seem to have close perf
>>
>> 2. So I checked the gcc version:
>>
>> * Previous: gcc version 6.2.1 20160916 (Fedora 24)
>>
>> * New: gcc version 5.4.0 20160609 (Ubuntu 16.04.1 LTS)
>
> On my side, I tested with RHEL7.3:
> - gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
>
> It certainly contains some backports from newer GCC versions.
>
>>
>> On previous one indirect_desc has 20% drop
>>
>> 3. Then I compiled binary on Ubuntu and scp to Fedora, and as
>> expected I got the same perf as on Ubuntu, and the perf gap
>> disappeared, so gcc is definitely one factor here
>>
>> 4. Then I use the Ubuntu binary on Fedora for PVP test, then the
>> perf gap comes back again and the same with the Fedora binary
>> results, indirect_desc causes about 20% drop
>
> Let me know if I understand correctly:
> Loopback test with macswap:
> - gcc version 6.2.1 : 20% perf drop
> - gcc version 5.4.0 : No drop
>
> PVP test with macswap:
> - gcc version 6.2.1 : 20% perf drop
> - gcc version 5.4.0 : 20% perf drop

I forgot to ask: did you recompile only the host testpmd, or both the
host and guest testpmd, in your test?

>
>>
>> So in all, could you try PVP traffic on HSW to see how it works?
> Sadly, the HSW machine I borrowed does not have other device connected
> back to back on its 10G port. I can only test PVP with SNB machines
> currently.
>
>>
>>>
>>> Do you agree we should assume that the packet (header or/and buf) will
>>> always be accessed by the guest application?
>>> If so, do you agree we should keep indirect descs enabled, and maybe
>>> update the test cases?
>>
>>
>> I agree with you that mac/macswap test is more realistic and makes
>> more sense for real applications.
>
> Thanks,
> Maxime
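
For reference, the small optimization quoted above (systematically
prefetching the first two descriptors of the indirect table, since they
are contiguous) can be sketched roughly as below. This is only an
illustration and not the actual vhost code: the sketch_desc struct,
SKETCH_DESC_F_NEXT and both function names are made up for the example,
with the layout modelled on the 16-byte virtio split-ring descriptor.
__builtin_prefetch is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor, laid out like a virtio split-ring descriptor. */
struct sketch_desc {
    uint64_t addr;   /* guest address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* SKETCH_DESC_F_NEXT if the chain continues */
    uint16_t next;   /* index of the next descriptor in the table */
};

#define SKETCH_DESC_F_NEXT 1

/* Issue read prefetches for the first two entries of an indirect table. */
static inline void sketch_prefetch_first_two(const struct sketch_desc *table)
{
    __builtin_prefetch(&table[0], 0, 3);
    __builtin_prefetch(&table[1], 0, 3);
}

/* Walk the chain starting at entry 0 and return the total buffer length. */
static uint32_t sketch_chain_len(const struct sketch_desc *table, uint16_t size)
{
    uint32_t total = 0;
    uint16_t idx = 0;

    assert(table != NULL);
    sketch_prefetch_first_two(table);

    /* Visit at most `size` entries so a malformed chain cannot loop forever. */
    for (uint16_t seen = 0; seen < size && idx < size; seen++) {
        total += table[idx].len;
        if (!(table[idx].flags & SKETCH_DESC_F_NEXT))
            break;
        idx = table[idx].next;
    }
    return total;
}
```

In a real datapath the prefetch would be issued as early as possible,
well before the walk, so that the cache miss has time to resolve; here
the two are placed next to each other only to keep the sketch short.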