From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wang, Zhihong"
To: Maxime Coquelin, Yuanhan Liu
Cc: "mst@redhat.com", "dev@dpdk.org", "vkaplans@redhat.com"
Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path
Date: Mon, 31 Oct 2016 10:01:18 +0000
Message-ID: <8F6C2BD409508844A0EFC19955BE09414E7DA533@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <88169067-290d-a7bb-ab2c-c9b8ec1b1ded@redhat.com>
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, October 28, 2016 3:42 PM
> To: Wang, Zhihong; Yuanhan Liu
> Cc: stephen@networkplumber.org; Pierre Pfister (ppfister); Xie, Huawei;
> dev@dpdk.org; vkaplans@redhat.com; mst@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
>
>
>
> On 10/28/2016 02:49 AM, Wang, Zhihong wrote:
> >
> >> > -----Original Message-----
> >> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> >> > Sent: Thursday, October 27, 2016 6:46 PM
> >> > To: Maxime Coquelin
> >> > Cc: Wang, Zhihong; stephen@networkplumber.org;
> >> > Pierre Pfister (ppfister); Xie, Huawei; dev@dpdk.org;
> >> > vkaplans@redhat.com; mst@redhat.com
> >> > Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> >> > to the TX path
> >> >
> >> > On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote:
> >>> > >
> >>> > >
> >>> > > On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
> >>>> > > >On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
> >>>>> > > >>Hi Zhihong,
> >>>>> > > >>
> >>>>> > > >>On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> >>>>>> > > >>>Hi Maxime,
> >>>>>> > > >>>
> >>>>>> > > >>>It seems the indirect desc feature is causing serious performance
> >>>>>> > > >>>degradation on the Haswell platform, about a 20% drop for both
> >>>>>> > > >>>mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> >>>>>> > > >>>for both iofwd and macfwd.
> >>>>> > > >>I tested PVP (with macswap on the guest) and Txonly/Rxonly on an
> >>>>> > > >>Ivy Bridge platform, and didn't face such a drop.
> >>>> > > >
> >>>> > > >I was actually wondering whether that may be the cause. I tested it
> >>>> > > >with my Ivy Bridge server as well, and I saw no drop.
> >>>> > > >
> >>>> > > >Maybe you should find a similar platform (Haswell) and have a try?
> >>> > > Yes, that's why I asked Zhihong whether he could test Txonly in the
> >>> > > guest to see if the issue is reproducible like this.
> >> >
> >> > I have no Haswell box, otherwise I could do a quick test for you. IIRC,
> >> > he tried to disable the indirect_desc feature, and then the performance
> >> > recovered. So, it's likely that indirect_desc is the culprit here.
> >> >
> >>> > > It will be easier for me to find a Haswell machine if it does not have
> >>> > > to be connected back to back to a HW/SW packet generator.
> > In fact a simple loopback test will also do, without a packet generator.
> >
> > Start testpmd in both the host and the guest, and do "start" in one
> > and "start tx_first 32" in the other.
> >
> > The perf drop is about 24% in my test.
> >
>
> Thanks, I never tried this test.
> I managed to find a Haswell platform (Intel(R) Xeon(R) CPU E5-2699 v3
> @ 2.30GHz), and can reproduce the problem with the loopback test you
> mention. I see a performance drop of about 10% (8.94 Mpps vs. 8.08 Mpps).
> Out of curiosity, what are the numbers you get with your setup?

Hi Maxime,

Let's align our test case to RC2, mrg=on, loopback, on Haswell.
My results are below (a rough sketch of the test commands is appended
at the end of this mail):

1. indirect=1: 5.26 Mpps
2. indirect=0: 6.54 Mpps

That's about a 24% drop.

>
> As I had never tried this test before, I ran it on my Sandy Bridge setup
> as well, and I also see a performance regression, this time of 4%.
>
> If I understand the test correctly, only 32 packets are allocated,
> corresponding to a single burst, which is less than the queue size.
> So it makes sense that the performance is lower with this test case.

Actually it's 32 bursts, so 1024 packets in total, enough to fill the queue.

Thanks
Zhihong

>
> Thanks,
> Maxime
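
A rough sketch of the loopback test discussed in this thread, for reference.
The socket path, core mask, memory size and hugepage setup below are
illustrative placeholders, not values taken from the thread; only the
"start" / "start tx_first 32" commands, --txqflags=0xf00, and the
mrg_rxbuf / indirect_desc toggles come from the discussion itself.

  # Host: testpmd with the vhost PMD exposing a vhost-user socket
  ./testpmd -c 0x3 -n 4 --socket-mem 1024 \
      --vdev 'eth_vhost0,iface=/tmp/vhost-user.sock,queues=1' -- -i
  testpmd> start

  # Guest: launched by QEMU with a vhost-user backend, e.g.
  #   -chardev socket,id=char0,path=/tmp/vhost-user.sock
  #   -netdev type=vhost-user,id=net0,chardev=char0
  #   -device virtio-net-pci,netdev=net0,mrg_rxbuf=on,indirect_desc=on
  # (set indirect_desc=off to measure the non-indirect case; assumes
  #  hugepages configured and the virtio NIC bound to a DPDK driver)
  ./testpmd -c 0x3 -n 4 -- -i --txqflags=0xf00
  testpmd> start tx_first 32

  # Read the Mpps figures with "show port stats all" on either side.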