From: "Wang, Zhihong"
To: Maxime Coquelin, dev@dpdk.org
CC: yuanhan.liu@linux.intel.com
Date: Tue, 23 Aug 2016 02:15:43 +0000
Message-ID: <8F6C2BD409508844A0EFC19955BE09411077345D@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

> Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
>
> Hi Zhihong,
>
[...]
> > The main optimization techniques are:
> >
> > 1. Reorder code to reduce CPU pipeline stall cycles.
> >
> > 2. Batch update the used ring for better efficiency.
> >
> > 3. Prefetch descriptors to hide cache latency.
> >
> > 4. Remove the useless volatile attribute to allow compiler optimization.
>
> Thanks for these details, this is helpful to understand where the perf
> gain comes from.
> I would suggest adding this information as comments in the code where
> it makes sense. If it is more of a general comment, at least add it to
> the commit message of the patch introducing it.
> Indeed, adding it to the cover letter is fine, but the information is
> lost as soon as the series is applied.

Hi Maxime,

I did add this information in the later optimization patches to explain
each optimization technique. The v1 was indeed hard to read.
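
To give a concrete idea of what points 2 and 3 mean in practice, here is
a very simplified sketch. It is not the code in the series: the structure
and helper names below are made up for illustration, and ring wrap-around
and guest address translation are omitted.

/*
 * Sketch only: prefetch the next descriptor while copying the current
 * packet, fill the used entries as we go, and publish them with a
 * single shared index update per burst.
 */
#include <stdint.h>
#include <string.h>
#include <rte_prefetch.h>

struct desc_entry {             /* stand-in for a vring descriptor */
        void     *addr;
        uint32_t  len;
};

struct used_entry {             /* stand-in for a used ring element */
        uint32_t id;
        uint32_t len;
};

struct pkt {                    /* stand-in for struct rte_mbuf */
        void     *data;
        uint32_t  len;
};

static void
enqueue_burst(struct desc_entry *desc, struct used_entry *used_ring,
              uint16_t *used_idx, struct pkt **pkts, uint16_t count)
{
        uint16_t start = *used_idx;
        uint16_t i;

        for (i = 0; i < count; i++) {
                uint16_t d = start + i;

                /* 3. Prefetch the descriptor needed in the next iteration
                 *    so its cache line is warm when we dereference it. */
                if (i + 1 < count)
                        rte_prefetch0(&desc[d + 1]);

                /* Copy the packet into the buffer the descriptor points to. */
                memcpy(desc[d].addr, pkts[i]->data, pkts[i]->len);

                /* 2. Fill the used entries while the data is hot ... */
                used_ring[d].id  = d;
                used_ring[d].len = pkts[i]->len;
        }

        /* ... but bump the shared used index only once per burst (the real
         * code needs a write barrier before this store). */
        *used_idx = start + count;
}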

>
> You don't mention any figures, so I set up a benchmark on my side to
> evaluate your series. It indeed shows an interesting performance gain.
>
> My setup consists of one host running a guest.
> The guest generates as many 64-byte packets as possible using
> pktgen-dpdk. The host forwards the received packets back to the guest
> using testpmd on a vhost PMD interface. The guest's vCPUs are pinned to
> physical CPUs.
>

Thanks for doing the test!

I didn't publish any numbers since the gain varies across platforms and
test setups.

In my phy-to-VM test on both IVB and HSW, where testpmd in the host
receives packets from the NIC and enqueues them to the guest, the enqueue
efficiency (cycles per packet) with the v3 patch is 2.4x and 1.4x that of
the current code for mergeable on and mergeable off respectively. (A
rough sketch of how those cycles are counted is at the end of this mail.)

> I tested it with and without your v1 patch, with and without the
> rx-mergeable feature turned ON.
> Results are the average of 8 runs of 60 seconds:
>
> Rx-Mergeable ON : 7.72 Mpps
> Rx-Mergeable ON + "vhost: optimize enqueue" v1: 9.19 Mpps
> Rx-Mergeable OFF: 10.52 Mpps
> Rx-Mergeable OFF + "vhost: optimize enqueue" v1: 10.60 Mpps
>
> Regards,
> Maxime
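
As a footnote on the "cycles per packet" figure above: it comes from
timestamping the enqueue call with the TSC. A rough sketch of that kind
of measurement follows; it is not my exact harness, and the exact
rte_vhost_enqueue_burst() prototype depends on the DPDK version.

#include <stdint.h>
#include <rte_cycles.h>
#include <rte_mbuf.h>
#include <rte_virtio_net.h>

/* Accumulate the TSC cycles spent in the enqueue call and the number of
 * packets actually enqueued; cycles/packet is the ratio at the end of
 * the run. */
static uint64_t enq_cycles;
static uint64_t enq_pkts;

static uint16_t
timed_enqueue(int vid, uint16_t queue_id, struct rte_mbuf **pkts,
              uint16_t count)
{
        uint64_t start = rte_rdtsc();
        uint16_t sent = rte_vhost_enqueue_burst(vid, queue_id, pkts, count);

        enq_cycles += rte_rdtsc() - start;
        enq_pkts   += sent;
        return sent;
}

/* At the end of the run: cycles/packet = (double)enq_cycles / enq_pkts */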