From: Yuanhan Liu
To: "Wang, Zhihong"
Cc: Jianbo Liu, Maxime Coquelin, dev@dpdk.org
Date: Mon, 10 Oct 2016 10:44:28 +0800
Message-ID: <20161010024428.GT1597@yliu-dev.sh.intel.com>
In-Reply-To: <8F6C2BD409508844A0EFC19955BE09414E7BBE7D@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

On Sun, Oct 09, 2016 at 12:09:07PM +0000, Wang, Zhihong wrote:
> > > > Tested with testpmd, host: txonly, guest: rxonly
> > > >
> > > > size (bytes)    improvement (%)
> > > > 64               4.12
> > > > 128              6
> > > > 256              2.65
> > > > 512             -1.12
> > > > 1024            -7.02
> > >
> > > There is a difference between Zhihong's code and the old code that
> > > I spotted the first time: Zhihong removed the avail_idx prefetch. I
> > > understand the prefetch becomes a bit tricky when the mrg-rx code
> > > path is considered; thus, I didn't comment on that.
> > >
> > > That's one of the differences that, IMO, could cause a regression.
> > > I then finally got a chance to add it back.
> > >
> > > A rough test shows it greatly improves the performance for the
> > > 1400B packet size in the "txonly in host and rxonly in guest" case:
> > > +33% is the number I get with my test server (Ivybridge).
> >
> > Thanks Yuanhan! I'll validate this on x86.
>
> Hi Yuanhan,
>
> It seems your code doesn't perform correctly. I wrote a new version
> of the avail idx prefetch but didn't see any perf benefit.
>
> To be honest I doubt the benefit of this idea. The previous mrg_off
> code had this method but didn't give any benefit.

Good point. I thought of that before, too. But you know I made it in a
rush, and I didn't think it through or test it further.

I looked at the code a bit more closely this time, and spotted a bug:
the prefetch actually didn't happen, due to the following code piece:

    if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
        /* bug: next_avail_idx is reset to 0 on entry to the enqueue
         * path, so this branch is not taken and no prefetch happens */
        prefetch_avail_idx(vq);
        ...
    }

Since vq->next_avail_idx is set to 0 at the entrance of the enqueue
path, prefetch_avail_idx() will not be called. The fix is easy though:
just put prefetch_avail_idx() before invoking enqueue_packet().
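
For reference, a minimal sketch of the fix (assumptions: the helper
body uses rte_prefetch0() from <rte_prefetch.h>, and the burst loop
with count, dev and pkts[] is an illustrative stand-in, not the exact
patch):

    #include <rte_prefetch.h>

    /* Pull the guest-shared avail->idx into cache ahead of the reads
     * done in the enqueue loop. */
    static inline void
    prefetch_avail_idx(struct vhost_virtqueue *vq)
    {
        rte_prefetch0(&vq->avail->idx);
    }

    /* enqueue path: prefetch unconditionally, before the burst loop */
    prefetch_avail_idx(vq);
    for (i = 0; i < count; i++)
        enqueue_packet(dev, vq, pkts[i]);

That is, the prefetch is issued unconditionally before the burst is
processed, instead of being gated on next_avail_idx.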
In summary, Zhihong is right: I see no more gains with that fix :(

However, as stated, that's about the only difference I found between
your code and the old code, so maybe it's still worthwhile to have a
test on ARM, Jianbo?

	--yliu

> Even if this is useful, the benefits should be more significant for
> small packets; it's unlikely this simple idx prefetch could bring an
> over 30% perf gain for large packets like 1400B ones.
>
> But if you really do work it out like that, I'll be very glad to see
> it.
>
> Thanks
> Zhihong
>
> > > I guess this might/would help your case as well. Mind having a
> > > test and telling me the results?
> > >
> > > BTW, I made it in a rush; I haven't tested the mrg-rx code path
> > > yet.
> > >
> > > Thanks.
> > >
> > > 	--yliu