From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wang, Zhihong"
To: Thomas Monjalon
Cc: Yuanhan Liu, Jianbo Liu, Maxime Coquelin, dev@dpdk.org
Date: Thu, 13 Oct 2016 01:21:21 +0000
Message-ID: <8F6C2BD409508844A0EFC19955BE09414E7CDB34@SHSMSX103.ccr.corp.intel.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <20161012025307.GG16751@yliu-dev.sh.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7CD78E@SHSMSX103.ccr.corp.intel.com>
 <1707408.heiIBWjog8@xps13>
In-Reply-To: <1707408.heiIBWjog8@xps13>
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, October 12, 2016 11:31 PM
> To: Wang, Zhihong
> Cc: Yuanhan Liu; Jianbo Liu; Maxime Coquelin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>
> Sorry guys, you lost me in the discussion.
>
> Is there some regression only on ARM?

ARM is what we see; there is no info on ppc yet.

> Does it need some work specifically on memcpy for ARM,
> or vhost for ARM?
> Who can work on ARM optimization?

These are still open questions. Jianbo, who reported this, doesn't have
time for more testing now, according to his reply. I'm trying to do some
tests in the hope of identifying the root cause.

>
> More comments below.
>
> 2016-10-12 12:22, Wang, Zhihong:
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > > > It's an ARM server.
> > > >
> > > > > 3. What percentage of drop are you seeing?
> > > > The testing results:
> > > >
> > > >   size (bytes)   improvement (%)
> > > >             64              3.92
> > > >            128             11.51
> > > >            256             24.16
> > > >            512            -13.79
> > > >           1024            -22.51
> > > >           1500            -12.22
> > > >
> > > > A correction is that performance drops when the byte size is larger
> > > > than 512.
> > >
> > > I have thought of this twice.
> > > Unfortunately, I think I may need to NACK this series.
> > >
> > > Merging two code paths into one is really good: as you stated, it
> > > improves the maintainability. But only if we see no performance
> > > regression on both paths after the refactor. Unfortunately, that's not
> > > the case here: it hurts the performance of one code path (non-mrg Rx).
>
> +1
>
> > > That makes me think we maybe should not do the code path merge at all.
> > > I think that also aligns with what you have said before (internally):
> > > we could do the merge if it gives comparable performance before and
> > > after it.
> > >
> > > Besides that, I don't quite like the way you did it in patch 2 (rewrite
> > > enqueue): you made a lot of changes in one patch. That means if
> > > something goes wrong, it is hard to narrow down which change introduced
> > > the regression. Sadly, that's exactly what we met here. Weeks have
> > > passed, and I see no progress.
>
> +1, it is important to have simple patches making changes step by step.
>
> > > That's the reason we like the idea of "one patch only does one thing,
> > > an atomic thing".
> >
> > Yuanhan, folks,
> >
> > Thanks for the analysis. I disagree here though.
> >
> > I analyze, develop, and benchmark on x86 platforms, where this patch
> > works great.
> >
> > I've been trying to analyze on ARM too, but it takes time and I've had a
> > full schedule. Also, since the ARM perf issue only came up when the
> > series was already at v6, I might not be able to make it in time.
> > However, that's what I have to do for this patch to be merged in this or
> > the next release.
> >
> > In the meantime, may I suggest we consider the possibility of having
> > dedicated code for **perf-critical paths** on different architectures?
>
> Yes, that's what we do in several parts of DPDK.
>
> > It can be hard for one person to have both the knowledge and the
> > development environment for multiple archs at the same time.
>
> Yes, we do not expect you to work on ARM.
> So if nobody works on the ARM issue, you could make two code paths
> in order to allow your optimization for x86 only.
> But that's not the preferred way.
> And you must split your rework to better identify which part is
> a regression on ARM.
>
> > Moreover, different optimization techniques might be required for
> > different archs, so it's hard and unnecessary to make one function
> > work for all archs; sometimes it's just not the right thing to do.
>
> Yes, sometimes. Please help convince us for this case.
>
> > > So I will apply the first patch (it's a bug-fixing patch) and ask you
> > > to refactor the rest, without the code path merge.
> > >
> > > I think we could still have a maintainable code base if we introduce
> > > more common helper functions that can be used on both Rx paths, or even
> > > on the Tx path (such as update_used_ring, or shadow_used_ring).
>
> Yes, it is a good step.
> And the code path merge could be reconsidered later.
>
> > > It's a bit late for too many changes for v16.11. I think you could just
> > > apply patch 6 (vhost: optimize cache access) to the old mrg-Rx code
> > > path, if that also helps the performance. Let us handle the rest in the
> > > next release, such as the shadow used ring.
>
> Thank you
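
On the per-architecture code path idea above, a minimal sketch of what a
compile-time dispatch could look like, following the pattern DPDK already
uses for things like rte_memcpy. It assumes the standard RTE_ARCH_* build
macros; the enqueue_burst_* helpers are made-up stand-ins for the new and
the old code paths, not the actual vhost functions:

#include <stdint.h>

/* Stand-ins for the two code paths; real code would do the vring work. */
static inline uint16_t
enqueue_burst_optimized(void *vq, void **pkts, uint16_t count)
{
        (void)vq; (void)pkts;
        return count;           /* pretend every packet was enqueued */
}

static inline uint16_t
enqueue_burst_legacy(void *vq, void **pkts, uint16_t count)
{
        (void)vq; (void)pkts;
        return count;
}

/* Compile-time dispatch: the tuned path only where it has been validated. */
static inline uint16_t
enqueue_burst(void *vq, void **pkts, uint16_t count)
{
#if defined(RTE_ARCH_X86_64)
        /* x86: take the rewritten path, where the gain was measured. */
        return enqueue_burst_optimized(vq, pkts, count);
#else
        /* Other archs (ARM, ppc): keep the original, known-good path. */
        return enqueue_burst_legacy(vq, pkts, count);
#endif
}

The same split can also be made at run time (a function pointer chosen at
init), which avoids building two binaries but adds an indirect call on the
hot path.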
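
On the update_used_ring / shadow_used_ring helpers mentioned above, a rough
sketch of the batching idea, using simplified stand-in types rather than the
real vhost/virtio structures: used entries are staged in a local shadow
array and published to the shared used ring in one pass, behind a single
write barrier and a single idx update.

#include <stdint.h>

/* Simplified stand-ins for the virtio used-ring types (not the real headers). */
struct vring_used_elem {
        uint32_t id;    /* head index of the used descriptor chain */
        uint32_t len;   /* total bytes written into that chain */
};

struct vring_used {
        uint16_t flags;
        uint16_t idx;                        /* free-running used index */
        struct vring_used_elem ring[256];
};

struct shadow_ctx {
        struct vring_used_elem shadow[256];  /* local, cache-friendly staging */
        uint16_t shadow_count;
        uint16_t last_used_idx;              /* free-running, like used->idx */
        uint16_t size;                       /* ring size: power of two, <= 256 */
};

/* Stage one used entry locally instead of touching the shared ring. */
static inline void
update_used_ring(struct shadow_ctx *ctx, uint32_t desc_id, uint32_t len)
{
        ctx->shadow[ctx->shadow_count].id = desc_id;
        ctx->shadow[ctx->shadow_count].len = len;
        ctx->shadow_count++;
}

/*
 * Flush the staged entries to the shared used ring in one pass, then
 * publish them with a single idx update behind a single write barrier
 * (rte_smp_wmb() in DPDK; a release fence is used here to stay standalone).
 */
static inline void
flush_shadow_used_ring(struct shadow_ctx *ctx, struct vring_used *used)
{
        uint16_t i;

        for (i = 0; i < ctx->shadow_count; i++) {
                uint16_t slot = (ctx->last_used_idx + i) & (ctx->size - 1);
                used->ring[slot] = ctx->shadow[i];
        }

        __atomic_thread_fence(__ATOMIC_RELEASE);
        ctx->last_used_idx = (uint16_t)(ctx->last_used_idx + ctx->shadow_count);
        used->idx = ctx->last_used_idx;
        ctx->shadow_count = 0;
}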