Date: Thu, 12 Jan 2017 16:02:56 +0100
From: Jan Viktorin
Organization: RehiveTech
To: Yuanhan Liu
Cc: Thomas Monjalon, Jianbo Liu, Jerin Jacob, Chao Zhu, dev@dpdk.org,
    Tan Jianfeng, Wang Zhihong, Olivier Matz, Maxime Coquelin,
    "Michael S. Tsirkin", Orsák Michal
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling

On Thu, 12 Jan 2017 10:30:58 +0800
Yuanhan Liu wrote:

> On Wed, Jan 11, 2017 at 03:51:22PM +0100, Thomas Monjalon wrote:
> > 2017-01-11 12:27, Yuanhan Liu:
> > > The fact that virtio net header is initiated to zero in PMD driver
> > > init stage means that these costly writes are unnecessary and could
> > > be avoided:
> > >
> > > 	if (hdr->csum_start != 0)
> > > 		hdr->csum_start = 0;
> > >
> > > And that's what the macro ASSIGN_UNLESS_EQUAL does.
> > > With this, the performance drop introduced by TSO enabling is
> > > recovered: it could be up to 20% in micro benchmarking.
> >
> > This patch is adding a condition to assignments.
> > We need a benchmark on other architectures like ARM. Please anyone?
>
> I think the cost of condition should be way lower than the cost from the
> penalty introduced by the cache issue, that I don't see it would perform
> bad on other platforms.
>
> But, of course, testing is always welcome!
>
> 	--yliu

Hello,

we've done a synthetic measurement. The principle, briefly:

== Without condition check ==

	start = gettimeofday();
	for (i = 0; i < 1024*1024*128; ++i) {
		hdr->csum_start = 0;
		hdr->csum_offset = 0;
		hdr->flags = 0;
	}
	end = gettimeofday();

== With condition check ==

	start = gettimeofday();
	for (i = 0; i < 1024*1024*128; ++i) {
		ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
		ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
		ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
	}
	end = gettimeofday();

== Results ==

The result is the total time of all threads:

	for i = 1..THREAD_COUNT:
		result += end[i] - start[i]

  CPU                 threads   without check (ms)   with check (ms)
  Xeon E5-2670              1                  516               529
  Xeon E5-2670              2                 1155               953
  Xeon E5-2670              8                 8947              5044
  Xeon E5-2670             16                23335             16836
  Zynq-7020 (armv7)         1                 6735              7205
  Zynq-7020 (armv7)         2                13753             14418

On the Xeon, the check pays off as soon as more than one thread writes
the same header: with 8 threads the total time drops from 8947 ms to
5044 ms (about 44% less). With a single thread it is about 2.5% slower.
On the 32-bit ARM (Zynq-7020), however, the check is about 5 to 7%
slower in both runs, so we might expect some performance drop there.

Regards
Jan

> > >
> > > [...]
> > > +/* avoid write operation when necessary, to lessen cache issues */
> > > +#define ASSIGN_UNLESS_EQUAL(var, val) do { \
> > > +	if ((var) != (val)) \
> > > +		(var) = (val); \
> > > +} while (0)
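A self-contained sketch of a benchmark along the lines described above may help anyone who wants to repeat it. It is not the program behind the table: struct fake_hdr, run_bench, worker_fn, the LOAD/STORE/STORE_UNLESS_EQUAL helpers and MAX_THREADS are hypothetical names; timing uses clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday(); and the shared fields are C11 atomics accessed with memory_order_relaxed, so that several threads writing the same header is not a data race (relaxed loads and stores typically compile to plain loads and stores on x86 and ARM).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* The macro from the patch: skip the store when the value already matches. */
#define ASSIGN_UNLESS_EQUAL(var, val) do { \
	if ((var) != (val))                 \
		(var) = (val);              \
} while (0)

/* Same idea on shared data, using relaxed C11 atomics (assumed choice). */
#define LOAD(p)     atomic_load_explicit((p), memory_order_relaxed)
#define STORE(p, v) atomic_store_explicit((p), (v), memory_order_relaxed)
#define STORE_UNLESS_EQUAL(p, v) do { \
	if (LOAD(p) != (v))            \
		STORE(p, v);           \
} while (0)

/* Hypothetical stand-in for the three header fields used in the loops. */
struct fake_hdr {
	_Atomic uint16_t csum_start;
	_Atomic uint16_t csum_offset;
	_Atomic uint8_t  flags;
};

#define MAX_THREADS 64

struct worker {
	pthread_t tid;
	struct fake_hdr *hdr;   /* the same header for every thread */
	int with_check;
	unsigned long iters;
	double elapsed_ms;      /* end - start of this thread */
};

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	double start = now_ms();

	for (unsigned long i = 0; i < w->iters; ++i) {
		if (w->with_check) {
			STORE_UNLESS_EQUAL(&w->hdr->csum_start, 0);
			STORE_UNLESS_EQUAL(&w->hdr->csum_offset, 0);
			STORE_UNLESS_EQUAL(&w->hdr->flags, 0);
		} else {
			STORE(&w->hdr->csum_start, 0);
			STORE(&w->hdr->csum_offset, 0);
			STORE(&w->hdr->flags, 0);
		}
	}
	w->elapsed_ms = now_ms() - start;
	return NULL;
}

/* Sum of per-thread (end - start) in ms, as in the table; -1 on error. */
static double run_bench(int nthreads, int with_check, unsigned long iters)
{
	/* Zero-initialised like the PMD's headers, kept in one cache line. */
	static _Alignas(64) struct fake_hdr hdr;
	struct worker w[MAX_THREADS];
	double total = 0.0;
	int started = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (int i = 0; i < nthreads; ++i) {
		w[i] = (struct worker){ .hdr = &hdr, .with_check = with_check,
					.iters = iters };
		if (pthread_create(&w[i].tid, NULL, worker_fn, &w[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; ++i) {
		pthread_join(w[i].tid, NULL);
		total += w[i].elapsed_ms;
	}
	return started == nthreads ? total : -1.0;
}
```

Calling run_bench(8, 0, 1024UL * 1024 * 128) and run_bench(8, 1, 1024UL * 1024 * 128) from a main() (build with -pthread) compares the two variants the same way as the table; absolute numbers will of course differ. Presumably the gap grows with the thread count because the checked loop only reads a line that already holds zeros, so every core can keep it in the shared state, whereas each unconditional store forces the writing core to take exclusive ownership of the line.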