From: Bruce Richardson
To: "Wang, Zhihong"
Cc: "dev@dpdk.org"
Date: Wed, 21 Jan 2015 11:40:22 +0000
Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
Message-ID: <20150121114022.GA10756@bricha3-MOBL3>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
 <20150119130221.GB21790@hmsreliant.think-freely.org>
 <20150120151118.GD18449@hmsreliant.think-freely.org>
 <20150120161453.GA5316@bricha3-MOBL3>
Organization: Intel Shannon Ltd.
User-Agent: Mutt/1.5.23 (2014-03-12)

On Wed, Jan 21, 2015 at 03:44:23AM +0000, Wang, Zhihong wrote:
> Neil, Bruce,
>
> Some data first.
>
> Sandy Bridge without AVX2:
> 1. original w/ 10 constant memcpy: 2'25"
> 2. patch w/ 12 constant memcpy: 2'41"
> 3. patch w/ 63 constant memcpy: 9'41"
>
> Haswell with AVX2:
> 1. original w/ 10 constant memcpy: 1'57"
> 2. patch w/ 12 constant memcpy: 1'56"
> 3. patch w/ 63 constant memcpy: 3'16"
>
> Also, to address Bruce's question, we have to reduce test case to cut down compile time. Because we use:
> 1. intrinsics instead of assembly for better flexibility and can utilize more compiler optimization
> 2. complex function body for better performance
> 3. inlining
> This increases compile time.
> But I think it'd be okay to do that as long as we can select a fair set of test points.
>
> It'd be great if you could give some suggestion, say, 12 points.
>
> Zhihong (John)
>

Hi Zhihong,

Just for comparison I've done a clean dpdk compile on my SNB system this
morning. Using parallel make (which is pretty normal I suspect), I get the
following numbers:

real    0m52.549s
user    0m36.034s
sys     0m10.014s

So total compile time is 52 seconds.

Running a make uninstall and then make install again with "-j 1", provides
the following numbers:

real    0m32.751s
user    0m16.041s
sys     0m7.946s

Obviously, caching effects are being completely ignored by this unscientific
study (rerunning the first test again gives a 13-second time), but the upshot
is that the compile time for DPDK right now is well under a minute in the
normal case. Adding in a new file that, in the best case, takes two minutes
to compile is going to increase our compile time many times over.

Regards,
/Bruce
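
The "many times over" estimate can be checked with a little arithmetic. The sketch below is only an illustration, not part of the original measurements. It assumes that the new file's compile time simply adds to the 52-second clean build (that is, the file becomes the critical path and nothing overlaps with it under parallel make). The helper names `parse_min_sec`, `parse_real` and `slowdown` are hypothetical.

```python
def parse_min_sec(text):
    """Convert a value written as minutes'seconds" (e.g. 2'41") to seconds."""
    minutes, seconds = text.rstrip('"').split("'")
    return int(minutes) * 60 + int(seconds)


def parse_real(text):
    """Convert a `time`-style value such as 0m52.549s to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def slowdown(baseline_s, extra_s):
    """Ratio of (baseline + extra) to baseline.

    Assumption: the extra file's compile time is purely additive.
    """
    return (baseline_s + extra_s) / baseline_s


# Clean-build wall-clock time quoted in the reply above.
BASELINE_S = parse_real("0m52.549s")

# Per-file compile times quoted from the earlier message.
QUOTED_TIMES = {
    "Haswell, patch w/ 12 constant memcpy": "1'56\"",
    "Sandy Bridge, patch w/ 12 constant memcpy": "2'41\"",
    "Haswell, patch w/ 63 constant memcpy": "3'16\"",
    "Sandy Bridge, patch w/ 63 constant memcpy": "9'41\"",
}

if __name__ == "__main__":
    for label, raw in QUOTED_TIMES.items():
        extra = parse_min_sec(raw)
        print(f"{label}: +{extra}s, about {slowdown(BASELINE_S, extra):.1f}x the baseline")
```

Under that additive assumption, the quoted figures put the build at roughly 3x the current time in the best case and roughly 12x in the worst.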