From: Bruce Richardson
To: Linhaifeng
Cc: "dev@dpdk.org"
Date: Thu, 22 Jan 2015 15:21:57 +0000
Subject: Re: [dpdk-dev] some questions about rte_memcpy
Message-ID: <20150122152157.GF4580@bricha3-MOBL3>
In-Reply-To: <54C0F2B9.7050006@huawei.com>
Organization: Intel Shannon Ltd.

On Thu, Jan 22, 2015 at 08:53:13PM +0800, Linhaifeng wrote:
>
>
> On 2015/1/22 19:34, Bruce Richardson wrote:
> > On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/01/22 16:35, Matthew Hall wrote:
> >>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> >>>> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
> >>> No guarantee. But a theory.
> >>> It might use some things from the EAL init to
> >>> figure out which version of the accelerated algorithm to use.
> >>
> >> This selection is done at compile-time.
> >> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> >> inline __builtin_memcpy.
> >> I haven't checked the performance of builtin memcpy, but probably much
> >> faster.
> >>
> >
> > Yes, that assumption is correct. A couple of years ago we discovered that for
> > constant size values, the compiler would generate much faster code for us
> > using a regular memcpy than rte_memcpy, hence the macro.
> >
> > /Bruce
> >
> >> Tetsuya
> >>
> >>> Matthew.
> >>
> >>
> >
> >
>
> Hi,Bruce
>
> I test it,most results like you said use constant may be faster,but sometimes not.
>
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279893712 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277818600
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279264328 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277667116
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279491832 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277622772
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279402156 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277738464
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279305172 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277483004
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279784124 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277605332
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322817260
> rte_memcpy(variable) used:350333864
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322840748
> rte_memcpy(variable) used:350297868
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322488240
> rte_memcpy(variable) used:350348652
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:322021428
> rte_memcpy(variable) used:350416440
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:321370900
> rte_memcpy(variable) used:350355796
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:322704552
> rte_memcpy(variable) used:349900832
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:422705828
> rte_memcpy(variable) used:425493328
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:422421840 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:413691412
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:425233088 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:421136724
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:901014608 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:900997388
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:900803308 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:900794076
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:901842436 @@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:901218984
> linux-mnSyvH:/mnt/sdb/linhf/test #
>
>
>
> here is my test codes:
>
> #include
> #include
> #include
>
>
> int main(int narg, char** args)
> {
>     int i;
>     char buf[1024];
>     uint64_t start, end;
>
>     if (narg < 3) {
>         printf("usage:./rte_memcpy_test size times\n");
>         return 0;
>     }
>
>     size_t size_v = atoi(args[1]);
>     const size_t size_c = atoi(args[1]);

This (size_c)
is a run-time constant, not a compile-time constant. To trigger the
memcpy optimizations inside the compiler, the size value must be constant at
compile time.

Regards,
/Bruce

>     int times = atoi(args[2]);
>
>     start = rte_rdtsc();
>     for(i = 0; i < times; i++) {
>         rte_memcpy(buf, buf, size_c);
>     }
>     end = rte_rdtsc();
>     printf("rte_memcpy(constant) used:%llu\n", end - start);
>
>     start = rte_rdtsc();
>     for (i = 0; i < times; i++) {
>         rte_memcpy(buf, buf, size_v);
>     }
>     end = rte_rdtsc();
>     printf("rte_memcpy(variable) used:%llu\n", end - start);
>
>     return 0;
> }
>
>
>
>
>
> --
> Regards,
> Haifeng
>
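
The run-time versus compile-time distinction made in the reply can be checked
with a minimal sketch, assuming GCC or Clang (both provide
__builtin_constant_p). The names copy_generic, copy_dispatch,
literal_size_is_constant, runtime_size_is_constant and copies_match are
hypothetical and exist only for this illustration. copy_dispatch shows the
general idea of handing compile-time-constant sizes to memcpy; it is not a
copy of the actual rte_memcpy macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hand-tuned copy routine that handles any size. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * Hypothetical dispatch macro: sizes the compiler can see at compile time go
 * to memcpy (which the compiler may expand inline); all others go to the
 * generic routine.
 */
#define copy_dispatch(dst, src, n)                          \
    (__builtin_constant_p(n) ? memcpy((dst), (src), (n))    \
                             : copy_generic((dst), (src), (n)))

/* A literal size is a compile-time constant, so this returns 1. */
static int literal_size_is_constant(void)
{
    return __builtin_constant_p(64) ? 1 : 0;
}

/*
 * A const-qualified variable whose value only exists at run time is not a
 * compile-time constant, so this returns 0. The volatile read stands in for
 * atoi(args[1]) in the quoted test.
 */
static int runtime_size_is_constant(void)
{
    volatile size_t input = 64;
    const size_t size_c = input;

    return __builtin_constant_p(size_c) ? 1 : 0;
}

/* Both paths must produce the same bytes; returns 1 when they do. */
static int copies_match(void)
{
    const char src[16] = "0123456789abcde";
    char a[16] = {0};
    char b[16] = {0};
    volatile size_t input = sizeof(src);
    const size_t size_c = input;

    assert(size_c <= sizeof(b));
    copy_dispatch(a, src, sizeof(src)); /* constant size: memcpy path */
    copy_dispatch(b, src, size_c);      /* run-time size: generic path */
    return memcmp(a, src, sizeof(src)) == 0 &&
           memcmp(b, src, sizeof(src)) == 0;
}
```

Applied to the quoted test, a literal such as rte_memcpy(buf, buf, 64), or a
size taken from a #define or an enum, would be expected to exercise the
constant-size path. A const variable initialised from atoi() takes the same
path as size_v, which would explain why the two timings are so close.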