From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bruce.richardson@intel.com>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id 80DAF5A9D
 for <dev@dpdk.org>; Thu, 22 Jan 2015 16:22:29 +0100 (CET)
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by orsmga103.jf.intel.com with ESMTP; 22 Jan 2015 07:17:58 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,449,1418112000"; d="scan'208";a="516065699"
Received: from bricha3-mobl3.ger.corp.intel.com ([10.243.20.25])
 by orsmga003.jf.intel.com with SMTP; 22 Jan 2015 07:15:13 -0800
Received: by  (sSMTP sendmail emulation); Thu, 22 Jan 2015 15:21:57 +0025
Date: Thu, 22 Jan 2015 15:21:57 +0000
From: Bruce Richardson <bruce.richardson@intel.com>
To: Linhaifeng <haifeng.lin@huawei.com>
Message-ID: <20150122152157.GF4580@bricha3-MOBL3>
References: <54C070DF.1050006@huawei.com>
 <20150122044531.GA13230@mhcomputing.net>
 <54C08B54.50700@huawei.com>
 <20150122073526.GA14800@mhcomputing.net> <54C0CFB5.909@igel.co.jp>
 <20150122113426.GC4580@bricha3-MOBL3> <54C0F2B9.7050006@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54C0F2B9.7050006@huawei.com>
Organization: Intel Shannon Ltd.
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] some questions about  rte_memcpy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Jan 2015 15:22:30 -0000

On Thu, Jan 22, 2015 at 08:53:13PM +0800, Linhaifeng wrote:
> 
> 
> On 2015/1/22 19:34, Bruce Richardson wrote:
> > On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/01/22 16:35, Matthew Hall wrote:
> >>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> >>>> Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
> >>> No guarantee. But a theory. It might use some things from the EAL init to 
> >>> figure out which version of the accelerated algorithm to use.
> >>
> >> This selection is done at compile-time.
> >> And if the size is constant, I guess DPDK assumes memcpy is replaced by
> >> inline __builtin_memcpy.
> >> I haven't checked the performance of builtin memcpy, but probably much
> >> faster.
> >>
> > 
> > Yes, that assumption is correct. A couple of years ago we discovered that for
> > constant size values, the compiler would generate much faster code for us
> > using a regular memcpy than rte_memcpy, hence the macro.
> > 
> > /Bruce
> > 
> >> Tetsuya
> >>
> >>> Matthew.
> >>
> >>
> > 
> > 
> 
> Hi Bruce,
> 
> I tested it. Most results match what you said, that a constant size may be faster, but sometimes it is not.
> 
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279893712	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277818600
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279264328	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277667116
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 9999999
> rte_memcpy(constant) used:279491832	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277622772
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279402156	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277738464
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279305172	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277483004
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 9999999
> rte_memcpy(constant) used:279784124	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:277605332
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322817260
> rte_memcpy(variable) used:350333864
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322840748
> rte_memcpy(variable) used:350297868
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 9999999
> rte_memcpy(constant) used:322488240
> rte_memcpy(variable) used:350348652
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:322021428
> rte_memcpy(variable) used:350416440
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:321370900
> rte_memcpy(variable) used:350355796
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 9999999
> rte_memcpy(constant) used:322704552
> rte_memcpy(variable) used:349900832
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:422705828
> rte_memcpy(variable) used:425493328
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:422421840	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:413691412
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 9999999
> rte_memcpy(constant) used:425233088	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:421136724
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:901014608	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:900997388
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:900803308	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:900794076
> linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 9999999
> rte_memcpy(constant) used:901842436	@@@@@@@@@@@@@@ not faster
> rte_memcpy(variable) used:901218984
> linux-mnSyvH:/mnt/sdb/linhf/test #
> 
> 
> 
> Here is my test code:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <rte_memcpy.h>
> #include <rte_cycles.h>
> 
> 
> int main(int narg, char** args)
> {
>         int i;
>         char buf[1024];
>         uint64_t start, end;
> 
>         if (narg < 3) {
>                 printf("usage:./rte_memcpy_test size times\n");
>                 return 0;
>         }
> 
>         size_t size_v = atoi(args[1]);
>         const size_t size_c = atoi(args[1]);

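This (size_c) is a run-time constant, not a compile-time constant: the const
qualifier only stops later assignments, while the value itself still comes from
atoi() when the program runs. To trigger the memcpy optimizations inside the
compiler, the size value must be known at compile time, e.g. a literal, an
enum value or a sizeof expression.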
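
As a rough illustration of the difference (this is not DPDK code, and it
assumes the macro mentioned earlier in the thread keys off the GCC/Clang
builtin __builtin_constant_p), the sketch below asks the compiler whether it
treats a size as a compile-time constant. The names COPY_SIZE,
literal_size_is_constant and parsed_size_is_constant are made up for this
example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size for the example: an enum value is a true
 * compile-time constant. */
enum { COPY_SIZE = 64 };

/* Returns 1: the compiler knows the value while compiling. */
int literal_size_is_constant(void)
{
	return __builtin_constant_p(COPY_SIZE);
}

/* Returns 0: size_c is const-qualified, but its value is only produced
 * by atoi() when the program runs. noinline keeps the optimizer from
 * seeing a caller's literal argument in this small demo. */
__attribute__((noinline))
int parsed_size_is_constant(const char *arg)
{
	const size_t size_c = (size_t)atoi(arg);

	return __builtin_constant_p(size_c);
}
```

With GCC or Clang, literal_size_is_constant() should return 1 and
parsed_size_is_constant(args[1]) should return 0. If that holds for your
build, both loops in the test would be expected to take the same
variable-size path, which would fit the near-identical timings.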

Regards,
/Bruce

>         int times = atoi(args[2]);
> 
>         start = rte_rdtsc();
>         for(i = 0; i < times; i++) {
>                 rte_memcpy(buf, buf, size_c);
>         }
>         end = rte_rdtsc();
>         printf("rte_memcpy(constant) used:%llu\n", (unsigned long long)(end - start));
> 
>         start = rte_rdtsc();
>         for (i = 0; i < times; i++) {
>                 rte_memcpy(buf, buf, size_v);
>         }
>         end = rte_rdtsc();
>         printf("rte_memcpy(variable) used:%llu\n", (unsigned long long)(end - start));
> 
>         return 0;
> }
> 
> 
> 
> 
> 
> -- 
> Regards,
> Haifeng
>