From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <haifeng.lin@huawei.com>
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65])
 by dpdk.org (Postfix) with ESMTP id C406D5AB5
 for <dev@dpdk.org>; Fri, 23 Jan 2015 03:58:29 +0100 (CET)
Received: from 172.24.2.119 (EHLO szxeml427-hub.china.huawei.com)
 ([172.24.2.119])
 by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued)
 with ESMTP id CGD91747; Fri, 23 Jan 2015 10:58:24 +0800 (CST)
Received: from [127.0.0.1] (10.177.19.115) by szxeml427-hub.china.huawei.com
 (10.82.67.182) with Microsoft SMTP Server id 14.3.158.1; Fri, 23 Jan 2015
 10:58:21 +0800
Message-ID: <54C1B8C9.5020201@huawei.com>
Date: Fri, 23 Jan 2015 10:58:17 +0800
From: Linhaifeng <haifeng.lin@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.0
MIME-Version: 1.0
To: Bruce Richardson <bruce.richardson@intel.com>
References: <54C070DF.1050006@huawei.com>
 <20150122044531.GA13230@mhcomputing.net> <54C08B54.50700@huawei.com>
 <20150122073526.GA14800@mhcomputing.net> <54C0CFB5.909@igel.co.jp>
 <20150122113426.GC4580@bricha3-MOBL3> <54C0F2B9.7050006@huawei.com>
 <20150122152157.GF4580@bricha3-MOBL3>
In-Reply-To: <20150122152157.GF4580@bricha3-MOBL3>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.177.19.115]
X-CFilter-Loop: Reflected
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] some questions about  rte_memcpy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Jan 2015 02:58:31 -0000



On 2015/1/22 23:21, Bruce Richardson wrote:
> This (size_c) is a run-time constant, not a compile-time constant. To trigger the
> memcpy optimizations inside the compiler, the size value must be constant at
> compile time.


Hi, Bruce

You are right. When the size is a compile-time constant, memcpy is faster. Thank you all.

Here is my test result:

rte_memcpy(constant) size:8 time:876
rte_memcpy(variable) size:8 time:2824
rte_memcpy(constant) size:16 time:868
rte_memcpy(variable) size:16 time:4436
rte_memcpy(constant) size:32 time:856
rte_memcpy(variable) size:32 time:3264
rte_memcpy(constant) size:48 time:872
rte_memcpy(variable) size:48 time:3972
rte_memcpy(constant) size:64 time:856
rte_memcpy(variable) size:64 time:3644
rte_memcpy(constant) size:128 time:868
rte_memcpy(variable) size:128 time:4720
rte_memcpy(constant) size:256 time:868
rte_memcpy(variable) size:256 time:9624

Here is my test program (does anyone know how to test the constant-size memcpy in a loop?):

#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <stdint.h>     /* uint64_t */
#include <rte_memcpy.h>
#include <rte_cycles.h>


int main(int narg, char** args)
{
        int i,t;
        char buf[256];
        int tests[7] = {8,16,32,48,64,128,256};
        char buf8[8],buf16[16],buf32[32],buf48[48],buf64[64],buf128[128],buf256[256];
        uint64_t start, end;
        int times = 9999999;
        uint64_t result_c[7];

        if (narg < 2) {
                printf("usage:./rte_memcpy_test times\n");
                return -1;
        }

        times = atoi(args[1]);

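        /* Constant-size case: sizeof is a compile-time constant, so each
         * size needs its own copy of the timing loop. */
        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf8, buf8, sizeof buf8);
        }
        end = rte_rdtsc();
        result_c[0] = end - start;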

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf16, buf16, sizeof buf16);
        }
        end = rte_rdtsc();
        result_c[1] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf32, buf32, sizeof buf32);
        }
        end = rte_rdtsc();
        result_c[2] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf48, buf48, sizeof buf48);
        }
        end = rte_rdtsc();
        result_c[3] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf64, buf64, sizeof buf64);
        }
        end = rte_rdtsc();
        result_c[4] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf128, buf128, sizeof buf128);
        }
        end = rte_rdtsc();
        result_c[5] = end - start;

        start = rte_rdtsc();
        for(t = 0; t < times; t++) {
                rte_memcpy(buf256, buf256, sizeof buf256);
        }
        end = rte_rdtsc();
        result_c[6] = end - start;

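        /* Variable-size case: the size is read from an array at run time. */
        for (i = 0; i < (int)(sizeof tests / sizeof tests[0]); i++) {
                start = rte_rdtsc();
                for(t = 0; t < times; t++) {
                        rte_memcpy(buf, buf, tests[i]);
                }
                end = rte_rdtsc();
                /* uint64_t is not always unsigned long long, so cast for %llu. */
                printf("rte_memcpy(constant) size:%d time:%llu\n", tests[i],
                       (unsigned long long)result_c[i]);
                printf("rte_memcpy(variable) size:%d time:%llu\n", tests[i],
                       (unsigned long long)(end - start));
        }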

        return 0;
}

-- 
Regards,
Haifeng