From: Ravi Kerur
To: Linhaifeng
Cc: "dev@dpdk.org"
Date: Tue, 12 May 2015 18:18:30 -0700
Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.
In-Reply-To: <5551B615.5060405@huawei.com>
References: <1431119946-32078-1-git-send-email-rkerur@gmail.com>
 <1431119989-32124-1-git-send-email-rkerur@gmail.com>
 <5551B615.5060405@huawei.com>

Hi Linhaifeng,

On Tue, May 12, 2015 at 1:13 AM, Linhaifeng wrote:

> Hi, Ravi Kerur
>
> On 2015/5/9 5:19, Ravi Kerur wrote:
> > Preliminary results on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu
> > 14.04 x86_64 show comparisons using AVX/SSE instructions taking 1/3rd
> > the CPU ticks for 16, 32, 48 and 64 byte comparisons. In addition,
>
> I wrote a program to test rte_memcmp, and I have a question about the
> results: why do 128, 256, 512, 1024 and 1500 bytes all cost the same
> number of CPU ticks? Is there a problem in my test?

If you can wait until Thursday, I will probably send a v3 patch with full
memcmp support. In your program, try a volatile pointer and see if it
helps.
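A minimal sketch of that volatile-pointer suggestion (illustrative only,
not code from this thread; it reuses src, dst, n and TIMES from the test
program quoted below):

    /* Illustrative sketch: declaring the pointer variables themselves
     * volatile forces gcc to reload them on every iteration, so it
     * cannot prove the repeated rte_memcmp() calls redundant and hoist
     * or fold them out of the timing loop.  src, dst, n and TIMES are
     * assumed to be set up as in the quoted program. */
    uint8_t * volatile vsrc = src;  /* the pointer is volatile, not the bytes */
    uint8_t * volatile vdst = dst;
    uint64_t start, end, i;
    int r = 0;

    start = rte_rdtsc();
    for (i = 0; i < TIMES; i++)
            r += rte_memcmp(vdst, vsrc, n);
    end = rte_rdtsc();

    printf("%zu bytes: %llu ticks (r=%d)\n", n,
           (unsigned long long)((end - start) / TIMES), r);
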
> [root@localhost test]# gcc avx_test.c -O3 -I
> /data/linhf/v2r2c00/open-source/dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/include/
> -mavx2 -DRTE_MACHINE_CPUFLAG_AVX2
> [root@localhost test]# ./a.out 0
> each test run 100000000 times
> copy 16 bytes costs average 7(rte_memcmp) 10(memcmp) ticks
> copy 32 bytes costs average 9(rte_memcmp) 11(memcmp) ticks
> copy 64 bytes costs average 6(rte_memcmp) 13(memcmp) ticks
> copy 128 bytes costs average 11(rte_memcmp) 14(memcmp) ticks
> copy 256 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
> copy 512 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
> copy 1024 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
> copy 1500 bytes costs average 11(rte_memcmp) 14(memcmp) ticks
> [root@localhost test]# ./a.out 1
> each test run 100000000 times
> copy 16 bytes costs average 2(rte_memcpy) 10(memcpy) ticks
> copy 32 bytes costs average 2(rte_memcpy) 10(memcpy) ticks
> copy 64 bytes costs average 3(rte_memcpy) 10(memcpy) ticks
> copy 128 bytes costs average 7(rte_memcpy) 12(memcpy) ticks
> copy 256 bytes costs average 9(rte_memcpy) 23(memcpy) ticks
> copy 512 bytes costs average 14(rte_memcpy) 34(memcpy) ticks
> copy 1024 bytes costs average 37(rte_memcpy) 61(memcpy) ticks
> copy 1500 bytes costs average 62(rte_memcpy) 87(memcpy) ticks
>
> Here is my program:
>
> #include <stdio.h>
> #include <stdint.h>
> #include <stdlib.h>
> #include <string.h>
> #include <rte_memcpy.h>
> #include <rte_memcmp.h>
> #include <rte_cycles.h>
>
> #define TIMES 100000000L
>
> void test_memcpy(size_t n)
> {
>         uint64_t start, end, i, start2, end2;
>         uint8_t *src, *dst;
>
>         src = (uint8_t*)malloc(n * sizeof(uint8_t));
>         dst = (uint8_t*)malloc(n * sizeof(uint8_t));
>
>         start = rte_rdtsc();
>         for (i = 0; i < TIMES; i++) {
>                 rte_memcpy(dst, src, n);
>         }
>         end = rte_rdtsc();
>
>         start2 = rte_rdtsc();
>         for (i = 0; i < TIMES; i++) {
>                 memcpy(dst, src, n);
>         }
>         end2 = rte_rdtsc();
>
>         free(src);
>         free(dst);
>
>         printf("copy %zu bytes costs average %llu(rte_memcpy) %llu(memcpy) ticks\n",
>                n, (unsigned long long)((end - start)/TIMES),
>                (unsigned long long)((end2 - start2)/TIMES));
> }
>
> int test_memcmp(size_t n)
> {
>         uint64_t start, end, i, start2, end2;
>         uint8_t *src, *dst;
>         int *ret;
>         int t = 0;
>
>         src = (uint8_t*)malloc(n * sizeof(uint8_t));
>         dst = (uint8_t*)malloc(n * sizeof(uint8_t));
>         ret = (int*)malloc(TIMES * sizeof(int));
>
>         start = rte_rdtsc();
>         for (i = 0; i < TIMES; i++) {
>                 ret[i] = rte_memcmp(dst, src, n);
>         }
>         end = rte_rdtsc();
>
>         start2 = rte_rdtsc();
>         for (i = 0; i < TIMES; i++) {
>                 ret[i] = memcmp(dst, src, n);
>         }
>         end2 = rte_rdtsc();
>
>         /* consume the results so gcc cannot optimize the memcmp calls away */
>         for (i = 0; i < TIMES; i++) {
>                 t += ret[i];
>         }
>
>         free(src);
>         free(dst);
>         free(ret);
>
>         printf("copy %zu bytes costs average %llu(rte_memcmp) %llu(memcmp) ticks\n",
>                n, (unsigned long long)((end - start)/TIMES),
>                (unsigned long long)((end2 - start2)/TIMES));
>         return t;
> }
>
> int main(int narg, char** args)
> {
>         printf("each test run %llu times\n", (unsigned long long)TIMES);
>
>         if (narg < 2) {
>                 printf("usage: ./avx_test 0/1  1: test memcpy  0: test memcmp\n");
>                 return -1;
>         }
>
>         if (atoi(args[1])) {
>                 test_memcpy(16);
>                 test_memcpy(32);
>                 test_memcpy(64);
>                 test_memcpy(128);
>                 test_memcpy(256);
>                 test_memcpy(512);
>                 test_memcpy(1024);
>                 test_memcpy(1500);
>         } else {
>                 test_memcmp(16);
>                 test_memcmp(32);
>                 test_memcmp(64);
>                 test_memcmp(128);
>                 test_memcmp(256);
>                 test_memcmp(512);
>                 test_memcmp(1024);
>                 test_memcmp(1500);
>         }
>
>         return 0;
> }
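
As an aside, the t += ret[i] pass in the quoted program is one way to keep
gcc from discarding the comparison results; an empty inline-asm compiler
barrier is another common trick, and it avoids allocating the roughly
400 MB ret[] array. A minimal sketch (again illustrative, not from the
original thread; start, end, i, src, dst, n and TIMES as in the quoted
program):

    /* Illustrative alternative to the ret[] array: an empty asm that
     * takes the result as an input operand with a "memory" clobber acts
     * as a compiler barrier, so gcc must compute the memcmp result on
     * every iteration without storing all 100M return values. */
    start = rte_rdtsc();
    for (i = 0; i < TIMES; i++) {
            int r = memcmp(dst, src, n);
            __asm__ __volatile__("" : : "r" (r) : "memory");
    }
    end = rte_rdtsc();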