DPDK patches and discussions
From: "Wang, Zhihong" <zhihong.wang@intel.com>
To: linhaifeng <haifeng.lin@huawei.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
Date: Mon, 18 Sep 2017 05:10:40 +0000
Message-ID: <8F6C2BD409508844A0EFC19955BE094151335900@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <59A68766.7080307@huawei.com>

> Hi Zhihong Wang
> 
> I tested the avx512 rte_memcpy and found that the performance for ovs dpdk
> is lower than with the avx2 rte_memcpy.

Hi Haifeng,

AVX512 memcpy is marked as experimental and is disabled by default; its
benefit varies from case to case. Enable it only once the specific case
(SW + HW setup with the expected data pattern) has been verified.
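
To illustrate the "disabled by default" point, here is a minimal sketch of
an opt-in, compile-time selection of the copy width. It is not the DPDK
implementation: ENABLE_AVX512_COPY, COPY_BLOCK_BYTES and copy_path_name()
are hypothetical names used only for this example. __AVX512F__ and __AVX2__
are the macros the compiler defines when those instruction sets are enabled.

```c
#include <stddef.h>

/*
 * Hypothetical opt-in switch: the 64-byte path is compiled in only when
 * the CPU feature is available AND the builder explicitly asked for it
 * (for example with -DENABLE_AVX512_COPY). None of these names are DPDK
 * identifiers.
 */
#if defined(__AVX512F__) && defined(ENABLE_AVX512_COPY)
#define COPY_BLOCK_BYTES 64   /* one ZMM register per load/store */
#elif defined(__AVX2__)
#define COPY_BLOCK_BYTES 32   /* one YMM register per load/store */
#else
#define COPY_BLOCK_BYTES 16   /* one XMM register per load/store */
#endif

/* Report which copy path this translation unit was built with. */
static inline const char *copy_path_name(void)
{
    return COPY_BLOCK_BYTES == 64 ? "avx512" :
           COPY_BLOCK_BYTES == 32 ? "avx2" : "sse";
}
```

With this shape, a build on an AVX512-capable machine still gets the AVX2
path unless the opt-in macro is set, so the wider path is only used after
someone has deliberately chosen it for a verified setup.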

BTW, micro-benchmarks like test_memcpy_perf are not recommended for
reporting memcpy performance, as they are unlikely to reflect the
performance of real-world applications. More details are at
https://software.intel.com/en-us/articles/performance-optimization-of-memcpy-in-dpdk


Thanks
Zhihong

> 
> Results of the vm loop test for ovs dpdk:
> avx512 is *15*Gbps
> perf data:
>   0.52 │      vmovdq (%r8,%r10,1),%zmm0
>  95.33 │      sub    $0x40,%r9
>   0.45 │      add    $0x40,%r8
>   0.60 │      vmovdq %zmm0,-0x40(%r8)
>   1.84 │      cmp    $0x3f,%r9
>        │    ↓ ja     f20
>        │      lea    -0x40(%rsi),%r8
>   0.15 │      or     $0xffffffffffffffc0,%rsi
>   0.21 │      and    $0xffffffffffffffc0,%r8
>   0.00 │      lea    0x40(%rsi,%r8,1),%rsi
>   0.00 │      vmovdq (%rcx,%rsi,1),%zmm0
>   0.22 │      vmovdq %zmm0,(%rdx,%rsi,1)
>   0.67 │    ↓ jmpq   c78
>        │      mov    -0x128(%rbp),%rdi
>        │      rex.R
>        │      .byte  0x89
>        │      popfq
> 
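
For readers following the annotate output above, this is a rough C sketch
of what the listing appears to do: a main loop moving 64 bytes (0x40) per
iteration while more than 63 bytes remain, then one final 64-byte copy
ending exactly at the end of the buffer, which may overlap bytes already
copied. It is an interpretation of the disassembly, not the rte_memcpy
source; copy_blocks_overlapping_tail() and BLOCK are hypothetical names,
and plain memcpy() stands in for the ZMM load/store pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64  /* bytes moved per iteration, matching the 0x40 stride */

/* Hypothetical helper: copy n bytes from src to dst, requires n >= BLOCK. */
static void copy_blocks_overlapping_tail(uint8_t *dst, const uint8_t *src,
                                         size_t n)
{
    size_t off = 0;

    assert(n >= BLOCK);

    /* Main loop: one full block while more than 63 bytes remain. */
    while (n - off > BLOCK - 1) {
        memcpy(dst + off, src + off, BLOCK);  /* stand-in for ZMM load + store */
        off += BLOCK;
    }

    /*
     * Tail: copy the last BLOCK bytes ending at offset n. This re-copies
     * some bytes the loop already handled, which avoids a byte-wise
     * remainder loop.
     */
    memcpy(dst + n - BLOCK, src + n - BLOCK, BLOCK);
}
```

For example, with n = 150 the loop covers bytes 0..127 and the tail covers
bytes 86..149, so bytes 86..127 are written twice with the same data.
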
> avx2 is *18.8*Gbps
> perf data:
>   0.96 │      add    %r9,%r13
>  66.04 │      vmovdq (%rdx),%ymm0
>   1.20 │      sub    $0x40,%rdi
>   1.53 │      add    $0x40,%rdx
>  10.83 │      vmovdq %ymm0,-0x40(%rdx,%r15,1)
>   8.64 │      vmovdq -0x20(%rdx),%ymm0
>   7.58 │      vmovdq %ymm0,-0x40(%rdx,%r13,1)
> 
> 
> dpdk version: v17.05
> ovs version: 2.8.90
> qemu version: QEMU emulator version 2.9.94 (v2.10.0-rc4-dirty)
> 
> gcc version: gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
> kernel version: 3.10.0
> 
> 
> compile dpdk:
> CONFIG_RTE_ENABLE_AVX512=y
> export DPDK_DIR=$PWD
> export DPDK_TARGET=x86_64-native-linuxapp-gcc
> export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
> make install T=$DPDK_TARGET DESTDIR=install
> 
> compile ovs:
> sh boot.sh
> ./configure CFLAGS="-g -O2" --with-dpdk=$DPDK_BUILD --prefix=/usr \
>     --localstatedir=/var --sysconfdir=/etc
> make -j
> make install
> 
> Results of the dpdk test_memcpy_perf test:
> avx2:
> ** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
> ======= ============== ============== ============== ==============
>    Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
> (bytes)        (ticks)        (ticks)        (ticks)        (ticks)
> ------- -------------- -------------- -------------- --------------
> ========================== 32B aligned ============================
>      64       6 -   10      27 -   52      30 -   39      56 -   97
>     512      24 -   44     251 -  271     145 -  217     396 -  447
>    1024      35 -   78     394 -  433     252 -  319     609 -  670
> ------- -------------- -------------- -------------- --------------
> C    64       3 -    9      28 -   31      29 -   40      55 -   66
> C   512      25 -   55     253 -  268     139 -  268     397 -  410
> C  1024      32 -   83     394 -  416     250 -  396     612 -  687
> =========================== Unaligned =============================
>      64       8 -    9      85 -   71      45 -   45     125 -  121
>     512      33 -   49     282 -  305     153 -  252     420 -  478
>    1024      42 -   83     409 -  491     259 -  389     640 -  748
> ------- -------------- -------------- -------------- --------------
> C    64       4 -    9      42 -   46      39 -   46      76 -   90
> C   512      33 -   55     280 -  272     153 -  281     421 -  415
> C  1024      41 -   83     407 -  427     258 -  405     578 -  701
> ======= ============== ============== ============== ==============
> 
> avx512:
> ** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
> ======= ============== ============== ============== ==============
>    Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
> (bytes)        (ticks)        (ticks)        (ticks)        (ticks)
> ------- -------------- -------------- -------------- --------------
> ========================== 64B aligned ============================
>      64       6 -    9      18 -   33      24 -   38      40 -   65
>     512      18 -   44     178 -  262     138 -  218     309 -  429
>    1024      27 -   79     338 -  430     250 -  322     560 -  674
> ------- -------------- -------------- -------------- --------------
> C    64       3 -    9      18 -   20      23 -   41      39 -   50
> C   512      15 -   54     205 -  270     134 -  268     304 -  409
> C  1024      24 -   83     371 -  414     242 -  400     550 -  692
> =========================== Unaligned =============================
>      64       8 -    9      87 -   74      45 -   48     125 -  118
>     512      23 -   49     298 -  311     150 -  250     437 -  482
>    1024      36 -   83     427 -  505     259 -  406     633 -  754
> ------- -------------- -------------- -------------- --------------
> C    64       4 -    9      42 -   46      39 -   46      76 -   94
> C   512      23 -   55     246 -  277     152 -  290     349 -  426
> C  1024      38 -   83     398 -  431     258 -  416     634 -  725
> ======= ============== ============== ============== ==============
> 


Thread overview: 23+ messages
2016-01-14  6:13 [dpdk-dev] [PATCH 0/4] " Zhihong Wang
2016-01-14  6:13 ` [dpdk-dev] [PATCH 1/4] lib/librte_eal: Identify AVX512 CPU flag Zhihong Wang
2016-01-14  6:13 ` [dpdk-dev] [PATCH 2/4] mk: Predefine AVX512 macro for compiler Zhihong Wang
2016-01-14  6:13 ` [dpdk-dev] [PATCH 3/4] lib/librte_eal: Optimize memcpy for AVX512 platforms Zhihong Wang
2016-01-14  6:13 ` [dpdk-dev] [PATCH 4/4] app/test: Adjust alignment unit for memcpy perf test Zhihong Wang
2016-01-14 16:48 ` [dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms Stephen Hemminger
2016-01-15  6:39   ` Wang, Zhihong
2016-01-15 22:03     ` Vincent JARDIN
2016-01-18  3:05 ` [dpdk-dev] [PATCH v2 0/5] " Zhihong Wang
2016-01-18  3:05   ` [dpdk-dev] [PATCH v2 1/5] lib/librte_eal: Identify AVX512 CPU flag Zhihong Wang
2016-01-18  3:05   ` [dpdk-dev] [PATCH v2 2/5] mk: Predefine AVX512 macro for compiler Zhihong Wang
2016-01-18  3:05   ` [dpdk-dev] [PATCH v2 3/5] lib/librte_eal: Optimize memcpy for AVX512 platforms Zhihong Wang
2016-01-18  3:05   ` [dpdk-dev] [PATCH v2 4/5] app/test: Adjust alignment unit for memcpy perf test Zhihong Wang
2016-01-18  3:05   ` [dpdk-dev] [PATCH v2 5/5] lib/librte_eal: Tune memcpy for prior platforms Zhihong Wang
2016-01-18 20:06   ` [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms Stephen Hemminger
2016-01-19  2:37     ` Wang, Zhihong
2016-01-27 15:23   ` Thomas Monjalon
2016-01-28  6:09     ` Wang, Zhihong
2016-01-27 15:30   ` Thomas Monjalon
2016-01-27 18:48     ` Ananyev, Konstantin
2016-01-27 20:18       ` Thomas Monjalon
2017-08-30  9:37   ` linhaifeng
2017-09-18  5:10     ` Wang, Zhihong [this message]
