From: linhaifeng <haifeng.lin@huawei.com>
To: Zhihong Wang <zhihong.wang@intel.com>, <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
Date: Wed, 30 Aug 2017 17:37:42 +0800
Message-ID: <59A68766.7080307@huawei.com>
In-Reply-To: <1453086314-30158-1-git-send-email-zhihong.wang@intel.com>
On 2016/1/18 11:05, Zhihong Wang wrote:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
> 1. Read CPUID to check if AVX512 is supported by CPU
>
> 2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
> 3. Implement AVX512 memcpy and choose the right implementation based on
> predefined macros
>
> 4. Decide alignment unit for memcpy perf test based on predefined macros
>
> --------------
> Changes in v2:
>
> 1. Tune performance for prior platforms
>
> Zhihong Wang (5):
> lib/librte_eal: Identify AVX512 CPU flag
> mk: Predefine AVX512 macro for compiler
> lib/librte_eal: Optimize memcpy for AVX512 platforms
> app/test: Adjust alignment unit for memcpy perf test
> lib/librte_eal: Tune memcpy for prior platforms
>
> app/test/test_memcpy_perf.c | 6 +
> .../common/include/arch/x86/rte_cpuflags.h | 2 +
> .../common/include/arch/x86/rte_memcpy.h | 269 ++++++++++++++++++++-
> mk/rte.cpuflags.mk | 4 +
> 4 files changed, 268 insertions(+), 13 deletions(-)
>
Hi Zhihong Wang,
I tested the AVX512 rte_memcpy and found that OVS-DPDK performance is lower than with the AVX2 rte_memcpy.
VM loop test results for OVS-DPDK:
avx512 is *15*Gbps
perf data:
0.52 │ vmovdq (%r8,%r10,1),%zmm0
95.33 │ sub $0x40,%r9
0.45 │ add $0x40,%r8
0.60 │ vmovdq %zmm0,-0x40(%r8)
1.84 │ cmp $0x3f,%r9
│ ↓ ja f20
│ lea -0x40(%rsi),%r8
0.15 │ or $0xffffffffffffffc0,%rsi
0.21 │ and $0xffffffffffffffc0,%r8
0.00 │ lea 0x40(%rsi,%r8,1),%rsi
0.00 │ vmovdq (%rcx,%rsi,1),%zmm0
0.22 │ vmovdq %zmm0,(%rdx,%rsi,1)
0.67 │ ↓ jmpq c78
│ mov -0x128(%rbp),%rdi
│ rex.R
│ .byte 0x89
│ popfq
avx2 is *18.8*Gbps
perf data:
0.96 │ add %r9,%r13
66.04 │ vmovdq (%rdx),%ymm0
1.20 │ sub $0x40,%rdi
1.53 │ add $0x40,%rdx
10.83 │ vmovdq %ymm0,-0x40(%rdx,%r15,1)
8.64 │ vmovdq -0x20(%rdx),%ymm0
7.58 │ vmovdq %ymm0,-0x40(%rdx,%r13,1)
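For readers following the two annotations: both hot loops advance by 0x40 (64) bytes per iteration. The AVX512 build does this with one zmm load/store pair, the AVX2 build with two ymm pairs. The loop shape can be sketched in C roughly as below. This is not the rte_memcpy code; `copy64_blocks` is a hypothetical name, the vector path is selected through the compiler-predefined `__AVX512F__` and `__AVX2__` macros, and a plain memcpy() fallback is assumed for builds without either.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__) || defined(__AVX2__)
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (not part of DPDK): copy n bytes in 64-byte steps
 * and return the number of trailing bytes (< 64) left uncopied.
 */
static size_t copy64_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n >= 64) {
#if defined(__AVX512F__)
        /* One unaligned 64-byte load and store through a zmm register. */
        __m512i v = _mm512_loadu_si512((const void *)src);
        _mm512_storeu_si512((void *)dst, v);
#elif defined(__AVX2__)
        /* Two unaligned 32-byte loads and stores through ymm registers. */
        __m256i lo = _mm256_loadu_si256((const __m256i *)src);
        __m256i hi = _mm256_loadu_si256((const __m256i *)(src + 32));
        _mm256_storeu_si256((__m256i *)dst, lo);
        _mm256_storeu_si256((__m256i *)(dst + 32), hi);
#else
        /* Assumed fallback when no vector extension is enabled. */
        memcpy(dst, src, 64);
#endif
        src += 64;
        dst += 64;
        n -= 64;
    }
    return n;
}
```

The caller is expected to handle the returned tail separately, much as the annotated AVX512 code does with its final overlapping 64-byte move after the loop.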
dpdk version: v17.05
ovs version: 2.8.90
qemu version: QEMU emulator version 2.9.94 (v2.10.0-rc4-dirty)
gcc version: gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
kernel version: 3.10.0
compile dpdk:
CONFIG_RTE_ENABLE_AVX512=y
export DPDK_DIR=$PWD
export DPDK_TARGET=x86_64-native-linuxapp-gcc
export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
make install T=$DPDK_TARGET DESTDIR=install
compile ovs:
sh boot.sh
./configure CFLAGS="-g -O2" --with-dpdk=$DPDK_BUILD --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make -j
make install
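Since the DPDK build above sets CONFIG_RTE_ENABLE_AVX512=y, and the cover letter lists "Read CPUID to check if AVX512 is supported by CPU" as its first change, it may help to confirm what the test machine reports. A minimal sketch, assuming GCC or clang with <cpuid.h>; `cpu_reports_avx512f` is a hypothetical name and not a DPDK API. It only reads the AVX512F feature bit (CPUID leaf 7, subleaf 0, EBX bit 16) and does not check that the OS has enabled the zmm register state.

```c
#include <stdbool.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper (not part of DPDK): true if CPUID reports AVX512F.
 * Assumption: checking the feature bit alone is enough for this purpose;
 * OS support for zmm state (XGETBV) is deliberately not examined.
 */
static bool cpu_reports_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 must exist before it can be queried. */
    if (__get_cpuid_max(0, NULL) < 7)
        return false;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ((ebx >> 16) & 1u) != 0;
#else
    /* Non-x86 builds: no AVX512. */
    return false;
#endif
}
```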
Results from the DPDK test_memcpy_perf test:
avx2:
** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
======= ============== ============== ============== ==============
   Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
(bytes)        (ticks)        (ticks)        (ticks)        (ticks)
------- -------------- -------------- -------------- --------------
========================== 32B aligned ============================
     64      6 -    10     27 -    52     30 -    39     56 -    97
    512     24 -    44    251 -   271    145 -   217    396 -   447
   1024     35 -    78    394 -   433    252 -   319    609 -   670
------- -------------- -------------- -------------- --------------
C    64      3 -     9     28 -    31     29 -    40     55 -    66
C   512     25 -    55    253 -   268    139 -   268    397 -   410
C  1024     32 -    83    394 -   416    250 -   396    612 -   687
=========================== Unaligned =============================
     64      8 -     9     85 -    71     45 -    45    125 -   121
    512     33 -    49    282 -   305    153 -   252    420 -   478
   1024     42 -    83    409 -   491    259 -   389    640 -   748
------- -------------- -------------- -------------- --------------
C    64      4 -     9     42 -    46     39 -    46     76 -    90
C   512     33 -    55    280 -   272    153 -   281    421 -   415
C  1024     41 -    83    407 -   427    258 -   405    578 -   701
======= ============== ============== ============== ==============
avx512:
** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
======= ============== ============== ============== ==============
   Size Cache to cache   Cache to mem   Mem to cache     Mem to mem
(bytes)        (ticks)        (ticks)        (ticks)        (ticks)
------- -------------- -------------- -------------- --------------
========================== 64B aligned ============================
     64      6 -     9     18 -    33     24 -    38     40 -    65
    512     18 -    44    178 -   262    138 -   218    309 -   429
   1024     27 -    79    338 -   430    250 -   322    560 -   674
------- -------------- -------------- -------------- --------------
C    64      3 -     9     18 -    20     23 -    41     39 -    50
C   512     15 -    54    205 -   270    134 -   268    304 -   409
C  1024     24 -    83    371 -   414    242 -   400    550 -   692
=========================== Unaligned =============================
     64      8 -     9     87 -    74     45 -    48    125 -   118
    512     23 -    49    298 -   311    150 -   250    437 -   482
   1024     36 -    83    427 -   505    259 -   406    633 -   754
------- -------------- -------------- -------------- --------------
C    64      4 -     9     42 -    46     39 -    46     76 -    94
C   512     23 -    55    246 -   277    152 -   290    349 -   426
C  1024     38 -    83    398 -   431    258 -   416    634 -   725
======= ============== ============== ============== ==============
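Note that the two runs are labelled "32B aligned" and "64B aligned": per item 4 of the quoted cover letter, the perf test picks its alignment unit from predefined macros, so the aligned rows of the two tables do not use the same alignment. A sketch of that kind of selection follows. It assumes the compiler-predefined `__AVX512F__` and `__AVX2__` macros stand in for whatever macros the DPDK build defines, a 16-byte unit otherwise, and the names `ALIGN_UNIT` and `is_aligned` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment unit follows the widest vector register the build targets. */
#if defined(__AVX512F__)
#define ALIGN_UNIT 64   /* zmm: 512 bits */
#elif defined(__AVX2__)
#define ALIGN_UNIT 32   /* ymm: 256 bits */
#else
#define ALIGN_UNIT 16   /* assumed default: xmm, 128 bits */
#endif

/* Hypothetical helper: true if p sits on an ALIGN_UNIT boundary. */
static bool is_aligned(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)(ALIGN_UNIT - 1)) == 0;
}
```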