From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
"Zhang, Helin" <helin.zhang@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy
Date: Mon, 2 Oct 2017 00:12:25 +0000
Message-ID: <B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com>
Hi
> That means that each file with '#include <rte_memcpy.h>' will have its own
> copy of that function:
> $ objdump -d x86_64-native-linuxapp-gcc/app/testpmd | grep '<rte_memcpy_init>:' | sort -u | wc -l
> 233
> Same story for rte_memcpy_ptr and rte_memcpy_DEFAULT, etc...
> Obviously we need (and want) only one copy of that stuff per binary.
>
> > +#ifdef CC_SUPPORT_AVX2
>
> Why do you assume this macro will be defined?
> By whom?
> There is no such macro with gcc:
> $ gcc -march=native -dM -E - </dev/null 2>&1 | grep AVX2
> #define __AVX2__ 1
> and you don't define it yourself.
> When building with '-march=native' on BDW, only rte_memcpy_DEFAULT gets
> compiled.
>
I did define it myself, but when I sorted out the patch I forgot to include the change to that file in this version. Sorry about that.
It should look like this, to check whether the compiler supports AVX2 and AVX512:
diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index a813c91..92399ec 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -141,3 +141,17 @@ space:= $(empty) $(empty)
CPUFLAGSTMP1 := $(addprefix RTE_CPUFLAG_,$(CPUFLAGS))
CPUFLAGSTMP2 := $(subst $(space),$(comma),$(CPUFLAGSTMP1))
CPUFLAGS_LIST := -DRTE_COMPILE_TIME_CPUFLAGS=$(CPUFLAGSTMP2)
+
+# Check if the compiler supports AVX512.
+CC_SUPPORT_AVX512 := $(shell $(CC) -march=skylake-avx512 -dM -E - < /dev/null 2>&1 | grep -q AVX512 && echo 1)
+ifeq ($(CC_SUPPORT_AVX512),1)
+ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
+MACHINE_CFLAGS += -DCC_SUPPORT_AVX512
+endif
+endif
+
+# Check if the compiler supports AVX2.
+CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1)
+ifeq ($(CC_SUPPORT_AVX2),1)
+MACHINE_CFLAGS += -DCC_SUPPORT_AVX2
+endif
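For illustration only (this is not part of the patch), here is a minimal C sketch of how code could key off the two macros the makefile fragment above defines. The intent is that CC_SUPPORT_AVX2 and CC_SUPPORT_AVX512 mean "the compiler is able to build this flavor", unlike __AVX2__, which only says that the current file is being compiled with AVX2 enabled. The enum values and the function name are hypothetical.

```c
/* Hypothetical bit flags, one per memcpy flavor that could be built. */
enum {
	FLAVOR_SSE    = 1u << 0,
	FLAVOR_AVX2   = 1u << 1,
	FLAVOR_AVX512 = 1u << 2
};

/*
 * Report which flavors the build system asked for.
 * CC_SUPPORT_AVX2 / CC_SUPPORT_AVX512 are expected to arrive via
 * MACHINE_CFLAGS (-DCC_SUPPORT_AVX2, -DCC_SUPPORT_AVX512); when they are
 * absent only the baseline flavor is reported.
 */
static unsigned int
flavors_built(void)
{
	unsigned int mask = FLAVOR_SSE; /* baseline is always present */

#ifdef CC_SUPPORT_AVX2
	mask |= FLAVOR_AVX2;
#endif
#ifdef CC_SUPPORT_AVX512
	mask |= FLAVOR_AVX512;
#endif
	return mask;
}
```

Building this with and without -DCC_SUPPORT_AVX2 on the command line changes the returned mask regardless of the -march value in use.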
> To summarize: as I understand the goal of that patch was
> (assuming that our current rte_memcpy() implementation is good in terms of
> both performance and functionality):
> 1. Based on current rte_memcpy() implementation define 3 x86 arch specific
> rte_memcpy flavors:
> a) rte_memcpy_SSE
> b) rte_memcpy_AVX2
> c) rte_memcpy_AVX512
> 2. Select the appropriate flavor based on the current HW at runtime,
> i.e. all 3 flavors should be present in the binary and the selection should
> be made at program startup.
>
> As far as I can see, none of these goals was achieved with the current patch;
> instead a lot of redundant code was introduced.
> So I think it is NACK for the current version.
> What I think needs to be done instead:
>
> 1. mv lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
> 2. inside rte_memcpy_internal.h rename rte_memcpy() into
> rte_memcpy_internal().
> 3. create 3 files:
> rte_memcpy_sse.c
> rte_memcpy_avx2.c
> rte_memcpy_avx512.c
>
> Inside each of these files we define the corresponding rte_memcpy_xxx()
> function, e.g.:
> rte_memcpy_avx2.c:
> ....
> #ifndef RTE_MACHINE_CPUFLAG_AVX2
> #error "no avx2 support"
> #endif
>
> #include "rte_memcpy_internal.h"
> ...
>
> void *
> rte_memcpy_avx2(void *dst, const void *src, size_t n)
> {
>         return rte_memcpy_internal(dst, src, n);
> }
>
> 4. Make changes inside lib/librte_eal/Makefile to ensure that each
> rte_memcpy_xxx() gets built with the appropriate -march flags
> (e.g. avx2 with -mavx2, etc.).
> You can use librte_acl/Makefile as a reference.
>
> 5. Create rte_memcpy.c and put rte_memcpy_ptr/rte_memcpy_init()
> definitions in that file.
> 6. Create new rte_memcpy.h and define rte_memcpy() in it:
>
> ...
> #include <rte_memcpy_internal.h>
> ...
>
> #define RTE_X86_MEMCPY_THRESH 128
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
>         if (n <= RTE_X86_MEMCPY_THRESH)
>                 return rte_memcpy_internal(dst, src, n);
>         else
>                 return (*rte_memcpy_ptr)(dst, src, n);
> }
>
> 7. Test it properly - i.e. build dpdk with the default target and make sure
> each of the 3 flavors can be selected properly at runtime based on the
> underlying arch.
>
> 8. As a possible future improvement - with such changes we don't need a
> generic inline implementation. We can think about creating a faster version
> that only needs to copy <= 128B.
>
> Konstantin
>
I will modify it in the next version.
Thanks.
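
To make the dispatch scheme from steps 5 and 6 concrete, here is a rough single-file sketch. It is only an illustration under several assumptions: all sketch_* names are hypothetical, the three flavors are plain memcpy() stand-ins rather than the real per-ISA implementations (which would live in separate files built with their own flags), and the CPU check uses the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports() instead of whatever the next patch version ends up using.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, mirroring RTE_X86_MEMCPY_THRESH in the mail. */
#define SKETCH_MEMCPY_THRESH 128

typedef void *(*sketch_memcpy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for the per-ISA flavors; each one just calls memcpy(). */
static void *
sketch_memcpy_sse(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *
sketch_memcpy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *
sketch_memcpy_avx512(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* One pointer per binary, defaulting to the baseline flavor. */
static sketch_memcpy_fn sketch_memcpy_ptr = sketch_memcpy_sse;

/* Runs once at program startup (GCC/Clang constructor attribute). */
__attribute__((constructor))
static void
sketch_memcpy_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx512f"))
		sketch_memcpy_ptr = sketch_memcpy_avx512;
	else if (__builtin_cpu_supports("avx2"))
		sketch_memcpy_ptr = sketch_memcpy_avx2;
#endif
}

/* Small copies stay on the direct path; large ones go through the pointer. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (n <= SKETCH_MEMCPY_THRESH)
		return memcpy(dst, src, n); /* stand-in for the inline version */
	return (*sketch_memcpy_ptr)(dst, src, n);
}
```

In the layout proposed above, the pointer and the init function would sit in a single .c file so that the binary holds exactly one copy of each, and only the thin wrapper would remain in the header.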