DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy
Date: Mon, 2 Oct 2017 00:12:25 +0000	[thread overview]
Message-ID: <B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAA2BD2@IRSMSX103.ger.corp.intel.com>

Hi

> That means that each file with '#include <re_memcpy.h> will have its own
> copy
> of that function:
> $ objdump -d  x86_64-native-linuxapp-gcc/app/testpmd  | grep
> '<rte_memcpy_init>:' | sort -u | wc -l
> 233
> Same story for rte_memcpy_ptr and rte_memcpy_DEFAULT, etc...
> Obviously we need (and want) only one copy of that stuff per binary.
> 
> > +#ifdef CC_SUPPORT_AVX2
> 
> Why do you assume this macro will be defined?
> By whom?
> There is no such macro with gcc:
> $ gcc -march=native -dM -E - </dev/null 2>&1 | grep AVX2
> #define __AVX2__ 1
> , and you don't define it yourself.
> When building with '-march=native' on BDW only rte_memcpy_DEFAULT get
> compiled.
> 

I defined it myself. But when I sort the patch, I forgot to modify the file in this version. Sorry about that.
It should be like this. To check whether the compiler supports AVX2 or AVX512.

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index a813c91..92399ec 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -141,3 +141,17 @@ space:= $(empty) $(empty)
 CPUFLAGSTMP1 := $(addprefix RTE_CPUFLAG_,$(CPUFLAGS))
 CPUFLAGSTMP2 := $(subst $(space),$(comma),$(CPUFLAGSTMP1))
 CPUFLAGS_LIST := -DRTE_COMPILE_TIME_CPUFLAGS=$(CPUFLAGSTMP2)
+
+# Check if the compiler supports AVX512.
+CC_SUPPORT_AVX512 := $(shell $(CC) -march=skylake-avx512 -dM -E - < /dev/null 2>&1 | grep -q AVX512 && echo 1)
+ifeq ($(CC_SUPPORT_AVX512),1)
+ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
+MACHINE_CFLAGS += -DCC_SUPPORT_AVX512
+endif
+endif
+
+# Check if the compiler supports AVX2.
+CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1)
+ifeq ($(CC_SUPPORT_AVX2),1)
+MACHINE_CFLAGS += -DCC_SUPPORT_AVX2
+endif


> To summarize: as I understand the goal of that patch was
> (assuming that our current rte_memcpy() implementation is good in terms of
> both performance and functionality):
> 1. Based on current rte_memcpy() implementation define 3 x86 arch specific
> rte_memcpy flavors:
>   a) rte_memcpy_SSE
>   b) rte_memcpy_AVX2
>   c) rte_memcpy_AVX512
> 2. Select appropriate flavor based on current HW at runtime,
>     i.e. both 3 flavors should be present in the binary and selection should be
> made
>     at program startup.
> 
> As I can see none of the goals was achieved with the current patch,
> instead a lot of redundant code was introduced.
> So I think it is NACK for the current version.
> What I think need to be done instead:
> 
> 1.  mv lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
> 2. inside rte_memcpy_internal.h rename rte_memcpy() into
> rte_memcpy_internal().
> 3. create 3 files:
>     rte_memcpy_sse.c
>     rte_memcpy_avx2.c
>     rte_memcpy_avx512.c
> 
> Inside each of these files we define corresponding rte_memcpy_xxx()
> function.
> I.E:
> rte_memcpy_avx2.c:
> ....
> #ifndef RTE_MACHINE_CPUFLAG_AVX2
> #error "no avx2 support"
> endif
> 
> #include "rte_memcpy_internal.h"
> ...
> 
> void *
> rte_memcpy_avx2(void *dst, const void *src, size_t n)
> {
>    return rte_memcpy_internal(dst, src, n);
> }
> 
> 4. Make changes inside lib/librte_eal/Makefile to ensure that each of
> rte_memcpy_xxx()
> get build with appropriate -march flags (I.E: avx2 with -mavx2, etc.)
> You can use librte_acl/Makefile as a reference.
> 
> 5. Create rte_memcpy.c and put rte_memcpy_ptr/rte_memcpy_init()
> definitions in that file.
> 6. Create new rte_memcpy.h  and define rte_memcpy() in it:
> 
> ...
> #include <rte_memcpy_internal.h>
> ...
> 
> +#define RTE_X86_MEMCPY_THRESH 128
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
> 	if (n <= RTE_X86_MEMCPY_THRESH)
>                   return rte_memcpy_internal(dst, src, n);
>                else
> 	   return (*rte_memcpy_ptr)(dst, src, n);
> }
> 
> 7. Test it properly - i.e. build dpdk with default target and make sure each of
> 3 flavors
> could be selected properly at runtime based on underlying arch.
> 
> 8. As a possible future improvement - with such changes we don't need a
> generic inline
> implementation. We can think about creating a faster version that need to
> copy
> <= 128B.
> 
> Konstantin
> 

Will modify it in next version.
Thanks.

  reply	other threads:[~2017-10-02  0:12 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-26  7:41 [dpdk-dev] [PATCH v3 0/3] dynamic linking support Xiaoyun Li
2017-09-26  7:41 ` [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-01 23:41   ` Ananyev, Konstantin
2017-10-02  0:12     ` Li, Xiaoyun [this message]
2017-09-26  7:41 ` [dpdk-dev] [PATCH v3 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-09-26  7:41 ` [dpdk-dev] [PATCH v3 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-02  0:08   ` Ananyev, Konstantin
2017-10-02  0:09     ` Li, Xiaoyun
2017-10-02  9:35     ` Ananyev, Konstantin
2017-10-02 16:13 ` [dpdk-dev] [PATCH v4 0/3] run-time Linking support Xiaoyun Li
2017-10-02 16:13   ` [dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-02 16:39     ` Ananyev, Konstantin
2017-10-02 23:10       ` Li, Xiaoyun
2017-10-03 11:15         ` Ananyev, Konstantin
2017-10-03 11:39           ` Li, Xiaoyun
2017-10-03 12:12             ` Ananyev, Konstantin
2017-10-03 12:23               ` Li, Xiaoyun
2017-10-02 16:13   ` [dpdk-dev] [PATCH v4 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-02 16:13   ` [dpdk-dev] [PATCH v4 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-02 16:52     ` Ananyev, Konstantin
2017-10-03  8:15       ` Li, Xiaoyun
2017-10-03 11:23         ` Ananyev, Konstantin
2017-10-03 11:27           ` Li, Xiaoyun
2017-10-03 14:59   ` [dpdk-dev] [PATCH v5 0/3] run-time Linking support Xiaoyun Li
2017-10-03 14:59     ` [dpdk-dev] [PATCH v5 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-03 14:59     ` [dpdk-dev] [PATCH v5 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-03 14:59     ` [dpdk-dev] [PATCH v5 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-04 17:56     ` [dpdk-dev] [PATCH v5 0/3] run-time Linking support Ananyev, Konstantin
2017-10-04 22:33       ` Li, Xiaoyun
2017-10-04 22:58     ` [dpdk-dev] [PATCH v6 " Xiaoyun Li
2017-10-04 22:58       ` [dpdk-dev] [PATCH v6 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-05  9:37         ` Ananyev, Konstantin
2017-10-05  9:38           ` Ananyev, Konstantin
2017-10-05 11:19           ` Li, Xiaoyun
2017-10-05 11:26             ` Richardson, Bruce
2017-10-05 11:26             ` Li, Xiaoyun
2017-10-05 12:12               ` Ananyev, Konstantin
2017-10-04 22:58       ` [dpdk-dev] [PATCH v6 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-04 22:58       ` [dpdk-dev] [PATCH v6 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-05  9:40         ` Ananyev, Konstantin
2017-10-05 10:23           ` Li, Xiaoyun
2017-10-05 12:33       ` [dpdk-dev] [PATCH v7 0/3] run-time Linking support Xiaoyun Li
2017-10-05 12:33         ` [dpdk-dev] [PATCH v7 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-09 17:47           ` Thomas Monjalon
2017-10-13  1:06             ` Li, Xiaoyun
2017-10-13  7:21               ` Thomas Monjalon
2017-10-13  7:30                 ` Li, Xiaoyun
2017-10-13  7:31                 ` Ananyev, Konstantin
2017-10-13  7:36                   ` Thomas Monjalon
2017-10-13  7:41                     ` Li, Xiaoyun
2017-10-05 12:33         ` [dpdk-dev] [PATCH v7 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-05 12:33         ` [dpdk-dev] [PATCH v7 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-05 13:24         ` [dpdk-dev] [PATCH v7 0/3] run-time Linking support Ananyev, Konstantin
2017-10-09 17:40         ` Thomas Monjalon
2017-10-13  0:58           ` Li, Xiaoyun
2017-10-13  9:01         ` [dpdk-dev] [PATCH v8 " Xiaoyun Li
2017-10-13  9:01           ` [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy Xiaoyun Li
2017-10-13  9:28             ` Thomas Monjalon
2017-10-13 10:26               ` Ananyev, Konstantin
2017-10-17 21:24             ` Thomas Monjalon
2017-10-18  2:21               ` Li, Xiaoyun
2017-10-18  6:22                 ` Li, Xiaoyun
2017-10-19  2:45                   ` Li, Xiaoyun
2017-10-19  6:58                     ` Thomas Monjalon
2017-10-19  7:51                       ` Li, Xiaoyun
2017-10-19  8:33                         ` Thomas Monjalon
2017-10-19  8:50                           ` Li, Xiaoyun
2017-10-19  8:59                             ` Ananyev, Konstantin
2017-10-19  9:00                             ` Thomas Monjalon
2017-10-19  9:29                               ` Bruce Richardson
2017-10-20  1:02                                 ` Li, Xiaoyun
2017-10-25  6:55                                   ` Li, Xiaoyun
2017-10-25  7:25                                     ` Thomas Monjalon
2017-10-29  8:49                                       ` Thomas Monjalon
2017-11-02 10:22                                         ` Wang, Zhihong
2017-11-02 10:44                                           ` Thomas Monjalon
2017-11-02 10:58                                             ` Li, Xiaoyun
2017-11-02 12:15                                               ` Thomas Monjalon
2017-11-03  7:47                                             ` Yao, Lei A
2017-10-25  8:50                                     ` Ananyev, Konstantin
2017-10-25  8:54                                       ` Li, Xiaoyun
2017-10-25  9:00                                         ` Thomas Monjalon
2017-10-25 10:32                                           ` Li, Xiaoyun
2017-10-25  9:14                                         ` Ananyev, Konstantin
2017-10-13  9:01           ` [dpdk-dev] [PATCH v8 2/3] app/test: run-time dispatch over memcpy perf test Xiaoyun Li
2017-10-13  9:01           ` [dpdk-dev] [PATCH v8 3/3] efd: run-time dispatch over x86 EFD functions Xiaoyun Li
2017-10-13 13:13           ` [dpdk-dev] [PATCH v8 0/3] run-time Linking support Thomas Monjalon
     [not found] <506411689-94690-2-git-send-email-xiaoyun.li@intel.com>
2017-10-02 12:31 ` [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy Konstantin Ananyev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B9E724F4CB7543449049E7AE7669D82F463884@SHSMSX101.ccr.corp.intel.com \
    --to=xiaoyun.li@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=helin.zhang@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=wenzhuo.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).