DPDK patches and discussions
From: "Yang, Zhiyong" <zhiyong.yang@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"yuanhan.liu@linux.intel.com" <yuanhan.liu@linux.intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>
Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
Date: Fri, 16 Dec 2016 02:15:39 +0000
Message-ID: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB59CAF@BGSMSX101.gar.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772583F0EFF66@irsmsx105.ger.corp.intel.com>

Hi, Konstantin:

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, December 15, 2016 6:54 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
> 
> Hi Zhiyong,
> 
> > -----Original Message-----
> > From: Yang, Zhiyong
> > Sent: Thursday, December 15, 2016 6:51 AM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Thomas Monjalon
> > <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > on IA platform
> >
> > Hi, Thomas, Konstantin:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> > > Sent: Sunday, December 11, 2016 8:33 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > Monjalon <thomas.monjalon@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > rte_memset on IA platform
> > >
> > > Hi, Konstantin, Bruce:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Thursday, December 8, 2016 6:31 PM
> > > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> > > > <thomas.monjalon@6wind.com>
> > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > > <pablo.de.lara.guarch@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > rte_memset on IA platform
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yang, Zhiyong
> > > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > > Monjalon <thomas.monjalon@6wind.com>
> > > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.guarch@intel.com>
> > > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > > rte_memset on IA platform
> > > > >
> > > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > > >
> > > > static inline void *
> > > > rte_memset_huge(void *s, int c, size_t n)
> > > > {
> > > > 	return __rte_memset_vector(s, c, n);
> > > > }
> > > >
> > > > static inline void *
> > > > rte_memset(void *s, int c, size_t n)
> > > > {
> > > > 	if (n < XXX)
> > > > 		return rte_memset_scalar(s, c, n);
> > > > 	else
> > > > 		return rte_memset_huge(s, c, n);
> > > > }
> > > >
> > > > XXX could be either a define or a variable, so that it
> > > > can be set up at startup, depending on the architecture.
> > > >
> > > > Would that work?
> > > > Konstantin
> > > >
> > I have implemented the code for choosing the functions at run time.
> > rte_memcpy is used more frequently, so I tested the run-time selection
> > with it.
> >
> > typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
> > extern rte_memcpy_vector_t rte_memcpy_vector;
> >
> > static inline void *
> > rte_memcpy(void *dst, const void *src, size_t n)
> > {
> >         return rte_memcpy_vector(dst, src, n);
> > }
> >
> > In order to reduce the overhead at run time, I assign the function
> > address to the variable rte_memcpy_vector in a constructor, before
> > main() starts.
> >
> > static void __attribute__((constructor))
> > rte_memcpy_init(void)
> > {
> > 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> > 	{
> > 		rte_memcpy_vector = rte_memcpy_avx2;
> > 	}
> > 	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> > 	{
> > 		rte_memcpy_vector = rte_memcpy_sse;
> > 	}
> > 	else
> > 	{
> > 		rte_memcpy_vector = memcpy;
> > 	}
> >
> > }
> 
> I thought we discussed a slightly different approach,
> in which rte_memcpy_vector() (or rte_memset_vector()) would be called only
> after some cutoff point, i.e.:
> 
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t len)
> {
> 	if (len < N)
> 		return memcpy(dst, src, len);
> 	else
> 		return rte_memcpy_vector(dst, src, len);
> }
> 
> If you always call rte_memcpy_vector() for every len, the compiler most
> likely always has to generate a real function call (no inlining happens).
> For small lengths, the price of the extra function call would probably
> outweigh any potential gain from the SSE/AVX2 implementation.
> 
> Konstantin

Yes. In fact, from my tests, rte_memset is far better than glibc memset for small lengths.
For large lengths, rte_memset is only a bit better than memset,
because memset uses AVX2/SSE too. Of course, it will use AVX512 on future machines.

> For small lengths, the price of the extra function call would probably
> outweigh any potential gain.

This is the key point. I think the optimization should cover the scalar path, not only the vector path.

The value of rte_memset is that it is always inlined, so it does better for small lengths,
whereas in some cases we cannot be sure that memset is inlined by the compiler.
It seems that choosing the function at run time would lose these gains.
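
To make the trade-off concrete, here is a minimal sketch of the cutoff scheme Konstantin describes: small lengths go through an inlined scalar loop, and only larger lengths pay for the indirect call. All names (sketch_memset, SKETCH_CUTOFF, and so on) are hypothetical, the cutoff value of 64 is an arbitrary assumption rather than a measured one, and plain memset stands in for the AVX2/SSE variants that a real constructor would select from the CPU flags.

```c
#include <stddef.h>
#include <string.h>

/* Assumed threshold; a real value would have to come from measurements. */
#define SKETCH_CUTOFF 64

typedef void *(*sketch_memset_fn)(void *s, int c, size_t n);

/* Function pointer for the large-length path; memset is the safe default. */
static sketch_memset_fn sketch_memset_vector = memset;

/*
 * Runs before main(). A real implementation would test the CPU flags here
 * and pick an AVX2 or SSE routine; this sketch only has memset to offer.
 */
static void __attribute__((constructor))
sketch_memset_init(void)
{
	sketch_memset_vector = memset;
}

/* Inlined byte loop for small lengths, so no function call is generated. */
static inline void *
sketch_memset_small(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

/* Dispatch: inline path below the cutoff, indirect call at or above it. */
static inline void *
sketch_memset(void *s, int c, size_t n)
{
	if (n < SKETCH_CUTOFF)
		return sketch_memset_small(s, c, n);
	return sketch_memset_vector(s, c, n);
}
```

With this shape, the indirect call is only reached for lengths at or above the cutoff, where its cost is small relative to the work done.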
The following was measured on Haswell with the patch code.
** rte_memset() - memset perf tests
        (C = compile-time constant) **
======= ================= ================
   Size   memset in cache    memset in mem
(bytes)           (ticks)          (ticks)
------- ----------------- ----------------
============== 32B aligned ===============
      3          3 -    8        19 -  128
      4          4 -    8        13 -  128
      8          2 -    7        19 -  128
      9          2 -    7        19 -  127
     12          2 -    7        19 -  127
     17          3 -    8        19 -  132
     64          3 -    8        28 -  168
    128          7 -   13        54 -  200
    255          8 -   20       100 -  223
    511         14 -   20       187 -  314
   1024         24 -   29       328 -  379
   8192        198 -  225      1829 - 2193
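
For readers who want to try a comparison of this kind themselves, below is a rough timing sketch. It is not the code from the app/test patch. The names are hypothetical, it uses the standard clock() rather than a cycle counter, and the way it models "in cache" (same address every call) versus "in mem" (a stride through a large buffer) is an assumption. Because it calls through a function pointer, it cannot show the benefit of inlining; a real comparison would call each function directly.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*sketch_set_fn)(void *s, int c, size_t n);

/* Volatile sink so the compiler cannot discard the stores being timed. */
static volatile unsigned char sketch_sink;

/*
 * Average CPU time per call, in nanoseconds, of fn(buf + off, value, n).
 * stride == 0 keeps hitting the same address (cache-resident case);
 * a non-zero stride walks through buf and wraps before running past its end.
 * The caller must ensure n <= buf_len and iters > 0.
 */
static double
sketch_time_set(sketch_set_fn fn, unsigned char *buf, size_t buf_len,
		size_t n, size_t stride, unsigned long iters)
{
	size_t off = 0;
	unsigned long i;
	clock_t start, end;

	start = clock();
	for (i = 0; i < iters; i++) {
		fn(buf + off, (int)(i & 0xff), n);
		sketch_sink = buf[off];
		off += stride;
		if (off + n > buf_len)
			off = 0;
	}
	end = clock();

	return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

For example, sketch_time_set(memset, buf, buf_len, 64, 0, iters) would approximate the cache-resident case, and the same call with a stride of a few KB over a buffer larger than the last-level cache would approximate the memory-bound case.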

Thanks
Zhiyong

