From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 15 Dec 2016 10:12:43 +0000
From: Bruce Richardson
To: "Yang, Zhiyong"
Cc: "Ananyev, Konstantin", Thomas Monjalon, "dev@dpdk.org",
	"yuanhan.liu@linux.intel.com", "De Lara Guarch, Pablo"
Message-ID: <20161215101242.GA125588@bricha3-MOBL3.ger.corp.intel.com>
References: <1480926387-63838-1-git-send-email-zhiyong.yang@intel.com>
	<1480926387-63838-2-git-send-email-zhiyong.yang@intel.com>
	<7223515.9TZuZb6buy@xps13>
	<2601191342CEEE43887BDE71AB9772583F0E55B0@irsmsx105.ger.corp.intel.com>
	<2601191342CEEE43887BDE71AB9772583F0E568B@irsmsx105.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Organization: Intel Research and Development Ireland Ltd.
User-Agent: Mutt/1.7.1 (2016-10-04)
Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
List-Id: DPDK patches and discussions <dev@dpdk.org>

On Thu, Dec 15, 2016 at 06:51:08AM +0000, Yang, Zhiyong wrote:
> Hi Thomas, Konstantin,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> > Sent: Sunday, December 11, 2016 8:33 PM
> > To: Ananyev, Konstantin; Thomas Monjalon
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> >	De Lara Guarch, Pablo
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> > IA platform
> >
> > Hi Konstantin, Bruce,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, December 8, 2016 6:31 PM
> > > To: Yang, Zhiyong; Thomas Monjalon
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> > >	De Lara Guarch, Pablo
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > > > -----Original Message-----
> > > > From: Yang, Zhiyong
> > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > To: Ananyev, Konstantin; Thomas Monjalon
> > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> > > >	De Lara Guarch, Pablo
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > > on IA platform
> > > >
> > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > >
> > > static inline void *
> > > rte_memset_huge(void *s, int c, size_t n)
> > > {
> > > 	return __rte_memset_vector(s, c, n);
> > > }
> > >
> > > static inline void *
> > > rte_memset(void *s, int c, size_t n)
> > > {
> > > 	if (n < XXX)
> > > 		return rte_memset_scalar(s, c, n);
> > > 	else
> > > 		return rte_memset_huge(s, c, n);
> > > }
> > >
> > > XXX could be either a define, or it could also be a variable, so it can
> > > be set up at startup, depending on the architecture.
> > >
> > > Would that work?
> > > Konstantin
>
> I have implemented the code for choosing the functions at run time.
> rte_memcpy is used more frequently, so I tested it at run time.
>
> typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
> extern rte_memcpy_vector_t rte_memcpy_vector;
>
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
> 	return rte_memcpy_vector(dst, src, n);
> }
>
> In order to reduce the overhead at run time, I assign the function address
> to the variable rte_memcpy_vector before main() starts:
>
> static void __attribute__((constructor))
> rte_memcpy_init(void)
> {
> 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> 		rte_memcpy_vector = rte_memcpy_avx2;
> 	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> 		rte_memcpy_vector = rte_memcpy_sse;
> 	else
> 		rte_memcpy_vector = memcpy;
> }
>
> I ran the same virtio/vhost loopback tests without a NIC. Compared to the
> original code, I can see a throughput drop when choosing functions at run
> time, as follows, on the same platform (my machine is Haswell):
>
> Packet size	perf drop
> 64		-4%
> 256		-5.4%
> 1024		-5%
> 1500		-2.5%
>
> Another thing: when I run memcpy_perf_autotest with N <= 128, the
> rte_memcpy perf gains almost disappear when choosing functions at run
> time. For other values of N, the perf gains become narrow.
>
How narrow? How significant is the improvement that we gain from having to
maintain our own copy of memcpy? If the libc version is nearly as good, we
should just use that.

/Bruce