From: "Yang, Zhiyong"
To: "Richardson, Bruce"
CC: "Ananyev, Konstantin", Thomas Monjalon, "dev@dpdk.org", "yuanhan.liu@linux.intel.com", "De Lara Guarch, Pablo"
Date: Fri, 16 Dec 2016 10:19:43 +0000
In-Reply-To: <20161215101242.GA125588@bricha3-MOBL3.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform

Hi, Bruce:

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, December 15, 2016 6:13 PM
> To: Yang, Zhiyong
> Cc: Ananyev, Konstantin; Thomas Monjalon; dev@dpdk.org;
> yuanhan.liu@linux.intel.com; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> On Thu, Dec 15, 2016 at 06:51:08AM +0000, Yang, Zhiyong wrote:
> > Hi, Thomas, Konstantin:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> > > Sent: Sunday, December 11, 2016 8:33 PM
> > > To: Ananyev, Konstantin; Thomas Monjalon
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> > > De Lara Guarch, Pablo
> > > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > > Hi, Konstantin, Bruce:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Thursday, December 8, 2016 6:31 PM
> > > > To: Yang, Zhiyong; Thomas Monjalon
> > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> > > > De Lara Guarch, Pablo
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > rte_memset on IA platform
> > > >
> > > > > -----Original Message-----
> > > > > From: Yang, Zhiyong
> > > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > > To: Ananyev, Konstantin; Thomas Monjalon
> > > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> > > > > De Lara Guarch, Pablo
> > > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > > rte_memset on IA platform
> > > > >
> > > >
> > > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > > >
> > > > static inline void *
> > > > rte_memset_huge(void *s, int c, size_t n)
> > > > {
> > > >     return __rte_memset_vector(s, c, n);
> > > > }
> > > >
> > > > static inline void *
> > > > rte_memset(void *s, int c, size_t n)
> > > > {
> > > >     if (n < XXX)
> > > >         return rte_memset_scalar(s, c, n);
> > > >     else
> > > >         return rte_memset_huge(s, c, n);
> > > > }
> > > >
> > > > XXX could be either a define or a variable, so that it can be set
> > > > up at startup depending on the architecture.
> > > >
> > > > Would that work?
> > > > Konstantin
> > >
> >
> > I have implemented the code for choosing the functions at run time.
> > Since rte_memcpy is used more frequently, I tested run-time selection
> > with rte_memcpy.
> >
> > typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src,
> >                                      size_t n);
> > extern rte_memcpy_vector_t rte_memcpy_vector;
> >
> > static inline void *
> > rte_memcpy(void *dst, const void *src, size_t n)
> > {
> >     return rte_memcpy_vector(dst, src, n);
> > }
> >
> > To reduce the run-time overhead, I initialize the variable
> > rte_memcpy_vector with the function address before main() starts:
> >
> > static void __attribute__((constructor))
> > rte_memcpy_init(void)
> > {
> >     if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> >         rte_memcpy_vector = rte_memcpy_avx2;
> >     else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> >         rte_memcpy_vector = rte_memcpy_sse;
> >     else
> >         rte_memcpy_vector = memcpy;
> > }
> >
> > I ran the same virtio/vhost loopback tests without a NIC. On the same
> > platform (my machine is Haswell), throughput drops when the function is
> > chosen at run time, compared with the original code:
> >
> > Packet size (bytes)    Perf change
> > 64                     -4%
> > 256                    -5.4%
> > 1024                   -5%
> > 1500                   -2.5%
> >
> > I also ran memcpy_perf_autotest. For N <= 128, the rte_memcpy
> > performance gain almost disappears when the function is chosen at run
> > time; for other values of N, the gain becomes narrower.
> >
> How narrow? How significant is the improvement that we gain from having to
> maintain our own copy of memcpy? If the libc version is nearly as good, we
> should just use that.
>
> /Bruce

Zhihong has sent a patch for rte_memcpy. It shows that optimizing memcpy
for DPDK brings a noticeable performance improvement over glibc:
http://www.dpdk.org/dev/patchwork/patch/17753/

From its commit message:

    This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
    up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
    from 64 to 1500 bytes.

Thanks,
Zhiyong
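For discussion, a minimal self-contained C sketch of Konstantin's size split combined with the constructor-based selection from this thread: sizes below a threshold stay on an inline scalar path, and only larger sizes pay for the indirect call. The small-size case is where the memcpy_perf_autotest gain reportedly vanished under run-time selection. Everything here is an assumption for illustration: all names (demo_memset, demo_memset_threshold, demo_memset_avx2, ...) are hypothetical and not DPDK API, the threshold values are placeholders rather than measured tuning points, and GCC/Clang's __builtin_cpu_supports() stands in for DPDK's rte_cpu_get_flag_enabled() so that it builds without DPDK. The AVX2 path assumes GCC or Clang on x86; other targets fall back to libc memset.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Hypothetical names for illustration only; not DPDK API. */
typedef void *(*demo_memset_fn_t)(void *s, int c, size_t n);

/* Defaults; overwritten by the constructor before main() runs. */
static demo_memset_fn_t demo_memset_vector = memset;
static size_t demo_memset_threshold = 128;

/* Small sizes: plain byte loop, inlined at the call site (no indirect call). */
static inline void *
demo_memset_scalar(void *s, int c, size_t n)
{
    unsigned char *p = s;

    while (n--)
        *p++ = (unsigned char)c;
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Large sizes on AVX2 CPUs: 32-byte unaligned stores, then a byte tail. */
static __attribute__((target("avx2"))) void *
demo_memset_avx2(void *s, int c, size_t n)
{
    unsigned char *p = s;
    __m256i v = _mm256_set1_epi8((char)c);

    for (; n >= 32; n -= 32, p += 32)
        _mm256_storeu_si256((__m256i *)p, v);
    while (n--)
        *p++ = (unsigned char)c;
    return s;
}
#endif

/* Runs before main(), in the same way as rte_memcpy_init() in the mail. */
static void __attribute__((constructor))
demo_memset_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init(); /* needed when feature checks run in a constructor */
    if (__builtin_cpu_supports("avx2")) {
        demo_memset_vector = demo_memset_avx2;
        demo_memset_threshold = 256; /* placeholder, not measured */
        return;
    }
#endif
    demo_memset_vector = memset;  /* libc fallback */
    demo_memset_threshold = 128;  /* placeholder, not measured */
}

/* Konstantin's split: inline scalar below the threshold, pointer above it. */
static inline void *
demo_memset(void *s, int c, size_t n)
{
    if (n < demo_memset_threshold)
        return demo_memset_scalar(s, c, n);
    return demo_memset_vector(s, c, n);
}
```

With this split, demo_memset(buf, 0, 64) stays on the inline scalar path, while demo_memset(buf, 0, 4096) goes through the pointer chosen at startup. Whether this actually recovers the small-size numbers would have to be measured the same way as the loopback and memcpy_perf_autotest results above; the sketch does not show that it does.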