From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <zhiyong.yang@intel.com>
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 2AF7F36E
 for <dev@dpdk.org>; Thu, 15 Dec 2016 07:51:13 +0100 (CET)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by fmsmga104.fm.intel.com with ESMTP; 14 Dec 2016 22:51:12 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.33,350,1477983600"; d="scan'208";a="40136569"
Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205])
 by orsmga004.jf.intel.com with ESMTP; 14 Dec 2016 22:51:12 -0800
Received: from fmsmsx152.amr.corp.intel.com (10.18.125.5)
 by fmsmsx107.amr.corp.intel.com (10.18.124.205)
 with Microsoft SMTP Server (TLS) id 14.3.248.2; Wed, 14 Dec 2016 22:51:12 -0800
Received: from BGSMSX107.gar.corp.intel.com (10.223.4.191)
 by FMSMSX152.amr.corp.intel.com (10.18.125.5)
 with Microsoft SMTP Server (TLS) id 14.3.248.2; Wed, 14 Dec 2016 22:51:11 -0800
Received: from bgsmsx101.gar.corp.intel.com ([169.254.1.222])
 by BGSMSX107.gar.corp.intel.com ([169.254.9.164]) with mapi id 14.03.0248.002;
 Thu, 15 Dec 2016 12:21:08 +0530
From: "Yang, Zhiyong" <zhiyong.yang@intel.com>
To: "Yang, Zhiyong" <zhiyong.yang@intel.com>,
 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 Thomas Monjalon <thomas.monjalon@6wind.com>
CC: "dev@dpdk.org" <dev@dpdk.org>,
 "yuanhan.liu@linux.intel.com" <yuanhan.liu@linux.intel.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>,
 "De Lara Guarch, Pablo" <pablo.de.lara.guarch@intel.com>
Thread-Topic: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
Thread-Index: AQHSTHcq0cqfe4gXqkCtBCGLh5u1zaD0F6iAgAmXsfD//8XGgIAAXKoA//+1VQCABS9VkIAF58Lw
Date: Thu, 15 Dec 2016 06:51:08 +0000
Message-ID: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB599D4@BGSMSX101.gar.corp.intel.com>
References: <1480926387-63838-1-git-send-email-zhiyong.yang@intel.com>
 <1480926387-63838-2-git-send-email-zhiyong.yang@intel.com>
 <7223515.9TZuZb6buy@xps13>
 <E182254E98A5DA4EB1E657AC7CB9BD2A3EB565EC@BGSMSX101.gar.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772583F0E55B0@irsmsx105.ger.corp.intel.com>
 <E182254E98A5DA4EB1E657AC7CB9BD2A3EB586ED@BGSMSX101.gar.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772583F0E568B@irsmsx105.ger.corp.intel.com>
 <E182254E98A5DA4EB1E657AC7CB9BD2A3EB58E90@BGSMSX101.gar.corp.intel.com>
In-Reply-To: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB58E90@BGSMSX101.gar.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ctpclassification: CTP_IC
x-originating-ip: [10.223.10.10]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 15 Dec 2016 06:51:14 -0000

Hi, Thomas, Konstantin:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> Sent: Sunday, December 11, 2016 8:33 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> Monjalon <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> Hi, Konstantin, Bruce:
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, December 8, 2016 6:31 PM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> > <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > on IA platform
> >
> >
> >
> > > -----Original Message-----
> > > From: Yang, Zhiyong
> > > Sent: Thursday, December 8, 2016 9:53 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > Monjalon <thomas.monjalon@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> >
> > static inline void*
> > rte_memset_huge(void *s, int c, size_t n) {
> >    return __rte_memset_vector(s, c, n); }
> >
> > static inline void *
> > rte_memset(void *s, int c, size_t n)
> > {
> > 	if (n < XXX)
> > 		return rte_memset_scalar(s, c, n);
> > 	else
> > 		return rte_memset_huge(s, c, n);
> > }
> >
> > XXX could be either a define, or could also be a variable, so it can
> > be set up at startup, depending on the architecture.
> >
> > Would that work?
> > Konstantin
> >

I have implemented the code for choosing the functions at run time.
rte_memcpy is used more frequently, so I tested it at run time.

typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
extern rte_memcpy_vector_t rte_memcpy_vector;

static inline void *
rte_memcpy(void *dst, const void *src, size_t n)
{
	return rte_memcpy_vector(dst, src, n);
}

In order to reduce the overhead at run time, I assign the function address
to the variable rte_memcpy_vector in a constructor, so that it is
initialized before main() starts.

static void __attribute__((constructor))
rte_memcpy_init(void)
{
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
	{
		rte_memcpy_vector = rte_memcpy_avx2;
	}
	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
	{
		rte_memcpy_vector = rte_memcpy_sse;
	}
	else
	{
		rte_memcpy_vector = memcpy;
	}
}

I ran the same virtio/vhost loopback tests without a NIC.
Compared to the original code on the same platform (my machine is Haswell),
I can see the following throughput drop when choosing functions at run time:

	Packet size	perf drop
	64		-4%
	256		-5.4%
	1024		-5%
	1500		-2.5%

Another thing: I ran memcpy_perf_autotest. When N= <128, the rte_memcpy
perf gains almost disappear when choosing functions at run time.
For other values of N, the perf gains become narrower.

Thanks
Zhiyong
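
For reference, the two ideas discussed in this thread (a function pointer filled in by a constructor, and a size threshold that keeps small copies off the indirect call) can be combined roughly as below. This is only a sketch under assumptions, not the code that was measured: every name in it (copy_fn_t, copy_vector, copy_small, copy_dispatch, SMALL_COPY_THRESHOLD) is hypothetical, the ISA-specific routines are stand-ins that just call memcpy, the threshold value 128 is arbitrary, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports() rather than rte_cpu_get_flag_enabled() so that it builds without DPDK.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of them are DPDK APIs. */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Stand-ins for ISA-specific routines; here each one simply calls memcpy. */
static void *copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *copy_sse(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Defaults to memcpy, so the pointer is valid even before the constructor runs. */
static copy_fn_t copy_vector = memcpy;

/* Arbitrary illustrative value; a real threshold would have to be measured. */
#define SMALL_COPY_THRESHOLD 128

/* Runs before main(): pick the widest implementation the CPU reports. */
static void __attribute__((constructor))
copy_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		copy_vector = copy_avx2;
	else if (__builtin_cpu_supports("sse4.1"))
		copy_vector = copy_sse;
#endif
}

/* Small copies stay inline, so they never pay for the indirect call. */
static inline void *
copy_small(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Entry point: inline path below the threshold, function pointer above it. */
static inline void *
copy_dispatch(void *dst, const void *src, size_t n)
{
	if (n < SMALL_COPY_THRESHOLD)
		return copy_small(dst, src, n);
	return copy_vector(dst, src, n);
}
```

Whether the inline small-size path would recover the loss seen for small N is not something this sketch shows; it would need to be checked with the same loopback and memcpy_perf_autotest runs.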