From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Yang, Zhiyong"
To: Thomas Monjalon
CC: "dev@dpdk.org", "yuanhan.liu@linux.intel.com", "Richardson, Bruce", "Ananyev, Konstantin", "De Lara Guarch, Pablo"
Date: Thu, 8 Dec 2016 07:41:43 +0000
References: <1480926387-63838-1-git-send-email-zhiyong.yang@intel.com> <1480926387-63838-2-git-send-email-zhiyong.yang@intel.com> <7223515.9TZuZb6buy@xps13>
In-Reply-To: <7223515.9TZuZb6buy@xps13>
Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
List-Id: DPDK patches and discussions

Hi Thomas,

Sorry for the late reply. I have been considering your suggestion.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, December 2, 2016 6:25 PM
> To: Yang, Zhiyong
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce;
> Ananyev, Konstantin; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> 2016-12-05 16:26, Zhiyong Yang:
> > +#ifndef _RTE_MEMSET_X86_64_H_
>
> Is this implementation specific to 64-bit?
>

Yes.

> > +
> > +#define rte_memset memset
> > +
> > +#else
> > +
> > +static void *
> > +rte_memset(void *dst, int a, size_t n);
> > +
> > +#endif
>
> If I understand well, rte_memset (as rte_memcpy) is using the most recent
> instructions available (and enabled) when compiling.
> It is not adapting the instructions to the run-time CPU.
> There is no need to downgrade at run-time the instruction set as it is
> obviously not a supported case, but it would be nice to be able to upgrade a
> "default compilation" at run-time as it is done in rte_acl.
> I explain this case more clearly for reference:
>
> We can have AVX512 supported in the compiler but disable it when compiling
> (CONFIG_RTE_MACHINE=snb) in order to build a binary running almost
> everywhere.
> When running this binary on a CPU having AVX512 support, it will not benefit
> of the AVX512 improvement.
> Though, we can compile an AVX512 version of some functions and use them
> only if the running CPU is capable.
> This kind of miracle can be achieved in two ways:
>
> 1/ For generic C code compiled with a recent GCC, a function can be built for
> several CPUs thanks to the attribute target_clones.
>
> 2/ For manually optimized functions using CPU-specific intrinsics or asm, it is
> possible to build them with non-default flags thanks to the attribute target.
>
> 3/ For manually optimized files using CPU-specific intrinsics or asm, we use
> specifics flags in the makefile.
>
> The function clone in case 1/ is dynamically chosen at run-time through ifunc
> resolver.
> The specific functions in cases 2/ and 3/ must chosen at run-time by
> initializing a function pointer thanks to rte_cpu_get_flag_enabled().
>
> Note that rte_hash and software crypto PMDs have a run-time check with
> rte_cpu_get_flag_enabled() but do not override CFLAGS in the Makefile.
> Next step for these libraries?
>
> Back to rte_memset, I think you should try the solution 2/.

I have read the ACL code. If I understand it correctly, this is a good idea
for complex algorithm implementations, but choosing functions at run time
brings some overhead.
For a frequently called function that consumes only a few cycles, the
overhead may be larger than the gain the optimization brings.
For example, in most DPDK applications memset only sets N = 10 or 12 bytes,
which takes very few cycles.

Thanks
Zhiyong
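
For reference, solution 2/ from the quoted mail could be sketched roughly as
below. This is only an illustration under stated assumptions, not the code
from the patch: my_memset, my_memset_init, my_memset_selfcheck,
memset_generic and memset_avx2 are hypothetical names, and the GCC/Clang
builtin __builtin_cpu_supports() stands in for rte_cpu_get_flag_enabled() so
that the example builds without DPDK. The AVX2 variant is a plain loop built
with the target attribute rather than a hand-tuned routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by all variants. */
typedef void *(*memset_fn)(void *dst, int c, size_t n);

/* Baseline variant, built with the default compiler flags. */
static void *memset_generic(void *dst, int c, size_t n)
{
    return memset(dst, c, n);
}

#if defined(__x86_64__) || defined(__i386__)
/* Variant built with AVX2 enabled for this one function only, through
 * the "target" attribute. A real implementation would use intrinsics;
 * a plain loop keeps the sketch short. */
static __attribute__((target("avx2")))
void *memset_avx2(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)c;
    return dst;
}
#endif

/* Function pointer holding the selected variant; defaults to baseline. */
static memset_fn memset_impl = memset_generic;

/* Run once at start-up. DPDK code would test the CPU with
 * rte_cpu_get_flag_enabled(); the compiler builtin is an assumption
 * made here to keep the example standalone. */
static void my_memset_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        memset_impl = memset_avx2;
#endif
}

/* Every call goes through the pointer: an indirect call that the
 * compiler cannot inline. */
static inline void *my_memset(void *dst, int c, size_t n)
{
    return memset_impl(dst, c, n);
}

/* Small usage example with the 12-byte size mentioned in the mail. */
static void my_memset_selfcheck(void)
{
    unsigned char buf[12];

    my_memset(buf, 0x5a, sizeof(buf));
    for (size_t i = 0; i < sizeof(buf); i++)
        assert(buf[i] == 0x5a);
}
```

An application would call my_memset_init() once and then my_memset()
everywhere. The indirect call in my_memset() is the run-time cost discussed
above: for a 10 or 12 byte fill it may be comparable to the fill itself,
whereas a direct call with a small constant size can often be inlined by the
compiler.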