From: "Li, Xiaoyun"
To: "Ananyev, Konstantin", "Richardson, Bruce"
CC: "Lu, Wenzhuo", "Zhang, Helin", "dev@dpdk.org"
Date: Wed, 4 Oct 2017 22:33:42 +0000
Subject: Re: [dpdk-dev] [PATCH v5 0/3] run-time Linking support
References: <1506960796-71620-1-git-send-email-xiaoyun.li@intel.com> <1507042796-86318-1-git-send-email-xiaoyun.li@intel.com> <2601191342CEEE43887BDE71AB9772585FAA4014@IRSMSX103.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAA4014@IRSMSX103.ger.corp.intel.com>
List-Id: DPDK patches and discussions

OK. Will send it later.
Many thanks!

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, October 5, 2017 01:56
> To: Li, Xiaoyun; Richardson, Bruce
> Cc: Lu, Wenzhuo; Zhang, Helin; dev@dpdk.org
> Subject: RE: [PATCH v5 0/3] run-time Linking support
>
> Hi Xiaoyun,
>
> > -----Original Message-----
> > From: Li, Xiaoyun
> > Sent: Tuesday, October 3, 2017 4:00 PM
> > To: Ananyev, Konstantin; Richardson, Bruce
> > Cc: Lu, Wenzhuo; Zhang, Helin; dev@dpdk.org; Li, Xiaoyun
> > Subject: [PATCH v5 0/3] run-time Linking support
> >
> > This patchset dynamically selects functions at run-time based on the CPU
> > flags that the current machine supports. It modifies memcpy, the memcpy
> > perf test and x86 EFD to use function pointers and bind them at
> > constructor time.
> > Then, in a cloud environment, users can compile once for the minimum
> > target such as 'haswell' (not 'native'), run on different platforms
> > (equal to or above haswell) and get ISA optimization based on the
> > running CPU.
> >
> > Xiaoyun Li (3):
> >   eal/x86: run-time dispatch over memcpy
> >   app/test: run-time dispatch over memcpy perf test
> >   efd: run-time dispatch over x86 EFD functions
> >
> > ---
> > v2
> > * Use gcc function multi-versioning to avoid compilation issues.
> > * Add macros for AVX512 and AVX2. The AVX512 code is compiled only if
> >   users enable AVX512 and the compiler supports it. The AVX2 code is
> >   compiled only if the compiler supports AVX2.
> >
> > v3
> > * Reduce function calls by keeping only rte_memcpy_xxx.
> > * When the copy size is small, use the inline code path.
> >   Otherwise, use the dynamic code path.
> > * To support attribute target, the clang version must be greater than 3.7.
> >   Otherwise, the SSE/AVX code path is chosen, the same as before.
> > * Move two macro functions to the top of the code since they are
> >   used in both the inline SSE/AVX and the dynamic SSE/AVX code.
> >
> > v4
> > * Split rte_memcpy.h into several .c files and modify makefiles to
> >   compile the AVX2 and AVX512 files.
> >
> > v5
> > * Delete redundant repeated code of rte_memcpy_xxx.
> > * Modify makefiles to enable reuse of existing rte_memcpy.
> > * Delete redundant code of rte_efd_x86.h in v4. Move it into a .c file
> >   and enable compilation with -mavx2 for it in the makefile since it is
> >   already chosen at run-time.
> >
>
> Generally looks good, just two things to fix below.
> Konstantin
>
> 1. [dpdk-dev,v5,1/3] eal/x86: run-time dispatch over memcpy
>
> Shared target build fails:
> http://dpdk.org/ml/archives/test-report/2017-October/031032.html
>
> I think you need to include rte_memcpy_ptr into the:
> lib/librte_eal/linuxapp/eal/rte_eal_version.map
> lib/librte_eal/bsdapp/eal/rte_eal_version.map
> to fix it.
>
> 2. [dpdk-dev,v5,3/3] efd: run-time dispatch over x86 EFD functions
>
> /lib/librte_efd/rte_efd_x86.c
> ....
> +efd_value_t
> +efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
> +		const efd_lookuptbl_t *group_lookup_table,
> +		const uint32_t hash_val_a, const uint32_t hash_val_b)
> +{
> +#ifdef CC_SUPPORT_AVX2
> +	efd_value_t value = 0;
> +	uint32_t i = 0;
> +	__m256i vhash_val_a = _mm256_set1_epi32(hash_val_a);
> +	__m256i vhash_val_b = _mm256_set1_epi32(hash_val_b);
> +
> +	for (; i < RTE_EFD_VALUE_NUM_BITS; i += 8) {
> +		__m256i vhash_idx =
> +				_mm256_cvtepu16_epi32(EFD_LOAD_SI128(
> +				(__m128i const *) &group_hash_idx[i]));
> +		__m256i vlookup_table = _mm256_cvtepu16_epi32(
> +				EFD_LOAD_SI128((__m128i const *)
> +				&group_lookup_table[i]));
> +		__m256i vhash = _mm256_add_epi32(vhash_val_a,
> +				_mm256_mullo_epi32(vhash_idx, vhash_val_b));
> +		__m256i vbucket_idx = _mm256_srli_epi32(vhash,
> +				EFD_LOOKUPTBL_SHIFT);
> +		__m256i vresult = _mm256_srlv_epi32(vlookup_table,
> +				vbucket_idx);
> +
> +		value |= (_mm256_movemask_ps(
> +			(__m256) _mm256_slli_epi32(vresult, 31))
> +			& ((1 << (RTE_EFD_VALUE_NUM_BITS - i)) - 1)) << i;
> +	}
> +
> +	return value;
> +#else
>
> We always build that file with AVX2 option, so I think we can safely remove
> the #ifdef CC_SUPPORT_AVX2 and the code below.
>
> +	RTE_SET_USED(group_hash_idx);
> +	RTE_SET_USED(group_lookup_table);
> +	RTE_SET_USED(hash_val_a);
> +	RTE_SET_USED(hash_val_b);
> +	/* Return dummy value, only to avoid compilation breakage */
> +	return 0;
> +#endif
> +
> +}
>
>
> >  lib/librte_eal/bsdapp/eal/Makefile                 |  19 +
> >  .../common/include/arch/x86/rte_memcpy.c           |  59 ++
> >  .../common/include/arch/x86/rte_memcpy.h           | 861 +------------------
> >  .../common/include/arch/x86/rte_memcpy_avx2.c      |  44 +
> >  .../common/include/arch/x86/rte_memcpy_avx512f.c   |  44 +
> >  .../common/include/arch/x86/rte_memcpy_internal.h  | 909 +++++++++++++++++++++
> >  .../common/include/arch/x86/rte_memcpy_sse.c       |  40 +
> >  lib/librte_eal/linuxapp/eal/Makefile               |  19 +
> >  lib/librte_efd/Makefile                            |   6 +
> >  lib/librte_efd/rte_efd_x86.c                       |  87 ++
> >  lib/librte_efd/rte_efd_x86.h                       |  48 +-
> >  mk/rte.cpuflags.mk                                 |  14 +
> >  test/test/test_memcpy_perf.c                       |  40 +-
> >  13 files changed, 1285 insertions(+), 905 deletions(-)
> >  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy.c
> >  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx2.c
> >  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx512f.c
> >  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
> >  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_sse.c
> >  create mode 100644 lib/librte_efd/rte_efd_x86.c
> >
> > --
> > 2.7.4
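
For reference, the scheme the cover letter describes (function pointers bound at constructor time according to the CPU flags of the running machine) can be sketched roughly as below. This is only an illustration, not the code from the patch set: every identifier (copy_fn_t, copy_baseline, copy_avx2, copy_ptr, copy_select, copy_dispatch) is hypothetical, both variants simply call memcpy() as stand-ins for the real SSE/AVX2 bodies, and it assumes GCC or a recent clang, whose x86 builtins __builtin_cpu_init() and __builtin_cpu_supports() are used for the CPU check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical and chosen for illustration only. */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Baseline variant: stands in for the code path built for the minimum target. */
static void *copy_baseline(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/*
 * Stand-in for an AVX2 variant. In a real build this body would live in a
 * separate file compiled with -mavx2; here it just calls memcpy().
 */
static void *copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Dispatch pointer. It defaults to the baseline so early calls stay valid. */
static copy_fn_t copy_ptr = copy_baseline;

/* Runs before main(): bind the pointer once, based on the running CPU. */
__attribute__((constructor))
static void copy_select(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		copy_ptr = copy_avx2;
#endif
}

/* Entry point used by callers: one indirect call through the bound pointer. */
static inline void *copy_dispatch(void *dst, const void *src, size_t n)
{
	assert(copy_ptr != NULL);
	return copy_ptr(dst, src, n);
}
```

A caller would use copy_dispatch(dst, src, len) exactly like memcpy(). Because the CPU check happens once in the constructor, a binary compiled for the minimum target can still pick the wider variant on machines that support it. In a shared build, a pointer such as copy_ptr would also have to be exported if inline code in a public header used it, which is the kind of problem raised in point 1 of the review.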