From: "Li, Xiaoyun"
To: "Ananyev, Konstantin", "Richardson, Bruce"
CC: "Lu, Wenzhuo", "Zhang, Helin", "dev@dpdk.org"
Date: Mon, 2 Oct 2017 23:10:43 +0000
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAA2F94@IRSMSX103.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy

Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, October 3, 2017 00:39
> To: Li, Xiaoyun; Richardson, Bruce
> Cc: Lu, Wenzhuo; Zhang, Helin; dev@dpdk.org
> Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
>
> > -----Original Message-----
> > From: Li, Xiaoyun
> > Sent: Monday, October 2, 2017 5:13 PM
> > To: Ananyev, Konstantin; Richardson, Bruce
> > Cc: Lu, Wenzhuo; Zhang, Helin; dev@dpdk.org; Li, Xiaoyun
> > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> >
> > This patch dynamically selects the memcpy implementation at run time
> > based on the CPU flags the current machine supports. It uses function
> > pointers that are bound to the corresponding functions at constructor
> > time. In addition, the AVX512 code is compiled only if the user
> > enables it in the config and the compiler supports it.
> >
> > Signed-off-by: Xiaoyun Li
> > ---
> > v2
> > * Use gcc function multi-versioning to avoid compilation issues.
> > * Add macros for AVX512 and AVX2. The AVX512 code is compiled only if
> >   the user enables AVX512 and the compiler supports it. The AVX2 code
> >   is compiled only if the compiler supports AVX2.
> >
> > v3
> > * Reduce function calls by keeping only rte_memcpy_xxx.
> > * Add a condition: when the copy size is small, use the inline code
> >   path; otherwise, use the dynamic code path.
> > * Supporting the target attribute requires clang newer than 3.7.
> >   Otherwise, the SSE/AVX code path is chosen, the same as before.
> > * Move two macro functions to the top of the code since they are used
> >   in both the inline and the dynamic SSE/AVX code.
> >
> > v4
> > * Split rte_memcpy.h into several .c files and modify the makefiles to
> >   compile the AVX2 and AVX512 files.
>
> Could you explain to me why, instead of reusing the existing rte_memcpy()
> code to generate _sse/_avx2/_avx512f flavors, you keep pushing changes with
> 3 separate implementations?
> Obviously that is much more expensive in terms of maintenance and doesn't
> look like a feasible solution to me.
> Is the existing rte_memcpy() implementation not good enough in terms of
> functionality and/or performance?
> If so, can you outline these problems and try to fix them first?
> Konstantin
>

I just merged the many small functions into one function in each of those
3 separate implementations.
The existing code is entirely inline, including rte_memcpy() itself, so at
compile time every rte_memcpy() call is expanded into basic instructions
like xmm0=xxx.
That works fine for the existing code. But with run-time dispatch, it would
bring lots of function calls and cause a performance drop.

Best Regards,
Xiaoyun Li
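For illustration only, here is a minimal C sketch of the dispatch scheme the patch describes: CPU flags are checked in a constructor, a function pointer is bound to the chosen flavor, and small copies stay on an inline path. It assumes GCC or clang on x86 (other targets fall back to the generic copy). All names (`copy_generic`, `copy_avx2`, `disp_memcpy`, `COPY_INLINE_MAX`) and the 128-byte threshold are hypothetical; this is not DPDK's `rte_memcpy` API or the patch's actual code. `__builtin_cpu_init`, `__builtin_cpu_supports`, and the `constructor`/`target` attributes are GCC/clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: NOT the DPDK rte_memcpy API or the patch's code. */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline flavor: portable byte loop (stands in for the SSE path). */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 flavor: the target attribute lets this one function use AVX2
 * while the rest of the file is built for the baseline ISA. */
static __attribute__((target("avx2")))
void *copy_avx2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {          /* 32-byte unaligned blocks */
        __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
        _mm256_storeu_si256((__m256i *)(d + i), v);
    }
    for (; i < n; i++)                       /* remaining tail bytes */
        d[i] = s[i];
    return dst;
}
#endif

/* NULL until the constructor below binds it, before main() runs. */
static copy_fn copy_impl;

__attribute__((constructor))
static void copy_select_impl(void)
{
    copy_impl = copy_generic;
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();   /* needed before __builtin_cpu_supports in a constructor */
    if (__builtin_cpu_supports("avx2"))
        copy_impl = copy_avx2;
#endif
}

/* Hypothetical threshold: copies up to this size take the inline path. */
#define COPY_INLINE_MAX 128

static inline void *disp_memcpy(void *dst, const void *src, size_t n)
{
    if (n <= COPY_INLINE_MAX)
        return memcpy(dst, src, n);          /* stand-in for the inline path */
    assert(copy_impl != NULL && "constructor has not run");
    return copy_impl(dst, src, n);           /* one indirect call for large copies */
}
```

Calling `disp_memcpy(dst, src, 300)` goes through the bound function pointer, while `disp_memcpy(dst, src, 40)` stays inline. This mirrors the trade-off debated above: only large copies pay for the indirect call, where its cost is small relative to the copy itself.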