From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 70C851B248 for ; Tue, 3 Oct 2017 13:16:00 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 Oct 2017 04:15:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.42,474,1500966000"; d="scan'208";a="1178110257" Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75]) by orsmga001.jf.intel.com with ESMTP; 03 Oct 2017 04:15:58 -0700 Received: from irsmsx103.ger.corp.intel.com ([169.254.3.49]) by IRSMSX153.ger.corp.intel.com ([169.254.9.34]) with mapi id 14.03.0319.002; Tue, 3 Oct 2017 12:15:57 +0100 From: "Ananyev, Konstantin" To: "Li, Xiaoyun" , "Richardson, Bruce" CC: "Lu, Wenzhuo" , "Zhang, Helin" , "dev@dpdk.org" Thread-Topic: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy Thread-Index: AQHTO5m/ev7m6EWN7Eqz5p6v1EkKxKLQwURAgABeHICAANm2MA== Date: Tue, 3 Oct 2017 11:15:57 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772585FAA35AA@IRSMSX103.ger.corp.intel.com> References: <1506411689-94690-1-git-send-email-xiaoyun.li@intel.com> <1506960796-71620-1-git-send-email-xiaoyun.li@intel.com> <1506960796-71620-2-git-send-email-xiaoyun.li@intel.com> <2601191342CEEE43887BDE71AB9772585FAA2F94@IRSMSX103.ger.corp.intel.com> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYjBkMzkyYzgtZDVjOS00OTkwLTg3YjItMjU2MGFlYjNhNzRmIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6ImRBZ25yQmdWQWl5YnFpcjNQazJaOXRxVTNhTGJCUXcxRUtGMUZtWHpxc0U9In0= x-ctpclassification: CTP_IC dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Oct 2017 11:16:01 -0000 Hi, >=20 > Hi >=20 > > -----Original Message----- > > From: Ananyev, Konstantin > > Sent: Tuesday, October 3, 2017 00:39 > > To: Li, Xiaoyun ; Richardson, Bruce > > > > Cc: Lu, Wenzhuo ; Zhang, Helin > > ; dev@dpdk.org > > Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy > > > > > > > > > -----Original Message----- > > > From: Li, Xiaoyun > > > Sent: Monday, October 2, 2017 5:13 PM > > > To: Ananyev, Konstantin ; Richardson, > > Bruce > > > Cc: Lu, Wenzhuo ; Zhang, Helin > > ; dev@dpdk.org; Li, Xiaoyun > > > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy > > > > > > This patch dynamically selects functions of memcpy at run-time based > > > on CPU flags that current machine supports. This patch uses function > > > pointers which are bind to the relative functions at constrctor time. > > > In addition, AVX512 instructions set would be compiled only if users > > > config it enabled and the compiler supports it. > > > > > > Signed-off-by: Xiaoyun Li > > > --- > > > v2 > > > * Use gcc function multi-versioning to avoid compilation issues. > > > * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the > > > compiler supports it, the AVX512 codes would be compiled. Only if the > > > compiler supports AVX2, the AVX2 codes would be compiled. > > > > > > v3 > > > * Reduce function calls via only keep rte_memcpy_xxx. > > > * Add conditions that when copy size is small, use inline code path. > > > Otherwise, use dynamic code path. > > > * To support attribute target, clang version must be greater than 3.7= . > > > Otherwise, would choose SSE/AVX code path, the same as before. > > > * Move two mocro functions to the top of the code since they would be > > > used in inline SSE/AVX and dynamic SSE/AVX codes. > > > > > > v4 > > > * Modify rte_memcpy.h to several .c files and modify makefiles to com= pile > > > AVX2 and AVX512 files. > > > > Could you explain to me why instead of reusing existing rte_memcpy() co= de > > to generate _sse/_avx2/ax512f flavors you keep pushing changes with 3 > > separate implementations? > > Obviously that is much more expensive in terms of maintenance and doesn= 't > > look like > > feasible solution to me. > > Is existing rte_memcpy() implementation is not good enough in terms of > > functionality and/or performance? > > If so, can you outline these problems and try to fix them first. > > Konstantin > > >=20 > I just change many small functions to one function in those 3 separate fu= nctions. Yes, so with what you suggest we'll have 4 implementations for rte_memcpy= to support. That's very expensive terms of maintenance and I believe totally unnecessar= y. > Because the existing codes are totally inline, including rte_memcpy() its= elf. So the compilation will > change all rte_memcpy() calls into the basic codes like xmm0=3Dxxx. >=20 > The existing codes in this way are OK.=20 Good. >But when run-time, it will bring lots of function calls > and cause perf drop. I believe it wouldn't if we do it properly. All internal functions (mov16, mov32, etc.) will still be unlined by the co= mpiler for each flavor (sse/avx2/etc.) - have a look at the patch I sent. Konstantin