From: "Li, Xiaoyun"
To: "Li, Xiaoyun", Thomas Monjalon, "Ananyev, Konstantin", "Richardson, Bruce"
CC: "dev@dpdk.org", "Lu, Wenzhuo", "Zhang, Helin", "ophirmu@mellanox.com"
Date: Wed, 18 Oct 2017 06:22:23 +0000
Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86:
run-time dispatch over memcpy

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Li, Xiaoyun
> Sent: Wednesday, October 18, 2017 10:22
> To: Thomas Monjalon; Ananyev, Konstantin; Richardson, Bruce
> Cc: dev@dpdk.org; Lu, Wenzhuo; Zhang, Helin; ophirmu@mellanox.com
> Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> memcpy
>
> Hi
>
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Wednesday, October 18, 2017 05:24
> > To: Li, Xiaoyun; Ananyev, Konstantin; Richardson, Bruce
> > Cc: dev@dpdk.org; Lu, Wenzhuo; Zhang, Helin; ophirmu@mellanox.com
> > Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> > memcpy
> >
> > Hi,
> >
> > 13/10/2017 11:01, Xiaoyun Li:
> > > This patch dynamically selects the memcpy implementation at run time
> > > based on the CPU flags that the current machine supports. It uses
> > > function pointers which are bound to the relevant functions at
> > > constructor time. In addition, the AVX512 instruction set is compiled
> > > only if the user enables it in the config and the compiler supports it.
> > >
> > > Signed-off-by: Xiaoyun Li
> > > ---
> > Keeping only the major changes of the patch for later discussions:
> > [...]
> > > static inline void *
> > > rte_memcpy(void *dst, const void *src, size_t n) {
> > > - if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > > - return rte_memcpy_aligned(dst, src, n);
> > > + if (n <= RTE_X86_MEMCPY_THRESH)
> > > + return rte_memcpy_internal(dst, src, n);
> > > else
> > > - return rte_memcpy_generic(dst, src, n);
> > > + return (*rte_memcpy_ptr)(dst, src, n);
> > > }
> > [...]
> > > +static inline void *
> > > +rte_memcpy_internal(void *dst, const void *src, size_t n) {
> > > + if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > > + return rte_memcpy_aligned(dst, src, n);
> > > + else
> > > + return rte_memcpy_generic(dst, src, n); }
> >
> > The significant change of this patch is to call a function pointer for
> > packet size > 128 (RTE_X86_MEMCPY_THRESH).
> The perf drop is due to the function call replacing the inline code.
>
> > Please could you provide some benchmark numbers?
> I ran memcpy_perf_test, which shows the time cost of memcpy. I ran it
> on Broadwell with SSE and AVX2.
> But I just drew pictures and looked at the trend, without computing the
> exact percentage. Sorry about that.
> The picture shows results for copy sizes of 2, 4, 6, 8, 9, 12, 16, 32, 64, 128, 192,
> 256, 320, 384, 448, 512, 768, 1024, 1518, 1522, 1536, 1600, 2048, 2560, 3072,
> 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192.
> In my test, as the size grows, the drop shrinks. (Copy time is used as the
> perf metric.) From the trend picture, when the size is smaller than 128 bytes,
> the perf drops a lot, almost 50%. Above 128 bytes, it approaches the original
> DPDK.
> I computed it just now: between 128 bytes and 1024 bytes, the perf drops
> about 15%. Above 1024 bytes, the perf drops about 4%.
>
> > From a test done at Mellanox, there might be a performance degradation
> > of about 15% in testpmd txonly with AVX2.

Another thing: I will test testpmd txonly with Intel NICs and Mellanox these
days, and try adjusting RTE_X86_MEMCPY_THRESH to see if there is any
improvement.

> > Is there someone else seeing a performance degradation?