From: "Li, Xiaoyun"
To: Thomas Monjalon, "Ananyev, Konstantin", "Richardson, Bruce"
CC: dev@dpdk.org, "Lu, Wenzhuo", "Zhang, Helin", ophirmu@mellanox.com
Date: Wed, 18 Oct 2017 02:21:30 +0000
Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy

Hi

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, October 18, 2017 05:24
> To: Li, Xiaoyun; Ananyev, Konstantin; Richardson, Bruce
> Cc: dev@dpdk.org; Lu, Wenzhuo; Zhang, Helin; ophirmu@mellanox.com
> Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> memcpy
>
> Hi,
>
> 13/10/2017 11:01, Xiaoyun Li:
> > This patch dynamically selects the memcpy function at run time based
> > on the CPU flags that the current machine supports. It uses function
> > pointers that are bound to the matching implementations at
> > constructor time. In addition, the AVX512 instruction set is compiled
> > in only if the user enables it in the config and the compiler
> > supports it.
> >
> > Signed-off-by: Xiaoyun Li
> > ---
> Keeping only the major changes of the patch for later discussions:
> [...]
> >  static inline void *
> >  rte_memcpy(void *dst, const void *src, size_t n)
> >  {
> > -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > -		return rte_memcpy_aligned(dst, src, n);
> > +	if (n <= RTE_X86_MEMCPY_THRESH)
> > +		return rte_memcpy_internal(dst, src, n);
> >  	else
> > -		return rte_memcpy_generic(dst, src, n);
> > +		return (*rte_memcpy_ptr)(dst, src, n);
> >  }
> [...]
> > +static inline void *
> > +rte_memcpy_internal(void *dst, const void *src, size_t n)
> > +{
> > +	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > +		return rte_memcpy_aligned(dst, src, n);
> > +	else
> > +		return rte_memcpy_generic(dst, src, n);
> > +}
>
> The significant change of this patch is to call a function pointer for
> packet sizes > 128 bytes (RTE_X86_MEMCPY_THRESH).

The perf drop is due to the function call replacing the inlined code.

> Please could you provide some benchmark numbers?

I ran memcpy_perf_test, which shows the time cost of memcpy. I ran it on
Broadwell with SSE and AVX2. But I only drew charts and looked at the
trend, without computing the exact percentages; sorry about that.

The chart shows results for copy sizes of 2, 4, 6, 8, 9, 12, 16, 32, 64,
128, 192, 256, 320, 384, 448, 512, 768, 1024, 1518, 1522, 1536, 1600,
2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680,
and 8192 bytes.

In my test, as the copy size grows, the drop shrinks. (Copy time is used
as the performance indicator.)

From the trend chart, when the size is smaller than 128 bytes, the perf
drops a lot, almost 50%. Above 128 bytes, it approaches the original DPDK
code.

I have computed it just now: between 128 bytes and 1024 bytes, the perf
drops about 15%; above 1024 bytes, it drops about 4%.

> From a test done at Mellanox, there might be a performance degradation of
> about 15% in testpmd txonly with AVX2.
> Is there someone else seeing a performance degradation?
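
For readers following the thread, below is a minimal, self-contained
sketch of the constructor-time dispatch pattern the commit message
describes. It is not the patch code: the helper names (memcpy_select,
memcpy_sse, memcpy_avx2), the placeholder bodies, and the use of GCC's
__builtin_cpu_supports() are illustrative assumptions; the actual patch
binds rte_memcpy_ptr to full SSE/AVX2/AVX512 implementations based on
DPDK's CPU flag detection.

    #include <stddef.h>
    #include <string.h>

    /* Placeholder implementations standing in for the real SSE/AVX2
     * memcpy variants selected by the patch. */
    static void *memcpy_sse(void *dst, const void *src, size_t n)
    {
    	return memcpy(dst, src, n);
    }

    static void *memcpy_avx2(void *dst, const void *src, size_t n)
    {
    	return memcpy(dst, src, n);
    }

    /* Large copies go through this pointer, which costs one indirect
     * call compared with the old fully-inlined path -- the source of
     * the perf drop discussed above. */
    static void *(*memcpy_ptr)(void *, const void *, size_t) = memcpy_sse;

    /* Runs once at load time, before main(): pick the best variant
     * the running CPU supports. */
    __attribute__((constructor))
    static void memcpy_select(void)
    {
    	__builtin_cpu_init();
    	if (__builtin_cpu_supports("avx2"))
    		memcpy_ptr = memcpy_avx2;
    }

Because the pointer is resolved once at startup, the per-call overhead is
only the indirect call itself, which is consistent with the numbers above:
the relative drop shrinks as the copy size grows.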