From: "Li, Xiaoyun"
To: Thomas Monjalon, "Ananyev, Konstantin", "Richardson, Bruce"
CC: dev@dpdk.org, "Lu, Wenzhuo", "Zhang, Helin", ophirmu@mellanox.com
Date: Wed, 18 Oct 2017 02:21:30 +0000
Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy

Hi

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, October 18, 2017 05:24
> To: Li, Xiaoyun; Ananyev, Konstantin; Richardson, Bruce
> Cc: dev@dpdk.org; Lu, Wenzhuo; Zhang, Helin; ophirmu@mellanox.com
> Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over
> memcpy
>
> Hi,
>
> 13/10/2017 11:01, Xiaoyun Li:
> > This patch dynamically selects the memcpy function at run time based
> > on the CPU flags that the current machine supports. It uses function
> > pointers that are bound to the matching implementations at
> > constructor time. In addition, the AVX512 instruction set is compiled
> > in only if the user enables it in the config and the compiler
> > supports it.
> >
> > Signed-off-by: Xiaoyun Li
> > ---
> Keeping only the major changes of the patch for later discussions:
> [...]
> >  static inline void *
> >  rte_memcpy(void *dst, const void *src, size_t n)
> >  {
> > -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > -		return rte_memcpy_aligned(dst, src, n);
> > +	if (n <= RTE_X86_MEMCPY_THRESH)
> > +		return rte_memcpy_internal(dst, src, n);
> >  	else
> > -		return rte_memcpy_generic(dst, src, n);
> > +		return (*rte_memcpy_ptr)(dst, src, n);
> >  }
> [...]
> > +static inline void *
> > +rte_memcpy_internal(void *dst, const void *src, size_t n)
> > +{
> > +	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> > +		return rte_memcpy_aligned(dst, src, n);
> > +	else
> > +		return rte_memcpy_generic(dst, src, n);
> > +}
>
> The significant change of this patch is to call a function pointer for
> packet sizes > 128 bytes (RTE_X86_MEMCPY_THRESH).

The perf drop is due to the function call replacing the inlined code.

> Please could you provide some benchmark numbers?

I ran memcpy_perf_test, which shows the time cost of memcpy. I ran it on
Broadwell with SSE and AVX2. But I only drew charts and looked at the
trend, without computing the exact percentages; sorry about that.

The chart shows results for copy sizes of 2, 4, 6, 8, 9, 12, 16, 32, 64,
128, 192, 256, 320, 384, 448, 512, 768, 1024, 1518, 1522, 1536, 1600,
2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680,
and 8192 bytes.

In my test, as the copy size grows, the drop shrinks. (Copy time is used
as the performance indicator.)

From the trend chart, when the size is smaller than 128 bytes, the perf
drops a lot, almost 50%. Above 128 bytes, it approaches the original DPDK
code.

I have computed it just now: between 128 bytes and 1024 bytes, the perf
drops about 15%; above 1024 bytes, it drops about 4%.

> From a test done at Mellanox, there might be a performance degradation of
> about 15% in testpmd txonly with AVX2.
> Is there someone else seeing a performance degradation?
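
For readers following the thread, below is a minimal, self-contained
sketch of the constructor-time dispatch pattern the commit message
describes. It is not the patch code: the helper names (memcpy_select,
memcpy_sse, memcpy_avx2), the placeholder bodies, and the use of GCC's
__builtin_cpu_supports() are illustrative assumptions; the actual patch
binds rte_memcpy_ptr to full SSE/AVX2/AVX512 implementations based on
DPDK's CPU flag detection.

    #include <stddef.h>
    #include <string.h>

    /* Placeholder implementations standing in for the real SSE/AVX2
     * memcpy variants selected by the patch. */
    static void *memcpy_sse(void *dst, const void *src, size_t n)
    {
    	return memcpy(dst, src, n);
    }

    static void *memcpy_avx2(void *dst, const void *src, size_t n)
    {
    	return memcpy(dst, src, n);
    }

    /* Large copies go through this pointer, which costs one indirect
     * call compared with the old fully-inlined path -- the source of
     * the perf drop discussed above. */
    static void *(*memcpy_ptr)(void *, const void *, size_t) = memcpy_sse;

    /* Runs once at load time, before main(): pick the best variant
     * the running CPU supports. */
    __attribute__((constructor))
    static void memcpy_select(void)
    {
    	__builtin_cpu_init();
    	if (__builtin_cpu_supports("avx2"))
    		memcpy_ptr = memcpy_avx2;
    }

Because the pointer is resolved once at startup, the per-call overhead is
only the indirect call itself, which is consistent with the numbers above:
the relative drop shrinks as the copy size grows.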