From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xiaoyun.li@intel.com>
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id 7F57927D
 for <dev@dpdk.org>; Thu, 19 Oct 2017 04:45:30 +0200 (CEST)
Received: from fmsmga005.fm.intel.com ([10.253.24.32])
 by fmsmga105.fm.intel.com with ESMTP; 18 Oct 2017 19:45:29 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.43,399,1503385200"; d="scan'208";a="164324917"
Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203])
 by fmsmga005.fm.intel.com with ESMTP; 18 Oct 2017 19:45:29 -0700
Received: from fmsmsx115.amr.corp.intel.com (10.18.116.19) by
 FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Wed, 18 Oct 2017 19:45:29 -0700
Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by
 fmsmsx115.amr.corp.intel.com (10.18.116.19) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Wed, 18 Oct 2017 19:45:28 -0700
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.159]) by
 SHSMSX152.ccr.corp.intel.com ([169.254.6.93]) with mapi id 14.03.0319.002;
 Thu, 19 Oct 2017 10:45:26 +0800
From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: Thomas Monjalon <thomas@monjalon.net>, "Ananyev, Konstantin"
 <konstantin.ananyev@intel.com>, "Richardson, Bruce"
 <bruce.richardson@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
 "Zhang, Helin" <helin.zhang@intel.com>, "ophirmu@mellanox.com"
 <ophirmu@mellanox.com>
Thread-Topic: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy
Thread-Index: AQHTRAIprgY4BVeLI0+Q0HvX97iHcaLoDmwAgADN1xCAAE2qAIABU4GQ
Date: Thu, 19 Oct 2017 02:45:26 +0000
Message-ID: <B9E724F4CB7543449049E7AE7669D82F47FD6D@SHSMSX101.ccr.corp.intel.com>
References: <1507206794-79941-1-git-send-email-xiaoyun.li@intel.com>
 <1507885309-165144-1-git-send-email-xiaoyun.li@intel.com>
 <1507885309-165144-2-git-send-email-xiaoyun.li@intel.com>
 <4482530.zMd2RtzCvC@xps>
 <B9E724F4CB7543449049E7AE7669D82F47F757@SHSMSX101.ccr.corp.intel.com>
 <B9E724F4CB7543449049E7AE7669D82F47F814@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <B9E724F4CB7543449049E7AE7669D82F47F814@SHSMSX101.ccr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Oct 2017 02:45:30 -0000

Hi
> > >
> > > The significant change of this patch is to call a function pointer
> > > for packet size > 128 (RTE_X86_MEMCPY_THRESH).
> > The perf drop is due to the function call replacing the inline call.
> >
> > > Please could you provide some benchmark numbers?
> > I ran memcpy_perf_test which would show the time cost of memcpy. I ran
> > it on broadwell with sse and avx2.
> > But I just draw pictures and looked at the trend not computed the
> > exact percentage. Sorry about that.
> > The picture shows results for copy sizes of 2, 4, 6, 8, 9, 12, 16, 32,
> > 64, 128, 192, 256, 320, 384, 448, 512, 768, 1024, 1518, 1522, 1536,
> > 1600, 2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168,
> > 7680, 8192.
> > In my test, as the size grows, the drop shrinks. (Copy time is the
> > perf metric here.) From the trend picture, when the size is smaller
> > than 128 bytes, the perf drops a lot, almost 50%. Above 128 bytes, it
> > approaches the original DPDK.
> > I have now computed it exactly: between 128 bytes and 1024 bytes, the
> > perf drops about 15%; above 1024 bytes, it drops about 4%.
> >
> > > From a test done at Mellanox, there might be a performance
> > > degradation of about 15% in testpmd txonly with AVX2.
>
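
Just so we are all looking at the same thing: below is a minimal,
self-contained sketch of the dispatch pattern being discussed (not the
actual patch code; MEMCPY_THRESH and memcpy_generic are placeholders for
RTE_X86_MEMCPY_THRESH and the ISA-specific rte_memcpy implementations).
Copies at or below the threshold keep the inline path, and only larger
copies pay the indirect call, which is where the drop comes from.

#include <stddef.h>
#include <string.h>

#define MEMCPY_THRESH 128 /* stand-in for RTE_X86_MEMCPY_THRESH */

typedef void *(*memcpy_fn_t)(void *dst, const void *src, size_t n);

/* Placeholder for the ISA-specific copy (SSE/AVX2/AVX512 in the real
 * patch); the pointer is set once at startup from the CPU flags. */
static void *
memcpy_generic(void *dst, const void *src, size_t n)
{
        return memcpy(dst, src, n);
}

static memcpy_fn_t memcpy_ptr = memcpy_generic;

/* Small copies keep the inline expansion; larger ones go through the
 * run-time-selected function pointer. */
static inline void *
dispatch_memcpy(void *dst, const void *src, size_t n)
{
        if (n <= MEMCPY_THRESH)
                return memcpy(dst, src, n);     /* inline fast path */
        return memcpy_ptr(dst, src, n);         /* dispatched path */
}

Moving MEMCPY_THRESH moves the crossover between the two paths, which is
what adjusting RTE_X86_MEMCPY_THRESH (mentioned below) would change.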

I did tests on X710, XXV710, X540 and MT27710 but didn't see any performance
degradation.

I used the command "./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -- -i"
and set fwd txonly.
I tested it on v17.11-rc1, then reverted my patch and tested it again.
I ran "show port stats all" and compared the throughput (pps). The results
are similar, with no drop.
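
To be concrete, on each build the sequence was roughly:

./x86_64-native-linuxapp-gcc/app/testpmd -c 0xf -n 4 -- -i
testpmd> set fwd txonly
testpmd> start
testpmd> show port stats all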

Did I miss something?

> Another thing, I will test testpmd txonly with Intel NICs and Mellanox
> these days.
> And try adjusting the RTE_X86_MEMCPY_THRESH to see if there is any
> improvement.
>
> > > Is there someone else seeing a performance degradation?