From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xiaoyun.li@intel.com>
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 51DB52B84
 for <dev@dpdk.org>; Tue, 12 Sep 2017 04:27:10 +0200 (CEST)
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 11 Sep 2017 19:27:09 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.42,381,1500966000"; 
 d="xml'?bin'?scan'208,217,72,48?xlsx'208,217,72,48,72,48?rels'208,217,72,48,72,48";
 a="150734367"
Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205])
 by fmsmga006.fm.intel.com with ESMTP; 11 Sep 2017 19:27:09 -0700
Received: from fmsmsx123.amr.corp.intel.com (10.18.125.38) by
 fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Mon, 11 Sep 2017 19:27:09 -0700
Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by
 fmsmsx123.amr.corp.intel.com (10.18.125.38) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Mon, 11 Sep 2017 19:27:09 -0700
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.168]) by
 SHSMSX103.ccr.corp.intel.com ([169.254.4.219]) with mapi id 14.03.0319.002;
 Tue, 12 Sep 2017 10:27:06 +0800
From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: "Wang, Liang-min" <liang-min.wang@intel.com>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, "Ananyev, Konstantin"
 <konstantin.ananyev@intel.com>
CC: "Zhang, Qi Z" <qi.z.zhang@intel.com>, "Lu, Wenzhuo"
 <wenzhuo.lu@intel.com>, "Zhang, Helin" <helin.zhang@intel.com>,
 "pierre@emutex.com" <pierre@emutex.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over memcpy
Thread-Index: AQHTIwBrlGsHmgShskuUmMa7Bb4+wKKfOegAgACHy9D//484AIAEpU8AgAORwkCAABh6cIAAALbwgAKbI7CABkpjcA==
Date: Tue, 12 Sep 2017 02:27:05 +0000
Message-ID: <B9E724F4CB7543449049E7AE7669D82F443E45@SHSMSX101.ccr.corp.intel.com>
References: <1503626773-184682-1-git-send-email-xiaoyun.li@intel.com>
 <1504256222-32969-1-git-send-email-xiaoyun.li@intel.com>
 <1504256222-32969-2-git-send-email-xiaoyun.li@intel.com>
 <2601191342CEEE43887BDE71AB9772584F23F1AC@IRSMSX103.ger.corp.intel.com>
 <B9E724F4CB7543449049E7AE7669D82F440ED4@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772584F23F281@IRSMSX103.ger.corp.intel.com>
 <B9E724F4CB7543449049E7AE7669D82F4412E3@SHSMSX101.ccr.corp.intel.com>
 <B9E724F4CB7543449049E7AE7669D82F44216E@SHSMSX101.ccr.corp.intel.com>
 <B9E724F4CB7543449049E7AE7669D82F442FE6@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <B9E724F4CB7543449049E7AE7669D82F442FE6@SHSMSX101.ccr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/x86: run-time dispatch over memcpy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Sep 2017 02:27:12 -0000

Hi all,

After investigating, I found that most DPDK code already dispatches at
run time. Only rte_memcpy chooses the ISA at build time.

There are two ways to modify memcpy. The first is function pointers and
the other is function multi-versioning in GCC.

But memcpy has been heavily optimized and benefits from being fully
inlined. If it is changed to run-time dispatch via function pointers, the
perf drops a lot, especially when the copy size is small.
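
To make the function-pointer option concrete, here is a minimal sketch.
It is not the code from the patch: copy_fn, copy_generic, copy_avx2,
copy_impl, copy_select and copy_dispatch are hypothetical names, and both
variants simply call memcpy as stand-ins for real implementations. It
assumes GCC or a compatible compiler on x86 for the CPU check and falls
back to the generic variant elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the code from the patch. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant, usable on any CPU the binary can start on. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_CHECK 1
/* Stand-in for an AVX2 variant; only this function is built for AVX2. */
__attribute__((target("avx2")))
static void *copy_avx2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
#endif

/* Resolved once; every later copy goes through this pointer. */
static copy_fn copy_impl;

static void copy_select(void)
{
    copy_impl = copy_generic;
#ifdef HAVE_X86_CPU_CHECK
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        copy_impl = copy_avx2;
#endif
}

/* The wrapper may be inlined, but the copy itself is an indirect call. */
static inline void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (copy_impl == NULL)
        copy_select();
    assert(copy_impl != NULL);
    return copy_impl(dst, src, n);
}
```

The target of copy_impl is only known at run time, so the compiler cannot
inline the copy into the caller. For a copy of a few bytes, the call
overhead can be comparable to the copy itself.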

Function multi-versioning in GCC only works for C++. GCC 6 is said to
support it for C as well, but in my trial it did not work for C.
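
For reference, the GCC manual also documents a target_clones function
attribute, which is a different mechanism from defining several
same-named functions with different target attributes. The sketch below
is an untested assumption on my side, not something from the trial above:
it assumes GCC 6 or later on x86-64 Linux with ifunc support, and
copy_bytes and COPY_CLONES are hypothetical names. On other compilers or
targets the macro expands to nothing and a single plain version is built.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: target_clones is accepted in C by GCC >= 6 on x86-64 Linux. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 && \
    defined(__x86_64__) && defined(__linux__)
#define COPY_CLONES __attribute__((target_clones("avx2", "default")))
#else
#define COPY_CLONES /* no cloning: one plain version is built */
#endif

/*
 * Hypothetical byte copy. Where the attribute is active, the compiler
 * emits one clone per listed target plus a resolver that picks a clone
 * when the program is loaded.
 */
COPY_CLONES
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Even where this builds, calls go through the resolver's indirection, so
the function is not inlined into its callers.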


The attachment has the perf results of memcpy with my patch, without my
patch, and with the original DPDK code built without inline.

It's just for comparison, so right now I have only tested on Broadwell,
using AVX2.

The results are from running test/test/test_memcpy_perf.c.

The results are grouped as in the test source (C = compile-time
constant):

/* Do aligned tests where size is a variable */

/* Do aligned tests where size is a compile-time constant */

/* Do unaligned tests where size is a variable */

/* Do unaligned tests where size is a compile-time constant */


A result of 4-7 means DPDK takes time 4 and glibc takes time 7.

For sizes smaller than 128 bytes, the perf with this patch is bad, even
worse than glibc.

As the size grows, the perf is better than glibc but worse than the
original DPDK.

And when the size grows above about 1024 bytes, it performs similarly to
the original DPDK.

Furthermore, if inline is removed from the original DPDK, the perf is
similar to the perf with the patch.

The different situations (4 types, such as cache to cache) perform
differently, but the trend is the same (as size grows, perf improves).


So if we need dynamic dispatch, we have to sacrifice some perf and
compile for the minimum target (e.g. compile for target avx, then run on
avx, avx2 and avx512f).
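
As a rough illustration of what compiling for the minimum target and
choosing at run time involves, the sketch below probes the CPU for the
three levels mentioned above. The isa_level enum, detect_isa_level and
isa_level_name are hypothetical names, not DPDK APIs. It assumes GCC or a
compatible compiler on x86 and reports the baseline elsewhere.

```c
#include <assert.h>

/* Hypothetical ISA levels, ordered from lowest to highest. */
enum isa_level { ISA_BASELINE = 0, ISA_AVX, ISA_AVX2, ISA_AVX512F };

/* Return the highest level the running CPU reports, checked top down. */
static enum isa_level detect_isa_level(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return ISA_AVX512F;
    if (__builtin_cpu_supports("avx2"))
        return ISA_AVX2;
    if (__builtin_cpu_supports("avx"))
        return ISA_AVX;
#endif
    return ISA_BASELINE;
}

/* Printable name for a level, e.g. for a startup log line. */
static const char *isa_level_name(enum isa_level level)
{
    static const char *const names[] = {
        "baseline", "avx", "avx2", "avx512f"
    };

    assert(level >= ISA_BASELINE && level <= ISA_AVX512F);
    return names[level];
}
```

A binary built this way has to keep its common code at the lowest level
it should start on, and only the selected copy routine uses the wider
instructions.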


Thus, I think this feature shouldn't be delivered in this release.


Best Regards,

Xiaoyun Li