From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zhiyong.yang@intel.com>
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 974F03DC
 for <dev@dpdk.org>; Tue, 27 Dec 2016 11:05:38 +0100 (CET)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by orsmga104.jf.intel.com with ESMTP; 27 Dec 2016 02:05:37 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.33,415,1477983600"; d="scan'208";a="43603051"
Received: from unknown (HELO dpdk5.bj.intel.com) ([172.16.182.188])
 by orsmga004.jf.intel.com with ESMTP; 27 Dec 2016 02:05:35 -0800
From: Zhiyong Yang <zhiyong.yang@intel.com>
To: dev@dpdk.org
Cc: yuanhan.liu@linux.intel.com, thomas.monjalon@6wind.com,
 bruce.richardson@intel.com, konstantin.ananyev@intel.com,
 pablo.de.lara.guarch@intel.com
Date: Tue, 27 Dec 2016 18:04:54 +0800
Message-Id: <1482833098-38096-1-git-send-email-zhiyong.yang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1480926387-63838-2-git-send-email-zhiyong.yang@intel.com>
References: <1480926387-63838-2-git-send-email-zhiyong.yang@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/4] eal/common: introduce rte_memset and
	related test
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Dec 2016 10:05:39 -0000

DPDK code suffers a significant performance drop in some cases when
calling the glibc function memset. See the discussion about memset at
http://dpdk.org/ml/archives/dev/2016-October/048628.html
A more efficient function is needed to fix this. One important
property of rte_memset is that it gives us clear control over which
instruction flow is used.

This patchset introduces rte_memset as a more efficient
implementation. It brings an obvious perf improvement, especially
for small N bytes, in most application scenarios.

Patch 1 implements rte_memset in the file rte_memset.h on IA platform.
The file supports three types of instruction sets: sse & avx
(128 bits), avx2 (256 bits) and avx512 (512 bits). rte_memset makes use
of vectorization and inline functions to improve the perf on IA. In
addition, cache line and memory alignment are fully taken into
consideration.
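
To illustrate the vectorization idea only, here is a minimal sketch of a
128-bit fill. It is not the code from Patch 1: the name sketch_memset is
hypothetical, only the SSE2 path is shown, and alignment handling and the
avx2/avx512 paths are left out. Builds without SSE2 fall back to a byte
loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper for illustration; not the rte_memset from Patch 1. */
static inline void *
sketch_memset(void *dst, int c, size_t n)
{
	uint8_t *p = (uint8_t *)dst;
#if defined(__SSE2__)
	/* Broadcast the fill byte into all 16 lanes of one vector. */
	const __m128i v = _mm_set1_epi8((char)c);

	if (n >= 16) {
		size_t i;

		/* Unaligned 16-byte stores cover the bulk of the buffer. */
		for (i = 0; i + 16 <= n; i += 16)
			_mm_storeu_si128((__m128i *)(p + i), v);
		/* One overlapping store covers the tail without a byte loop. */
		_mm_storeu_si128((__m128i *)(p + n - 16), v);
		return dst;
	}
#endif
	/* Small sizes, or builds without SSE2: plain byte loop. */
	while (n--)
		*p++ = (uint8_t)c;
	return dst;
}
```

Because the function is static inline and the store width is chosen in
the source, the caller knows which instructions are emitted, instead of
depending on the path glibc selects at run time.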

Patch 2 implements a functional autotest to validate that the function
works correctly.
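
One way such a functional check could look is sketched below. This is an
assumption about the approach, not the content of test_memset.c: the
names fill_fn and sketch_check_fill and the size limits are hypothetical.
The candidate fill is compared against glibc memset over a range of
offsets and lengths, with guard bytes around the region to catch
out-of-bounds writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature shared by memset and the function under test. */
typedef void *(*fill_fn)(void *dst, int c, size_t n);

#define SKETCH_MAX_LEN 256
#define SKETCH_MAX_OFF 32

/* Returns 0 if fn matches memset for every (offset, length) pair, else -1. */
static int
sketch_check_fill(fill_fn fn)
{
	uint8_t got[SKETCH_MAX_OFF + SKETCH_MAX_LEN + 1];
	uint8_t want[sizeof(got)];
	size_t off, len;

	for (off = 0; off < SKETCH_MAX_OFF; off++) {
		for (len = 0; len <= SKETCH_MAX_LEN; len++) {
			/* Pre-fill both buffers with a guard pattern. */
			memset(got, 0xA5, sizeof(got));
			memset(want, 0xA5, sizeof(want));

			fn(got + off, 0x3C, len);
			memset(want + off, 0x3C, len);

			/* Comparing whole buffers also checks the guards. */
			if (memcmp(got, want, sizeof(got)) != 0)
				return -1;
		}
	}
	return 0;
}
```

Calling sketch_check_fill() with the function under test returns 0 only
if every offset and length produced the same bytes as memset and left
the surrounding guard bytes untouched.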

Patch 3 implements a performance autotest that measures the in-cache
and in-memory cases separately. The perf of rte_memset is obviously
better than glibc memset, especially for small N bytes.
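
A minimal sketch of a timing loop for the in-cache case follows. It is
an assumption, not the code in test_memset_perf.c: the name
sketch_time_fill is hypothetical, and it uses the standard C clock()
function, which is much coarser than a cycle counter, so it is only
meaningful with a large iteration count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by memset and the function under test. */
typedef void *(*fill_fn)(void *dst, int c, size_t n);

/*
 * Returns the average CPU time per call in nanoseconds, or -1.0 if the
 * arguments are out of range. The 4 KB buffer is reused on every
 * iteration, so it is expected to stay cache-resident.
 */
static double
sketch_time_fill(fill_fn fn, size_t n, unsigned long iters)
{
	static uint8_t buf[4096];
	volatile uint8_t sink = 0;
	unsigned long i;
	clock_t start, end;

	if (n == 0 || n > sizeof(buf) || iters == 0)
		return -1.0;

	start = clock();
	for (i = 0; i < iters; i++) {
		fn(buf, (int)(i & 0xff), n);
		/* Read back one byte so the fill cannot be discarded. */
		sink ^= buf[i % n];
	}
	end = clock();
	(void)sink;

	return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Running it once with memset and once with the function under test for
the same n gives a rough per-call comparison. An in-memory variant would
need a working set larger than the last-level cache, which this sketch
does not cover.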

Patch 4 uses rte_memset instead of copy_virtio_net_hdr, which brings a
3%~4% performance improvement on IA platform in virtio/vhost
non-mergeable loopback testing.

Changes in V2:

Patch 1:
Rename rte_memset.h -> rte_memset_64.h and create a file rte_memset.h
for each arch.

Patch 3:
Add the perf comparison data between rte_memset and memset on Haswell.

Patch 4:
Modify release_17_02.rst description.

Zhiyong Yang (4):
  eal/common: introduce rte_memset on IA platform
  app/test: add functional autotest for rte_memset
  app/test: add performance autotest for rte_memset
  lib/librte_vhost: improve vhost perf using rte_memset

 app/test/Makefile                                  |   3 +
 app/test/test_memset.c                             | 158 +++++++++
 app/test/test_memset_perf.c                        | 348 +++++++++++++++++++
 doc/guides/rel_notes/release_17_02.rst             |   7 +
 .../common/include/arch/arm/rte_memset.h           |  36 ++
 .../common/include/arch/ppc_64/rte_memset.h        |  36 ++
 .../common/include/arch/tile/rte_memset.h          |  36 ++
 .../common/include/arch/x86/rte_memset.h           |  51 +++
 .../common/include/arch/x86/rte_memset_64.h        | 378 +++++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_memset.h |  52 +++
 lib/librte_vhost/virtio_net.c                      |  18 +-
 11 files changed, 1116 insertions(+), 7 deletions(-)
 create mode 100644 app/test/test_memset.c
 create mode 100644 app/test/test_memset_perf.c
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memset.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memset.h
 create mode 100644 lib/librte_eal/common/include/arch/tile/rte_memset.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memset.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memset_64.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_memset.h

-- 
2.7.4