From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zhihong.wang@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 609505A62
 for <dev@dpdk.org>; Mon, 18 Jan 2016 11:08:38 +0100 (CET)
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by orsmga101.jf.intel.com with ESMTP; 18 Jan 2016 02:08:37 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,312,1449561600"; d="scan'208";a="729395978"
Received: from unknown (HELO dpdk5.sh.intel.com) ([10.239.129.244])
 by orsmga003.jf.intel.com with ESMTP; 18 Jan 2016 02:08:36 -0800
From: Zhihong Wang <zhihong.wang@intel.com>
To: dev@dpdk.org
Date: Sun, 17 Jan 2016 22:05:09 -0500
Message-Id: <1453086314-30158-1-git-send-email-zhihong.wang@intel.com>
X-Mailer: git-send-email 2.5.0
In-Reply-To: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Jan 2016 10:08:38 -0000

This patch set optimizes DPDK memcpy for AVX512 platforms to make full
use of hardware resources and deliver higher performance.

In current DPDK, memcpy accounts for a large proportion of execution
time in libraries such as vhost, especially for large packets, so this
patch set can bring considerable benefits there.

The implementation is based on the current DPDK memcpy framework. Some
background can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html

Code changes are:

  1. Read CPUID to check whether AVX512 is supported by the CPU

  2. Predefine the AVX512 macro if AVX512 is enabled by the compiler

  3. Implement AVX512 memcpy and choose the right implementation based on
     predefined macros

  4. Decide alignment unit for memcpy perf test based on predefined macros
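
As a rough illustration of steps 1, 3 and 4 above, here is a minimal
sketch, not the code from the patches. It assumes a GCC/clang toolchain
(<cpuid.h>, <immintrin.h>) and relies on the compiler-predefined
__AVX512F__ and __AVX2__ macros. The names sketch_cpu_has_avx512f,
sketch_memcpy and SKETCH_ALIGNMENT_UNIT are hypothetical and do not
appear in DPDK. The CPUID check only reads the AVX512F feature bit
(leaf 7, subleaf 0, EBX bit 16); it does not verify that the OS has
enabled ZMM register state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#endif

/* Step 1 (sketch): CPUID leaf 7, subleaf 0, EBX bit 16 reports AVX512F. */
static int sketch_cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 7 not available */
    return (int)((ebx >> 16) & 1u);
#else
    return 0; /* not an x86 target */
#endif
}

/* Step 4 (sketch): alignment unit follows the widest enabled vector. */
#if defined(__AVX512F__)
#define SKETCH_ALIGNMENT_UNIT 64
#elif defined(__AVX2__)
#define SKETCH_ALIGNMENT_UNIT 32
#else
#define SKETCH_ALIGNMENT_UNIT 16
#endif

/* Step 3 (sketch): pick the copy loop at compile time from the macros. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__AVX512F__)
    /* 64-byte unaligned loads and stores through ZMM registers */
    while (n >= 64) {
        _mm512_storeu_si512((void *)d, _mm512_loadu_si512((const void *)s));
        d += 64; s += 64; n -= 64;
    }
#elif defined(__AVX2__)
    /* 32-byte unaligned loads and stores through YMM registers */
    while (n >= 32) {
        _mm256_storeu_si256((__m256i *)d,
                            _mm256_loadu_si256((const __m256i *)s));
        d += 32; s += 32; n -= 32;
    }
#endif
    memcpy(d, s, n); /* remaining tail, or the whole copy without AVX */
    return dst;
}
```

Building with -mavx512f (or a -march value that implies it) selects the
64-byte path; without it the same source falls back to the narrower
paths. The runtime check is kept separate because a binary built for
AVX512 must not be run on a CPU that lacks it.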

--------------
Changes in v2:

  1. Tune performance for prior (pre-AVX512) platforms

Zhihong Wang (5):
  lib/librte_eal: Identify AVX512 CPU flag
  mk: Predefine AVX512 macro for compiler
  lib/librte_eal: Optimize memcpy for AVX512 platforms
  app/test: Adjust alignment unit for memcpy perf test
  lib/librte_eal: Tune memcpy for prior platforms

 app/test/test_memcpy_perf.c                        |   6 +
 .../common/include/arch/x86/rte_cpuflags.h         |   2 +
 .../common/include/arch/x86/rte_memcpy.h           | 269 ++++++++++++++++++++-
 mk/rte.cpuflags.mk                                 |   4 +
 4 files changed, 268 insertions(+), 13 deletions(-)

-- 
2.5.0