From: Zhihong Wang
To: dev@dpdk.org
Date: Thu, 29 Jan 2015 10:38:43 +0800
Message-Id: <1422499127-11689-1-git-send-email-zhihong.wang@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

This patch set optimizes memcpy for DPDK on both SSE and AVX platforms.
It also extends memcpy test coverage with unaligned cases and more test points.

Optimization techniques are summarized below:

1. Utilize full cache bandwidth
2. Enforce aligned stores
3. Apply load address alignment based on architecture features
4. Make load/store address available as early as possible
5. General optimization techniques like inlining, branch reduction, prefetch pattern access

--------------
Changes in v2:

1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast build
2. Modified macro definition for better code readability & safety

Zhihong Wang (4):
  app/test: Disabled VTA for memcpy test in app/test/Makefile
  app/test: Removed unnecessary test cases in app/test/test_memcpy.c
  app/test: Extended test coverage in app/test/test_memcpy_perf.c
  lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
    and AVX platforms

 app/test/Makefile                                  |   6 +
 app/test/test_memcpy.c                             |  52 +-
 app/test/test_memcpy_perf.c                        | 220 ++++---
 .../common/include/arch/x86/rte_memcpy.h           | 680 +++++++++++++++------
 4 files changed, 654 insertions(+), 304 deletions(-)

-- 
1.9.3
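
For readers unfamiliar with technique 2 ("Enforce aligned stores"), here is a minimal sketch of the general idea. It is not the code from this patch set. The function name copy_aligned_stores is hypothetical, the 32-byte small-copy threshold is an arbitrary assumption, and the sketch assumes an x86 target with SSE2 and non-overlapping buffers. It writes one unaligned 16-byte head block, advances to the next 16-byte boundary of the destination, then uses only aligned stores for the body while leaving the loads unaligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration; NOT rte_memcpy from the patch. */
static void *copy_aligned_stores(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Small copies: aligning is not worth it (threshold is an assumption). */
    if (n < 32)
        return memcpy(dst, src, n);

    /* Head: one unaligned 16-byte store covers everything up to the next
     * 16-byte boundary of dst. It may overlap the body written below,
     * which is harmless because the same bytes are written again. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    d += head;
    s += head;
    n -= head;

    /* Body: dst is now 16-byte aligned, so every store is aligned.
     * Loads stay unaligned because src may have a different offset. */
    assert(((uintptr_t)d & 15) == 0);
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_store_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    if (n > 0)
        memcpy(d, s, n);
    return dst;
}
```

The sketch aligns the store side rather than the load side because only one of the two addresses can be aligned when source and destination have different offsets. Technique 3 in the list suggests the patch handles the load side per architecture, which this sketch does not attempt.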