From: "Fu, JingguoX"
To: "Wang, Zhihong", "dev@dpdk.org"
Date: Thu, 29 Jan 2015 03:42:00 +0000
Message-ID: <6BD6202160B55B409D423293115822625C166D@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

Basic Information

Patch name: DPDK memcpy optimization
Brief description about test purpose: Verify memory copy and memory copy
performance cases on a variety of OSes
Test Flag: Tested-by
Tester name: jingguox.fu at intel.com
Test Tool Chain information: N/A
Commit ID: 88fa98a60b34812bfed92e5b2706fcf7e1cbcbc8
Test Result Summary: Total 6 cases, 6 passed, 0 failed

Test environment

    - Environment 1:
      OS: Ubuntu12.04 3.2.0-23-generic X86_64
      GCC: gcc version 4.6.3
      CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
      NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 01)

    - Environment 2:
      OS: Ubuntu14.04 3.13.0-24-generic
      GCC: gcc version 4.8.2
      CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
      NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 01)

    - Environment 3:
      OS: Fedora18 3.6.10-4.fc18.x86_64
      GCC: gcc version 4.7.2 20121109
      CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
      NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 01)

Detailed Testing information

Test Case - name: test_memcpy
Test Case - Description:
    Create two buffers and initialise one with random values. These are copied
    to the second buffer and then compared to see if the copy was successful. The
    bytes outside the copied area are also checked to make sure they were not changed.
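For readers unfamiliar with this case, the check described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in app/test/test_memcpy.c: the name check_copy, the BUF_SIZE value and the 0xA5 guard pattern are assumptions made for the example, and plain memcpy stands in for rte_memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 256   /* hypothetical buffer size for this sketch */
#define GUARD    0xA5  /* hypothetical fill value for untouched bytes */

/*
 * Copy `len` random bytes to offset `off` of a destination buffer.
 * Returns 0 if the copied bytes match the source and every byte
 * outside [off, off + len) still holds the guard value, -1 otherwise.
 */
static int check_copy(size_t off, size_t len)
{
    uint8_t src[BUF_SIZE], dst[BUF_SIZE];
    size_t i;

    /* The copied area must fit inside the destination buffer. */
    assert(off <= BUF_SIZE && len <= BUF_SIZE - off);

    for (i = 0; i < BUF_SIZE; i++)
        src[i] = (uint8_t)rand();
    memset(dst, GUARD, sizeof(dst));

    /* Stand-in for the function under test (assumption: rte_memcpy). */
    memcpy(dst + off, src, len);

    /* The copied area must equal the source. */
    for (i = 0; i < len; i++)
        if (dst[off + i] != src[i])
            return -1;

    /* Bytes before and after the copied area must be unchanged. */
    for (i = 0; i < off; i++)
        if (dst[i] != GUARD)
            return -1;
    for (i = off + len; i < BUF_SIZE; i++)
        if (dst[i] != GUARD)
            return -1;

    return 0;
}
```

Calling check_copy(3, 64), for example, exercises an unaligned destination; a return value of 0 means both the copied bytes and the guard bytes were as expected.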
Test Case - test sample/application: test application in app/test
Test Case - command / instruction:
    # ./app/test/test -n 1 -c ffff
    #RTE>> memcpy_autotest
Test Case - expected:
    #RTE>> Test OK
Test Result: PASSED

Test Case - name: test_memcpy_perf
Test Case - Description:
    Measures memcpy performance for a number of different sizes and
    cached/uncached permutations.
Test Case - test sample/application: test application in app/test
Test Case - command / instruction:
    # ./app/test/test -n 1 -c ffff
    #RTE>> memcpy_perf_autotest
Test Case - expected:
    #RTE>> Test OK
Test Result: PASSED

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of zhihong.wang@intel.com
Sent: Monday, January 19, 2015 09:54
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
It also extends memcpy test coverage with unaligned cases and more test points.

Optimization techniques are summarized below:

1. Utilize full cache bandwidth

2. Enforce aligned stores

3. Apply load address alignment based on architecture features

4. Make load/store address available as early as possible

5. General optimization techniques like inlining, branch reducing, prefetch pattern access

Zhihong Wang (4):
  Disabled VTA for memcpy test in app/test/Makefile
  Removed unnecessary test cases in test_memcpy.c
  Extended test coverage in test_memcpy_perf.c
  Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
    platforms

 app/test/Makefile                                  |   6 +
 app/test/test_memcpy.c                             |  52 +-
 app/test/test_memcpy_perf.c                        | 238 +++++---
 .../common/include/arch/x86/rte_memcpy.h           | 664 +++++++++++++++------
 4 files changed, 656 insertions(+), 304 deletions(-)

-- 
1.9.3
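
As a note on item 2 of the quoted technique list ("Enforce aligned stores"), the general idea can be sketched as below. This is a minimal illustration under stated assumptions, not the code in rte_memcpy.h: the helper name copy_aligned_stores is hypothetical, a 16-byte (SSE) store width is assumed, and loads are simply left unaligned. Where SSE2 is not available the sketch falls back to a plain 16-byte memcpy per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper for illustration; not the rte_memcpy implementation. */
static void *copy_aligned_stores(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t head;

    assert(dst != NULL && src != NULL);

    /* Bytes needed to bring the store address up to a 16-byte boundary. */
    head = (size_t)(-(uintptr_t)d & 15);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Bulk loop: the store address is now 16-byte aligned. */
    while (n >= 16) {
#if defined(__SSE2__)
        __m128i v = _mm_loadu_si128((const __m128i *)s); /* unaligned load */
        _mm_store_si128((__m128i *)d, v);                /* aligned store */
#else
        memcpy(d, s, 16);
#endif
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Remaining tail of fewer than 16 bytes. */
    memcpy(d, s, n);
    return dst;
}
```

The head copy costs a few extra bytes of work up front, but after it every store in the bulk loop lands on an aligned address regardless of how the caller's destination pointer was aligned.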