From: "De Lara Guarch, Pablo"
To: "Wang, Zhihong", "dev@dpdk.org"
Date: Mon, 16 Feb 2015 15:57:48 +0000
Subject: Re: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization
In-Reply-To: <1422499127-11689-1-git-send-email-zhihong.wang@intel.com>

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhihong Wang
> Sent: Thursday, January 29, 2015 2:39 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization
>
> This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
> It also extends memcpy test coverage with unaligned cases and more test
> points.
>
> Optimization techniques are summarized below:
>
> 1. Utilize full cache bandwidth
>
> 2. Enforce aligned stores
>
> 3. Apply load address alignment based on architecture features
>
> 4. Make load/store address available as early as possible
>
> 5. General optimization techniques like inlining, branch reducing, prefetch
>    pattern access
>
> --------------
> Changes in v2:
>
> 1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast
>    build
>
> 2. Modified macro definition for better code readability & safety
>
> Zhihong Wang (4):
>   app/test: Disabled VTA for memcpy test in app/test/Makefile
>   app/test: Removed unnecessary test cases in app/test/test_memcpy.c
>   app/test: Extended test coverage in app/test/test_memcpy_perf.c
>   lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
>     and AVX platforms
>
>  app/test/Makefile                                  |   6 +
>  app/test/test_memcpy.c                             |  52 +-
>  app/test/test_memcpy_perf.c                        | 220 ++++---
>  .../common/include/arch/x86/rte_memcpy.h           | 680 +++++++++++++++------
>  4 files changed, 654 insertions(+), 304 deletions(-)
>
> --
> 1.9.3

Acked-by: Pablo de Lara
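
For anyone skimming the thread, item 2 in the quoted list ("Enforce aligned stores") can be illustrated roughly as follows. This is only a sketch of the general idea and is not taken from the patch: the names sketch_memcpy and copy16_aligned_store are hypothetical, the 32-byte small-copy threshold is an arbitrary assumption, and the real rte_memcpy.h is considerably more elaborate. The sketch copies a short head so that the destination reaches a 16-byte boundary, then uses unaligned loads paired with aligned stores for the bulk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the patch): copy one 16-byte block using
 * an unaligned load and an aligned store. dst must be 16-byte aligned. */
static inline void copy16_aligned_store(uint8_t *dst, const uint8_t *src)
{
    assert(((uintptr_t)dst & 15) == 0);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_store_si128((__m128i *)dst, v);
#else
    memcpy(dst, src, 16); /* portable fallback when SSE2 is unavailable */
#endif
}

/* Hypothetical sketch of a copy routine that enforces aligned stores. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Assumed threshold: small copies are not worth aligning. */
    if (n < 32) {
        memcpy(d, s, n);
        return dst;
    }

    /* Bytes needed to bring d up to the next 16-byte boundary (0..15). */
    size_t head = (size_t)(-(uintptr_t)d & 15);
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Bulk: every store now targets a 16-byte aligned address. */
    while (n >= 16) {
        copy16_aligned_store(d, s);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, n);
    return dst;
}
```

A call such as sketch_memcpy(dst + 3, src + 1, 100) exercises the misaligned-destination path: 13 head bytes are copied first, then five aligned 16-byte stores, then a 7-byte tail. Aligning the destination rather than the source is the design choice being illustrated here, since only one of the two pointers can be aligned in general.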