From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pawelx.wodkowski@intel.com>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id 898C85A7F
 for <dev@dpdk.org>; Mon, 26 Jan 2015 15:43:09 +0100 (CET)
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by orsmga103.jf.intel.com with ESMTP; 26 Jan 2015 06:38:53 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,469,1418112000"; d="scan'208";a="517688334"
Received: from irsmsx152.ger.corp.intel.com ([163.33.192.66])
 by orsmga003.jf.intel.com with ESMTP; 26 Jan 2015 06:36:05 -0800
Received: from irsmsx102.ger.corp.intel.com ([169.254.2.28]) by
 IRSMSX152.ger.corp.intel.com ([169.254.6.43]) with mapi id 14.03.0195.001;
 Mon, 26 Jan 2015 14:43:05 +0000
From: "Wodkowski, PawelX" <pawelx.wodkowski@intel.com>
To: "Wang, Zhihong" <zhihong.wang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in
 arch/x86/rte_memcpy.h for both SSE and AVX platforms
Thread-Index: AQHQM4rlFn92BsIfrkuRPcGYb//KGJzSZLrg
Date: Mon, 26 Jan 2015 14:43:04 +0000
Message-ID: <F6F2A6264E145F47A18AB6DF8E87425D12B8C8E2@IRSMSX102.ger.corp.intel.com>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
 <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
In-Reply-To: <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
Accept-Language: pl-PL, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.182]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy
 in arch/x86/rte_memcpy.h for both SSE and AVX platforms
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jan 2015 14:43:10 -0000

Hi,

I must say: great work.

I have some small comments:

> +/**
> + * Macro for copying unaligned block from one location to another,
> + * 47 bytes leftover maximum,
> + * locations should not overlap.
> + * Requirements:
> + * - Store is aligned
> + * - Load offset is <offset>, which must be immediate value within [1, 15]
> + * - For <src>, make sure <offset> bit backwards & <16 - offset> bit forwards are available for loading
> + * - <dst>, <src>, <len> must be variables
> + * - __m128i <xmm0> ~ <xmm8> must be pre-defined
> + */
> +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset) \
> +{ \
...
> +}

Why not use do { ... } while (0) or ({ ... }) here? A bare { ... } block can
have unexpected side effects when the macro is used as a statement, for
example in an if/else without braces.
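
To illustrate the concern, here is a minimal sketch. COPY2_BARE, COPY2_SAFE
and copy_or_clear are hypothetical names made up for this example; they are
not taken from the patch and only mimic the shape of a multi-statement macro.

```c
#include <stddef.h>

/* Hypothetical two-byte copy written the same way as the macro in the patch:
 * a bare brace block. */
#define COPY2_BARE(dst, src)   \
{                              \
    (dst)[0] = (src)[0];       \
    (dst)[1] = (src)[1];       \
}

/* Same body wrapped in do { ... } while (0), so the macro plus the trailing
 * ';' forms exactly one statement. */
#define COPY2_SAFE(dst, src)   \
do {                           \
    (dst)[0] = (src)[0];       \
    (dst)[1] = (src)[1];       \
} while (0)

/*
 * With COPY2_BARE the code below does not compile: the ';' after the closing
 * brace is an empty statement that ends the if, so the else has no if to
 * attach to.
 *
 *     if (ok)
 *         COPY2_BARE(dst, src);
 *     else
 *         dst[0] = 0;
 */
static int copy_or_clear(char *dst, const char *src, int ok)
{
    if (ok)
        COPY2_SAFE(dst, src);   /* expands to a single statement */
    else
        dst[0] = 0;
    return dst[0];
}
```

The ({ ... }) form behaves the same way in this position, but it is a GNU
extension, while do { ... } while (0) is plain C.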

Second:
Why did you completely replace
#define rte_memcpy(dst, src, n)              \
	({ (__builtin_constant_p(n)) ?       \
	memcpy((dst), (src), (n)) :          \
	rte_memcpy_func((dst), (src), (n)); })

with an inline rte_memcpy()? This construct can help the compiler decide
which version to use: the (static?) inline implementation or a call to an
external function.
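
For reference, a small self-contained sketch of that dispatch pattern.
my_memcpy and my_memcpy_func are hypothetical stand-ins, not the DPDK
symbols, and the byte loop is only a placeholder for an out-of-line copy
routine. __builtin_constant_p is a GCC/Clang builtin, so this assumes one of
those compilers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line copy routine; a plain byte loop stands in for an
 * optimized implementation. */
static void *my_memcpy_func(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* If the compiler can prove n is a compile-time constant, use memcpy(), which
 * it may expand inline for small fixed sizes; otherwise call the external
 * routine. */
#define my_memcpy(dst, src, n)             \
    (__builtin_constant_p(n) ?             \
        memcpy((dst), (src), (n)) :        \
        my_memcpy_func((dst), (src), (n)))
```

Both branches return the destination pointer, so the macro can be used
wherever a memcpy() call would be. Which branch is taken for a given call
depends on the compiler and optimization level.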

Did you try the 'extern inline' type? It could help reduce compilation time.