From: "Wodkowski, PawelX"
To: "Wang, Zhihong", "dev@dpdk.org"
Date: Mon, 26 Jan 2015 14:43:04 +0000
Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
In-Reply-To: <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>

Hi,

I must say: great work.
I have some small comments:

> +/**
> + * Macro for copying unaligned block from one location to another,
> + * 47 bytes leftover maximum,
> + * locations should not overlap.
> + * Requirements:
> + * - Store is aligned
> + * - Load offset is , which must be immediate value within [1, 15]
> + * - For , make sure bit backwards & <16 - offset> bit forwards are available for loading
> + * - , , must be variables
> + * - __m128i ~ must be pre-defined
> + */
> +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset) \
> +{ \
...
> +}

Why not use do { ... } while (0) or ({ ... })? A bare { ... } block can have unexpected effects. For example, "if (cond) MOVEUNALIGNED_LEFT47(...); else ..." does not compile, because the semicolon after the closing brace ends the if statement and leaves the else without a matching if.

Second: why did you completely replace

    #define rte_memcpy(dst, src, n) \
    ({ (__builtin_constant_p(n)) ? \
    memcpy((dst), (src), (n)) : \
    rte_memcpy_func((dst), (src), (n)); })

with an inline rte_memcpy()? This construction helps the compiler decide which version to use: the (static?) inline implementation or a call to the external function.

Did you try 'extern inline'? It could help reduce compilation time.
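To make the first comment concrete, here is a minimal sketch using a hypothetical COPY16_ADVANCE macro (an illustration only, not the patch's MOVEUNALIGNED_LEFT47). Its body is wrapped in do { ... } while (0), so the expansion is a single statement that absorbs the caller's trailing semicolon and can sit in an unbraced if/else. The helper copy_chunks() is also hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro for illustration only (not the patch's
 * MOVEUNALIGNED_LEFT47): copies one 16-byte chunk and advances the
 * pointers. The do { ... } while (0) wrapper turns the body into a single
 * statement that consumes the caller's trailing semicolon, so the macro
 * can sit in an unbraced if/else. With a bare { ... } body, the "else"
 * in copy_chunks() below would be a syntax error. */
#define COPY16_ADVANCE(dst, src, len)   \
    do {                                \
        memcpy((dst), (src), 16);       \
        (dst) += 16;                    \
        (src) += 16;                    \
        (len) -= 16;                    \
    } while (0)

/* Hypothetical helper: copies whole 16-byte chunks from src to dst and
 * returns the number of leftover bytes (always < 16) that were not copied. */
static size_t copy_chunks(unsigned char *dst, const unsigned char *src,
                          size_t len)
{
    for (;;) {
        if (len >= 16)
            COPY16_ADVANCE(dst, src, len);  /* expands to one statement... */
        else
            return len;                     /* ...so this else still binds */
    }
}
```

For a 40-byte buffer, copy_chunks() copies the first 32 bytes and returns 8. The ({ ... }) statement-expression form suggested as an alternative works too, but it is a GCC/Clang extension, whereas do { ... } while (0) is plain ISO C.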
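For the second comment, a minimal sketch of the dispatch pattern in the quoted rte_memcpy macro, with hypothetical names my_memcpy() and my_memcpy_func() standing in for rte_memcpy() and rte_memcpy_func(). It assumes GCC or Clang (it uses __builtin_constant_p, a statement expression and the noinline attribute), and the byte loop is only a placeholder for the real SSE/AVX copy code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line routine standing in for rte_memcpy_func; a real
 * one would use SSE/AVX loads and stores. noinline (GCC/Clang) keeps it a
 * genuine function call for illustration. */
__attribute__((noinline))
static void *my_memcpy_func(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Same shape as the quoted rte_memcpy macro: when n is a compile-time
 * constant, hand the copy to memcpy(), which the compiler may expand
 * inline as a few moves; otherwise call the out-of-line routine. */
#define my_memcpy(dst, src, n)                      \
    ({ (__builtin_constant_p(n)) ?                  \
        memcpy((dst), (src), (n)) :                 \
        my_memcpy_func((dst), (src), (n)); })
```

Whether __builtin_constant_p(n) evaluates to 1 depends on the optimization level and on how far the compiler can propagate constants, so both branches must give the same result; only the code path differs.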