From: "Wang, Zhihong"
To: "Wodkowski, PawelX", "dev@dpdk.org"
Date: Tue, 27 Jan 2015 05:12:03 +0000
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com> <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

> -----Original Message-----
> From: Wodkowski, PawelX
> Sent: Monday, January 26, 2015 10:43 PM
> To: Wang, Zhihong; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in
> arch/x86/rte_memcpy.h for both SSE and AVX platforms
>
> Hi,
>
> I must say: great work.
>
> I have some small comments:
>
> > +/**
> > + * Macro for copying unaligned block from one location to another,
> > + * 47 bytes leftover maximum,
> > + * locations should not overlap.
> > + * Requirements:
> > + * - Store is aligned
> > + * - Load offset is , which must be immediate value within [1, 15]
> > + * - For , make sure bit backwards & <16 - offset> bit forwards are available for loading
> > + * - , , must be variables
> > + * - __m128i ~ must be pre-defined
> > + */
> > +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset) \
> > +{ \
> ...
> > +}
>
> Why not do { ... } while(0) or ({ ... }) ? This could have unpredictable side
> effects.
>
> Second:
> Why do you completely substitute
> #define rte_memcpy(dst, src, n)              \
> 	({ (__builtin_constant_p(n)) ?       \
> 	memcpy((dst), (src), (n)) :          \
> 	rte_memcpy_func((dst), (src), (n)); })
>
> with inline rte_memcpy()? This construction can help the compiler deduce
> which version to use: the (static?) inline implementation or a call to an
> external function.
>
> Did you try the 'extern inline' type? It could help reduce compilation time.

Hi Pawel,

Good call on "MOVEUNALIGNED_LEFT47". Thanks!

I removed the __builtin_constant_p(n) conditional because it calls glibc memcpy when the size parameter is a compile-time constant, while rte_memcpy has better performance in that case.

The current long compile time is caused by too many function calls; I'll fix that in the next version.

Zhihong (John)
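
For readers following the MOVEUNALIGNED_LEFT47 comment, here is a minimal sketch of why a bare `{ ... }` macro body is fragile and how `do { ... } while (0)` avoids the problem. The macros `SWAP_BRACES` and `SWAP_SAFE` and the function `min_via_swap` are hypothetical names made up for this illustration; none of them come from the patch.

```c
#include <assert.h>

/* Hypothetical macros for illustration; neither is taken from the patch. */

/* Bare-brace form: expands to a compound statement. Writing
 * "SWAP_BRACES(a, b);" yields "{ ... };", and the stray ';' ends an
 * enclosing unbraced 'if' before its 'else', which fails to compile. */
#define SWAP_BRACES(a, b)  { int tmp_ = (a); (a) = (b); (b) = tmp_; }

/* do/while(0) form: expands to a single statement that requires the
 * trailing ';', so it behaves like a function call in an if/else. */
#define SWAP_SAFE(a, b)    do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Returns the smaller of x and y, using the macro in an unbraced if/else. */
int min_via_swap(int x, int y)
{
    if (x > y)
        SWAP_SAFE(x, y);   /* SWAP_BRACES(x, y); would not compile here */
    else
        assert(x <= y);
    return x;
}
```

The `({ ... })` alternative mentioned in the review is a GCC/Clang statement expression. It also behaves as a single expression statement, but it is a compiler extension, whereas `do { ... } while (0)` is standard C.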
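
As a rough illustration of the dispatch pattern discussed above, the sketch below reproduces the shape of the quoted macro with stand-in names. `copy_dispatch` and `copy_func` are hypothetical and only mimic `rte_memcpy` and `rte_memcpy_func`; the byte loop is a placeholder, not the optimized routine from the patch. `__builtin_constant_p` and the `({ ... })` statement expression are GCC/Clang extensions, so this assumes one of those compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an out-of-line copy routine: a plain byte loop.
 * Like memcpy, it assumes the two regions do not overlap. */
static void *copy_func(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Same shape as the quoted macro: when the compiler can prove that n is a
 * compile-time constant, use memcpy; otherwise call the custom routine. */
#define copy_dispatch(dst, src, n)              \
    ({ (__builtin_constant_p(n)) ?              \
       memcpy((dst), (src), (n)) :              \
       copy_func((dst), (src), (n)); })
```

With this shape, a call such as `copy_dispatch(b, a, 8)` takes the `memcpy` branch, while a call whose size is only known at run time goes to `copy_func`. That is the behavior the reply describes removing, so that constant-size copies also go through the optimized routine.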