From: Zhihong Wang
To: rkerur@gmail.com
Cc: dev@dpdk.org
Date: Wed, 27 Jan 2016 23:18:35 -0500
Message-Id: <1453954715-31723-1-git-send-email-zhihong.wang@intel.com>
In-Reply-To: <1429562009-11817-1-git-send-email-rkerur@gmail.com>
References: <1429562009-11817-1-git-send-email-rkerur@gmail.com>
Subject: Re: [dpdk-dev] [dpdk-dev,v2] Clean up rte_memcpy.h file

> Remove unnecessary type casting in functions.
>
> Tested on Ubuntu (14.04 x86_64) with "make test".
> "make test" results match the results with baseline.
> "Memcpy perf" results match the results with baseline.
>
> Signed-off-by: Ravi Kerur
> Acked-by: Stephen Hemminger
>
> ---
>  .../common/include/arch/x86/rte_memcpy.h | 340 +++++++++++----------
>  1 file changed, 175 insertions(+), 165 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> index 6a57426..839d4ec 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h

[...]

>  /**
> @@ -150,13 +150,16 @@ rte_mov64blocks(uint8_t *dst, const uint8_t *src, size_t n)
>  	__m256i ymm0, ymm1;
>
>  	while (n >= 64) {
> -		ymm0 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 0 * 32));
> +
> +		ymm0 = _mm256_loadu_si256((const __m256i *)(src + 0 * 32));
> +		ymm1 = _mm256_loadu_si256((const __m256i *)(src + 1 * 32));
> +
> +		_mm256_storeu_si256((__m256i *)(dst + 0 * 32), ymm0);
> +		_mm256_storeu_si256((__m256i *)(dst + 1 * 32), ymm1);
> +

Any particular reason to change the order of the statements here? :)
Overall this patch looks good.

>  		n -= 64;
> -		ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
> -		src = (const uint8_t *)src + 64;
> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
> -		dst = (uint8_t *)dst + 64;
> +		src = src + 64;
> +		dst = dst + 64;
>  	}
>  }
>
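
For reference, one way to confirm that the reordering in this hunk does not change behavior is a small standalone check. The sketch below is not part of the patch. To build without -mavx2 it replaces the AVX2 intrinsics with a portable 32-byte memcpy-based stand-in (the hypothetical helpers loadu32/storeu32), and the names mov64blocks_old_order, mov64blocks_new_order and orders_agree are made up for illustration. It also assumes src and dst do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for one 256-bit register (assumption: replaces __m256i). */
typedef struct { uint8_t b[32]; } vec32;

/* Stand-ins for _mm256_loadu_si256 / _mm256_storeu_si256 (hypothetical helpers). */
static vec32 loadu32(const uint8_t *p) { vec32 v; memcpy(v.b, p, 32); return v; }
static void storeu32(uint8_t *p, vec32 v) { memcpy(p, v.b, 32); }

/* Statement order before the patch: loads, counter and pointer updates interleaved. */
static void mov64blocks_old_order(uint8_t *dst, const uint8_t *src, size_t n)
{
	vec32 v0, v1;

	while (n >= 64) {
		v0 = loadu32(src + 0 * 32);
		n -= 64;
		v1 = loadu32(src + 1 * 32);
		src = src + 64;
		storeu32(dst + 0 * 32, v0);
		storeu32(dst + 1 * 32, v1);
		dst = dst + 64;
	}
}

/* Statement order after the patch: both loads, both stores, then the updates. */
static void mov64blocks_new_order(uint8_t *dst, const uint8_t *src, size_t n)
{
	vec32 v0, v1;

	while (n >= 64) {
		v0 = loadu32(src + 0 * 32);
		v1 = loadu32(src + 1 * 32);

		storeu32(dst + 0 * 32, v0);
		storeu32(dst + 1 * 32, v1);

		n -= 64;
		src = src + 64;
		dst = dst + 64;
	}
}

/* Returns 1 when both orders leave identical destination buffers for an n-byte request. */
static int orders_agree(size_t n)
{
	uint8_t src[256], a[256], b[256];
	size_t i;

	assert(n <= sizeof(src));
	for (i = 0; i < sizeof(src); i++)
		src[i] = (uint8_t)(i * 7 + 3);
	memset(a, 0xAA, sizeof(a));
	memset(b, 0xAA, sizeof(b));

	mov64blocks_old_order(a, src, n);
	mov64blocks_new_order(b, src, n);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling orders_agree(n) for any n up to 256 should return 1, since each iteration reads from src and dst only through values computed before the pointers advance. That only covers functional equivalence; any performance difference would have to come from the "Memcpy perf" run mentioned in the commit message.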