Message-ID: <552E05FB.30504@intel.com>
Date: Wed, 15 Apr 2015 08:32:27 +0200
From: Pawel Wodkowski
To: dev@dpdk.org
References: <1429047011-11545-1-git-send-email-rkerur@gmail.com> <1429047113-11688-1-git-send-email-rkerur@gmail.com>
In-Reply-To: <1429047113-11688-1-git-send-email-rkerur@gmail.com>
Subject: Re: [dpdk-dev] [PATCH] Clean up rte_memcpy.h file

On 2015-04-14 23:31, Ravi Kerur wrote:
> +
> +	for (i = 0; i < 8; i++) {
> +		ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
> +		_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
> +	}
> +
>  	n -= 256;
> -	ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
> -	ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32));
> -	ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32));
> -	ymm4 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 4 * 32));
> -	ymm5 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 5 * 32));
> -	ymm6 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 6 * 32));
> -	ymm7 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 7 * 32));
> -	src = (const uint8_t *)src + 256;
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 4 * 32), ymm4);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 5 * 32), ymm5);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 6 * 32), ymm6);
> -	_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 7 * 32), ymm7);
> -	dst = (uint8_t *)dst + 256;
> +	src = src + 256;
> +	dst = dst + 256;
>  }

Did you perform a performance test on that part?

-- 
Pawel