From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <552E05FB.30504@intel.com>
References: <1429047011-11545-1-git-send-email-rkerur@gmail.com>
 <1429047113-11688-1-git-send-email-rkerur@gmail.com>
 <552E05FB.30504@intel.com>
Date: Wed, 15 Apr 2015 14:00:51 -0700
From: Ravi Kerur
To: Pawel Wodkowski
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH] Clean up rte_memcpy.h file
Content-Type: text/plain; charset=ISO-8859-1
List-Id: patches and discussions about DPDK

On Tue, Apr 14, 2015 at 11:32 PM, Pawel Wodkowski <pawelx.wodkowski@intel.com> wrote:

> On 2015-04-14 23:31, Ravi Kerur wrote:
>
>> +
>> +               for (i = 0; i < 8; i++) {
>> +                       ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
>> +                       _mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
>> +               }
>> +
>>                 n -= 256;
>> -               ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
>> -               ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32));
>> -               ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32));
>> -               ymm4 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 4 * 32));
>> -               ymm5 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 5 * 32));
>> -               ymm6 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 6 * 32));
>> -               ymm7 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 7 * 32));
>> -               src = (const uint8_t *)src + 256;
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 4 * 32), ymm4);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 5 * 32), ymm5);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 6 * 32), ymm6);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 7 * 32), ymm7);
>> -               dst = (uint8_t *)dst + 256;
>> +               src = src + 256;
>> +               dst = dst + 256;
>>         }
>>
>
> Did you perform a performance test on that part?
>
>
I ran "make test", which runs "memcpy perf"; the results were given in the
cover letter. I am pasting them here again.

/**********************With changes*************************************/
Start memcpy_perf: Success [00m 00s]
Memcpy performance autotest: Success [09m 36s] [17m 45s]

/**********************Without changes**********************************/
Start memcpy_perf: Success [00m 00s]
Memcpy performance autotest: Success [09m 35s] [13m 57s]

--
> Pawel
>
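
For reference, the two shapes of the 256-byte block copy in the quoted hunk
can be isolated as below. This is only a sketch, not code from the patch:
the names copy256_loop, copy256_unrolled and copy256 are hypothetical, and
it assumes an x86 build with GCC or Clang, since it relies on the
__attribute__((target("avx2"))) and __builtin_cpu_supports() extensions.
The unrolled variant follows the shape of the "-" lines (all loads, then
all stores); the loop variant follows the "+" lines (each 32-byte load is
stored before the next load). Whether a compiler turns the loop back into
the same instruction schedule depends on the compiler and flags, so timing
the two is the only reliable way to compare them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

/* Hypothetical helper, not from the patch: rolled form, as in the "+" lines.
 * Each iteration loads 32 bytes and stores them before the next load. */
__attribute__((target("avx2")))
static void copy256_loop(uint8_t *dst, const uint8_t *src)
{
	int i;

	for (i = 0; i < 8; i++) {
		__m256i ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
		_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
	}
}

/* Hypothetical helper, not from the patch: unrolled form, shaped like the
 * "-" lines. All eight loads are issued first, then all eight stores. */
__attribute__((target("avx2")))
static void copy256_unrolled(uint8_t *dst, const uint8_t *src)
{
	__m256i ymm0 = _mm256_loadu_si256((const __m256i *)(src + 0 * 32));
	__m256i ymm1 = _mm256_loadu_si256((const __m256i *)(src + 1 * 32));
	__m256i ymm2 = _mm256_loadu_si256((const __m256i *)(src + 2 * 32));
	__m256i ymm3 = _mm256_loadu_si256((const __m256i *)(src + 3 * 32));
	__m256i ymm4 = _mm256_loadu_si256((const __m256i *)(src + 4 * 32));
	__m256i ymm5 = _mm256_loadu_si256((const __m256i *)(src + 5 * 32));
	__m256i ymm6 = _mm256_loadu_si256((const __m256i *)(src + 6 * 32));
	__m256i ymm7 = _mm256_loadu_si256((const __m256i *)(src + 7 * 32));

	_mm256_storeu_si256((__m256i *)(dst + 0 * 32), ymm0);
	_mm256_storeu_si256((__m256i *)(dst + 1 * 32), ymm1);
	_mm256_storeu_si256((__m256i *)(dst + 2 * 32), ymm2);
	_mm256_storeu_si256((__m256i *)(dst + 3 * 32), ymm3);
	_mm256_storeu_si256((__m256i *)(dst + 4 * 32), ymm4);
	_mm256_storeu_si256((__m256i *)(dst + 5 * 32), ymm5);
	_mm256_storeu_si256((__m256i *)(dst + 6 * 32), ymm6);
	_mm256_storeu_si256((__m256i *)(dst + 7 * 32), ymm7);
}

/* Hypothetical wrapper: picks one of the two forms, and falls back to
 * memcpy() when the CPU running the code does not report AVX2. */
static void copy256(uint8_t *dst, const uint8_t *src, int use_loop)
{
	assert(dst != NULL && src != NULL);

	if (!__builtin_cpu_supports("avx2")) {
		memcpy(dst, src, 256);
		return;
	}
	if (use_loop)
		copy256_loop(dst, src);
	else
		copy256_unrolled(dst, src);
}
```

Calling copy256(dst, src, 1) and copy256(dst, src, 0) on the same 256-byte
source should leave identical destination buffers; only the order of loads
and stores differs between the two.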