From: Stephen Hemminger
To: zhihong.wang@intel.com
Cc: dev@dpdk.org
Date: Tue, 20 Jan 2015 09:15:38 -0800
Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Message-ID: <20150120091538.4c3a1363@urahara>
In-Reply-To: <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com> <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>

On Mon, 19 Jan 2015 09:53:34 +0800
zhihong.wang@intel.com wrote:

> Main code changes:
>
> 1. Differentiate architectural features based on CPU flags
>    a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
>    b. Implement separated copy flow specifically optimized for target architecture
>
> 2. Rewrite the memcpy function "rte_memcpy"
>    a. Add store aligning
>    b. Add load aligning based on architectural features
>    c. Put block copy loop into inline move functions for better control of instruction order
>    d. Eliminate unnecessary MOVs
>
> 3. Rewrite the inline move functions
>    a. Add move functions for unaligned load cases
>    b. Change instruction order in copy loops for better pipeline utilization
>    c. Use intrinsics instead of assembly code
>
> 4. Remove slow glibc call for constant copies
>
> Signed-off-by: Zhihong Wang

Dumb question: why not fix glibc memcpy instead?
What is special about rte_memcpy?
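
For readers without the patch at hand, here is a rough illustration of what "store aligning" and "intrinsics instead of assembly" in the quoted changelog can look like. This is only a minimal sketch under stated assumptions, not the code from the patch: the name sketch_memcpy is hypothetical, it uses plain SSE2 (no AVX/AVX2 paths, no CPU-flag dispatch), it assumes an x86 target, and it assumes non-overlapping buffers as memcpy does.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration; not the rte_memcpy from the patch. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(dst != NULL && src != NULL);

    /* Small copies: aligning is not worth the setup cost here. */
    if (n < 32)
        return memcpy(dst, src, n);

    /* Head: copy the first 16 bytes with an unaligned store, then advance
     * by 1..16 bytes so that d becomes 16-byte aligned. Any bytes covered
     * twice are rewritten with identical values. */
    _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    size_t head = 16 - ((uintptr_t)d & 15);
    d += head;
    s += head;
    n -= head;

    /* Main loop: unaligned loads, aligned stores ("store aligning"). */
    while (n >= 16) {
        _mm_store_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: one unaligned 16-byte copy that ends exactly at the last byte.
     * It overlaps bytes already copied, which is safe because at least 16
     * bytes precede d at this point. */
    if (n > 0) {
        _mm_storeu_si128((__m128i *)(d + n - 16),
                         _mm_loadu_si128((const __m128i *)(s + n - 16)));
    }
    return dst;
}
```

The sketch aligns only the destination; the changelog's "load aligning based on architectural features" would need additional per-architecture paths that are not shown here.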