Date: Tue, 20 Jan 2015 14:16:24 -0500
From: Neil Horman
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Message-ID: <20150120191624.GJ18449@hmsreliant.think-freely.org>
In-Reply-To: <20150120091538.4c3a1363@urahara>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
 <1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
 <20150120091538.4c3a1363@urahara>

On Tue, Jan 20, 2015 at 09:15:38AM -0800, Stephen Hemminger wrote:
> On Mon, 19 Jan 2015 09:53:34 +0800
> zhihong.wang@intel.com wrote:
>
> > Main code changes:
> >
> > 1. Differentiate architectural features based on CPU flags
> >
> >    a. Implement separate move functions for SSE/AVX/AVX2 to make full use of cache bandwidth
> >
> >    b. Implement a separate copy flow optimized for each target architecture
> >
> > 2. Rewrite the memcpy function "rte_memcpy"
> >
> >    a. Add store aligning
> >
> >    b. Add load aligning based on architectural features
> >
> >    c. Put the block copy loop into inline move functions for better control of instruction order
> >
> >    d. Eliminate unnecessary MOVs
> >
> > 3. Rewrite the inline move functions
> >
> >    a. Add move functions for unaligned load cases
> >
> >    b. Change instruction order in copy loops for better pipeline utilization
> >
> >    c. Use intrinsics instead of assembly code
> >
> > 4. Remove the slow glibc call for constant-size copies
> >
> > Signed-off-by: Zhihong Wang
>
> Dumb question: why not fix glibc memcpy instead?
> What is special about rte_memcpy?
>

Fair point. Though, does glibc implement optimized memcpy variants per architecture, or does it just rely on gcc's __builtins for the optimized versions?

Neil
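
For readers following the thread, here is a minimal C sketch of the kind of copy flow the changelog above describes: align the destination first ("store aligning"), then run an intrinsics-based block loop with unaligned loads, and finish the tail with a plain memcpy. This is not code from the patch; the function name memcpy_avx2_sketch, the single-block (non-unrolled) loop, and the AVX2-only assumption are purely illustrative.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <immintrin.h>

/*
 * Illustrative only: a cut-down AVX2 copy flow in the spirit of the
 * changelog above.  Requires a CPU with AVX2 and -mavx2 at build time.
 */
static inline void *
memcpy_avx2_sketch(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	if (n >= 32) {
		/* "Store aligning": copy a short head so that the
		 * destination reaches a 32-byte boundary. */
		size_t head = (32 - ((uintptr_t)d & 31)) & 31;
		if (head) {
			memcpy(d, s, head);
			d += head;
			s += head;
			n -= head;
		}
		/* Main loop: unaligned vector loads (the source may have
		 * any alignment) and aligned 32-byte stores, expressed
		 * with intrinsics rather than hand-written assembly. */
		while (n >= 32) {
			__m256i v = _mm256_loadu_si256((const __m256i *)s);
			_mm256_store_si256((__m256i *)d, v);
			d += 32;
			s += 32;
			n -= 32;
		}
	}
	/* Tail: whatever remains is shorter than one vector. */
	if (n)
		memcpy(d, s, n);
	return dst;
}

The real rte_memcpy additionally selects among SSE/AVX/AVX2 paths based on CPU flags and arranges the instruction order inside the copy loop for pipeline utilization, which this sketch omits.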