Date: Thu, 14 Jan 2016 08:48:32 -0800
From: Stephen Hemminger
To: Zhihong Wang
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms
Message-ID: <20160114084832.672fac86@xeon-e3>
In-Reply-To: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>

On Thu, 14 Jan 2016 01:13:18 -0500
Zhihong Wang wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
>   1. Read CPUID to check if AVX512 is supported by CPU
>
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
>
>   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>
>  app/test/test_memcpy_perf.c                        |   6 +
>  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
>  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
>  mk/rte.cpuflags.mk                                 |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
>

This really looks like code that could benefit from GCC function
multiversioning. The current cpuflags model is useless/flawed in real
product deployment.
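
To illustrate the suggestion, here is a minimal sketch of what function multiversioning could look like in C. It is not taken from the patch set and is not DPDK code. It assumes the `target_clones` attribute (available from GCC 6 onward), an x86-64 build, and glibc, since the runtime dispatch relies on ifunc. The names `copy_bytes`, `copy_bytes_selfcheck`, and `COPY_CLONES` are hypothetical. The compiler builds one clone of the function per listed target and a resolver picks one at load time based on the CPU the binary actually runs on, rather than on macros fixed at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* on glibc this also makes __GLIBC__ visible */

/*
 * Enable clones only where they are expected to build: x86-64, a compiler
 * that knows the attribute, and glibc (the dispatch relies on ifunc).
 * Everywhere else the macro expands to nothing and one baseline copy is built.
 */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define COPY_CLONES \
        __attribute__((target_clones("avx512f", "avx2", "default")))
#  endif
#endif
#ifndef COPY_CLONES
#  define COPY_CLONES
#endif

/*
 * Hypothetical copy routine (illustration only, not rte_memcpy).
 * Each clone is compiled for its own instruction set, so the compiler may
 * vectorize the loop differently per clone when optimization is enabled.
 */
COPY_CLONES
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical self-check: copy a known pattern and compare the buffers. */
void copy_bytes_selfcheck(void)
{
    unsigned char src[200], dst[200];
    void *ret;

    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)(i * 7u);
    memset(dst, 0, sizeof(dst));

    ret = copy_bytes(dst, src, sizeof(src));
    assert(ret == dst);
    assert(memcmp(dst, src, sizeof(src)) == 0);
}
```

Callers simply call `copy_bytes(dst, src, len)`. A single binary built this way could then run on machines with and without AVX512, which is the deployment case where compile-time CPU flag macros fall short.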