From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To:
 <8F6C2BD409508844A0EFC19955BE0941033A63DA@SHSMSX103.ccr.corp.intel.com>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
 <20160114084832.672fac86@xeon-e3>
 <8F6C2BD409508844A0EFC19955BE0941033A63DA@SHSMSX103.ccr.corp.intel.com>
Date: Fri, 15 Jan 2016 23:03:28 +0100
From: Vincent JARDIN
To: "Wang, Zhihong"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Fri, 15 Jan 2016 22:03:30 -0000

On 14 Jan 2016 22:39, "Wang, Zhihong" wrote:
>
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Friday, January 15, 2016 12:49 AM
> > To: Wang, Zhihong
> > Cc: dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Xie, Huawei
> > Subject: Re: [PATCH 0/4] Optimize memcpy for AVX512 platforms
> >
> > On Thu, 14 Jan 2016 01:13:18 -0500
> > Zhihong Wang wrote:
> >
> > > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > > utilization of hardware resources and deliver high performance.
> > >
> > > In current DPDK, memcpy holds a large proportion of execution time in
> > > libs like Vhost, especially for large packets, and this patch can bring
> > > considerable benefits.
> > >
> > > The implementation is based on the current DPDK memcpy framework, some
> > > background introduction can be found in these threads:
> > > http://dpdk.org/ml/archives/dev/2014-November/008158.html
> > > http://dpdk.org/ml/archives/dev/2015-January/011800.html
> > >
> > > Code changes are:
> > >
> > > 1. Read CPUID to check if AVX512 is supported by CPU
> > >
> > > 2. Predefine AVX512 macro if AVX512 is enabled by compiler
> > >
> > > 3. Implement AVX512 memcpy and choose the right implementation based on
> > > predefined macros
> > >
> > > 4. Decide alignment unit for memcpy perf test based on predefined macros
> > >
> > > Zhihong Wang (4):
> > >   lib/librte_eal: Identify AVX512 CPU flag
> > >   mk: Predefine AVX512 macro for compiler
> > >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> > >   app/test: Adjust alignment unit for memcpy perf test
> > >
> > >  app/test/test_memcpy_perf.c                        |   6 +
> > >  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
> > >  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
> > >  mk/rte.cpuflags.mk                                 |   4 +
> > >  4 files changed, 255 insertions(+), 4 deletions(-)
> > >
> >
> > This really looks like code that could benefit from Gcc
> > function multiversioning. The current cpuflags model is useless/flawed
> > in real product deployment
>
> I've tried gcc function multi versioning, with a simple add() function
> which returns a + b, and a loop calling it for millions of times. Turned
> out this mechanism adds 17% extra time to execute, overall it's a lot
> of extra overhead.
>
> Quote the gcc wiki: "GCC takes care of doing the dispatching to call
> the right version at runtime". So it loses inlining and adds extra
> dispatching overhead.
>
> Also this mechanism works only for C++, right?
>
> I think using predefined macros at compile time is more efficient and
> suits DPDK more.
>

I agree with you: performance first. So a mix of runtime and compile-time
selection would work: those who can accept some performance drop can use
the runtime path, and everyone else keeps the compile-time one.

> Could you please give an example when the current CPU flags model
> stop working? So I can fix it.
>
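
To make the "mix of runtime and compile time" idea concrete, here is a minimal sketch in C. It is not the code from the patch set and it does not use GCC function multiversioning; it uses a plain function pointer for the runtime case. The names copy_generic, copy_wide, resolve_copy and fast_copy are hypothetical, and copy_wide is only a stand-in that calls memcpy where a real AVX512 routine would go. It assumes a GCC-compatible compiler that predefines __AVX512F__ when AVX512F is enabled and that provides __builtin_cpu_supports() on x86.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for a wide-vector copy routine. A real one would
 * use AVX512 intrinsics; memcpy keeps this sketch portable. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

#if defined(__AVX512F__)
/* Compile-time selection: the compiler was told AVX512F is available,
 * so the call is bound directly and can still be inlined. */
static inline void *fast_copy(void *dst, const void *src, size_t n)
{
    return copy_wide(dst, src, n);
}
#else
/* Hypothetical baseline routine for CPUs without AVX512F. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Runtime selection: ask the CPU once which routine to use. */
static copy_fn resolve_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return copy_wide;
#endif
    return copy_generic;
}

/* Every call goes through a pointer, which is the extra dispatch cost
 * (and lost inlining) discussed in the quoted text. */
static void *fast_copy(void *dst, const void *src, size_t n)
{
    static copy_fn impl; /* resolved on first call; not thread-safe */

    if (impl == NULL)
        impl = resolve_copy();
    return impl(dst, src, n);
}
#endif
```

A caller would just use fast_copy(dst, src, len) in both builds. A build with AVX512F enabled in the compiler flags takes the inlined path, while a generic build pays the indirect call on each copy.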