From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vincent.jardin@6wind.com>
Received: from mail-oi0-f46.google.com (mail-oi0-f46.google.com
 [209.85.218.46]) by dpdk.org (Postfix) with ESMTP id 554C08DB4
 for <dev@dpdk.org>; Fri, 15 Jan 2016 23:03:30 +0100 (CET)
Received: by mail-oi0-f46.google.com with SMTP id w75so112677908oie.0
 for <dev@dpdk.org>; Fri, 15 Jan 2016 14:03:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=/nWbhqhkHF4ZacLS7GyiYSom4BfzdRLlXxDzb5zI6W8=;
 b=zmN0kQBLZ1d7pDKMB30NugEftotN4is4YVosmvAYo+mdTwYpyNY236hoSXk4lC5279
 YalBu9NrD2CpkUC6Sd3apjB6O04b13zz3XvGCQzWdkHh5On3fjw/c5bbojIygr4Jh88E
 aWeMb+V4QcU4YIdL701HS80aLy9MYiWTuULkpp5IDkNliM9UE0yilUU1amN5KzntvqC3
 zEGnzdzPZ8nXeMuDm3dPBLDUJ7/3nASJhXShuRACWdJLvySgXWcy35UTM3gDvKB/kSfX
 RPcGJt2ItptjZyZ/BhS/Aqa18gzvVs4AdnjZ6gYSIG0l2OYpiJDR+p27wsxDC1xTOppt
 mqGw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=/nWbhqhkHF4ZacLS7GyiYSom4BfzdRLlXxDzb5zI6W8=;
 b=eiamG7wH0rPEjM1n5kGxvAtxF67VPWY50m8iy8Fx8DkZOpRaBm0F0Ea56GZJ9yC4z+
 QyX/NIksuL7WnNSGe8qHBgTRpQBzhLDcxQ2u6t2jjbL3tEmPgtr2P1TH4Uzd9tCOOywL
 fG7hcDLYRrClPbErneHI30atWA8ZDiHhmLbZwdvbd7pK1aCve5plZij3eGIaANhzbmbZ
 hriV3VLbnpWj30H+VxWGNw0Yeg2WkaL2cy+x6S2NMn+XDOiWG/TtwLlTgrF5s0zTspLF
 NSudx22wk/WvLY3SOSJgOyReWXyxKdFJBDLynqBC/2yWO2DR6U6ys5J7Sz8ENqXPJ1Q3
 BBWA==
X-Gm-Message-State: ALoCoQnekdjTiyknmZfQ1oI75wIDoWD6ISAN4x/37aOCsqPRfgbCoKvy+oLYuO4CDPXybsg37XwSVp6ga3AcyWgmG/eZMvqPJuy4Hj4RkRvAZSP7BTMCaGo=
MIME-Version: 1.0
X-Received: by 10.202.213.215 with SMTP id m206mr10192063oig.26.1452895409708; 
 Fri, 15 Jan 2016 14:03:29 -0800 (PST)
Received: by 10.60.38.132 with HTTP; Fri, 15 Jan 2016 14:03:28 -0800 (PST)
Received: by 10.60.38.132 with HTTP; Fri, 15 Jan 2016 14:03:28 -0800 (PST)
In-Reply-To: <8F6C2BD409508844A0EFC19955BE0941033A63DA@SHSMSX103.ccr.corp.intel.com>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
 <20160114084832.672fac86@xeon-e3>
 <8F6C2BD409508844A0EFC19955BE0941033A63DA@SHSMSX103.ccr.corp.intel.com>
Date: Fri, 15 Jan 2016 23:03:28 +0100
Message-ID: <CAG8AbRWDxvccOH0qYkMsd_WCEihNcn6z_9j8Y2p3Dg96=NSbfw@mail.gmail.com>
From: Vincent JARDIN <vincent.jardin@6wind.com>
To: "Wang, Zhihong" <zhihong.wang@intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Jan 2016 22:03:30 -0000

On 14 Jan 2016 22:39, "Wang, Zhihong" <zhihong.wang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Friday, January 15, 2016 12:49 AM
> > To: Wang, Zhihong <zhihong.wang@intel.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > Richardson, Bruce <bruce.richardson@intel.com>; Xie, Huawei
> > <huawei.xie@intel.com>
> > Subject: Re: [PATCH 0/4] Optimize memcpy for AVX512 platforms
> >
> > On Thu, 14 Jan 2016 01:13:18 -0500
> > Zhihong Wang <zhihong.wang@intel.com> wrote:
> >
> > > > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > > > utilization of hardware resources and deliver high performance.
> > >
> > > > In current DPDK, memcpy holds a large proportion of execution time in
> > > > libs like Vhost, especially for large packets, and this patch can bring
> > > > considerable benefits.
> > >
> > > > The implementation is based on the current DPDK memcpy framework, some
> > > > background introduction can be found in these threads:
> > > http://dpdk.org/ml/archives/dev/2014-November/008158.html
> > > http://dpdk.org/ml/archives/dev/2015-January/011800.html
> > >
> > > Code changes are:
> > >
> > >   1. Read CPUID to check if AVX512 is supported by CPU
> > >
> > >   2. Predefine AVX512 macro if AVX512 is enabled by compiler
> > >
> > > >   3. Implement AVX512 memcpy and choose the right implementation based on
> > > >      predefined macros
> > >
> > > >   4. Decide alignment unit for memcpy perf test based on predefined macros
> > >
> > > Zhihong Wang (4):
> > >   lib/librte_eal: Identify AVX512 CPU flag
> > >   mk: Predefine AVX512 macro for compiler
> > >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> > >   app/test: Adjust alignment unit for memcpy perf test
> > >
> > >  app/test/test_memcpy_perf.c                        |   6 +
> > >  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
> > > >  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
> > >  mk/rte.cpuflags.mk                                 |   4 +
> > >  4 files changed, 255 insertions(+), 4 deletions(-)
> > >
> >
> > This really looks like code that could benefit from GCC
> > function multiversioning. The current cpuflags model is useless/flawed
> > in real product deployment.
>
>
> I've tried gcc function multi versioning, with a simple add() function
> which returns a + b, and a loop calling it for millions of times. Turned
> out this mechanism adds 17% extra time to execute, overall it's a lot
> of extra overhead.
>
> Quote the gcc wiki: "GCC takes care of doing the dispatching to call
> the right version at runtime". So it loses inlining and adds extra
> dispatching overhead.
>
> Also this mechanism works only for C++, right?
>
> I think using predefined macros at compile time is more efficient and
> suits DPDK more.
>

I agree with you: performance first.

So a mix of runtime and compile-time selection would work. Those who can
accept some performance drop can use the runtime path; everyone else keeps
the compile-time selection.
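
To make the mixed approach concrete, here is a rough sketch. It is not taken
from the patch set: every app_* name and the APP_RUNTIME_DISPATCH macro are
made up for illustration, and the runtime part assumes GCC or Clang on x86
(for __builtin_cpu_supports() and the target attribute). By default the copy
routine is picked from predefined macros at build time; building with
-DAPP_RUNTIME_DISPATCH trades an indirect call for a single portable binary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: none of the app_* names are DPDK APIs. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define APP_X86_GNUC 1
#endif

typedef void *(*app_copy_fn)(void *dst, const void *src, size_t n);

/* Baseline copy, usable on any CPU. */
static void *app_copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

#ifdef APP_X86_GNUC
/* The target attribute lets this build without -mavx512f; it must only
 * be called on a CPU that really supports AVX512F. */
static __attribute__((target("avx512f")))
void *app_copy_avx512(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy 64-byte blocks with unaligned 512-bit loads and stores. */
    for (; n >= 64; d += 64, s += 64, n -= 64)
        _mm512_storeu_si512(d, _mm512_loadu_si512(s));
    memcpy(d, s, n); /* remaining tail, fewer than 64 bytes */
    return dst;
}
#endif

/* Runtime path: probe the CPU and return the best available routine. */
static app_copy_fn app_copy_resolve(void)
{
#ifdef APP_X86_GNUC
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return app_copy_avx512;
#endif
    return app_copy_generic;
}

static inline void *app_copy(void *dst, const void *src, size_t n)
{
#if defined(APP_RUNTIME_DISPATCH)
    /* Indirect call, cannot be inlined. Lazy init is kept simple here;
     * a real build would resolve once during startup. */
    static app_copy_fn fn;
    if (fn == NULL)
        fn = app_copy_resolve();
    return fn(dst, src, n);
#elif defined(__AVX512F__) && defined(APP_X86_GNUC)
    /* Compile-time path: direct call chosen from predefined macros. */
    return app_copy_avx512(dst, src, n);
#else
    return app_copy_generic(dst, src, n);
#endif
}
```

Callers only ever use app_copy(); whether that is a direct call or a
function-pointer call is decided by how the binary was built.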

> Could you please give an example when the current CPU flags model
> stops working? So I can fix it.
>