From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 6A8BEB368
 for <dev@dpdk.org>; Thu, 31 Jul 2014 13:35:27 +0200 (CEST)
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga103.fm.intel.com with ESMTP; 31 Jul 2014 04:30:30 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.01,771,1400050800"; d="scan'208";a="569912725"
Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153])
 by fmsmga001.fm.intel.com with ESMTP; 31 Jul 2014 04:36:36 -0700
Received: from irsmsx106.ger.corp.intel.com (163.33.3.31) by
 IRSMSX101.ger.corp.intel.com (163.33.3.153) with Microsoft SMTP Server (TLS)
 id 14.3.123.3; Thu, 31 Jul 2014 12:36:36 +0100
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.65]) by
 IRSMSX106.ger.corp.intel.com ([169.254.8.71]) with mapi id 14.03.0123.003;
 Thu, 31 Jul 2014 12:36:35 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Richardson, Bruce" <bruce.richardson@intel.com>, Neil Horman
 <nhorman@tuxdriver.com>
Thread-Topic: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of
 some isolated features
Thread-Index: AQHPrCkEWg80nU/ka0Om4cnjJAzT6pu4794AgAAcGwCAAPZvcA==
Date: Thu, 31 Jul 2014 11:36:34 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725821345B53@IRSMSX105.ger.corp.intel.com>
References: <1406665466-29654-1-git-send-email-nhorman@tuxdriver.com>
 <20140730185902.GA6420@localhost.localdomain>
 <20140730192844.GB3296@hmsreliant.think-freely.org>
 <20140730210920.GB6420@localhost.localdomain>
In-Reply-To: <20140730210920.GB6420@localhost.localdomain>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.181]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of
 some isolated features
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 31 Jul 2014 11:35:28 -0000


Hi Bruce,

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, July 30, 2014 10:09 PM
> To: Neil Horman
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features
>
> On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > Hey all-
> > > >         I've been trying to update the fedora dpdk package to support VFIO
> > > > enabled drivers and ran into a problem in which ixgbe didn't compile because the
> > > > rxtx_vec code uses sse4.2 instruction intrinsics, which aren't supported in the
> > > > default config I have.  I tried to remedy this by replacing the intrinsics with
> > > > the __builtin macros, but it was pointed out (correctly), that this doesn't work
> > > > properly.  So this is my second attempt, which I actually like a bit better.  I
> > > > noted that code that uses intrinsics (ixgbe and the acl library), don't need to
> > > > have those instructions turned on build-wide.  Rather, we can just enable the
> > > > instructions in the specific code we want to build with support for that, and
> > > > test for instruction support dynamically at run time.  This allows me to build
> > > > the dpdk for a generic platform, but in such a way that some optimizations can
> > > > be used if the executing cpu supports them at run time.
> > > >
> > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > > CC: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > >
> > > I'd prefer if a solution could be found based off your original patch
> > > set, as it gives us more chance to deprecate the older code paths in
> > > future. Looking at the Intel Intrinsics Guide site online, it shows that
> > > the _mm_shuffle_epi8 intrinsic came in with SSSE3, rather than SSE4.x,
> > > and so should be available on all 64-bit systems, I believe. The
> > > popcount intrinsic is newer, but it's a much more basic instruction so
> > > hopefully the __builtin should work for that.
> > >
> > Yes, but as I look at it, thats somewhat counter to my goal, which is to offer
> > accelerated code paths on systems that can make use of it at run time.  If We
> > use the __builtin compiler functions, we will either:
> >
> > 1) Build those code paths with advanced instructions that won't work on older
> > systems (i.e. crash)
> >
> > 2) Build those code paths with less advanced instructions, meaning that we won't
> > speedup execution on systems that are capable of using the more advanced
> > instructions.
> >
> > Using this run time check, we can, at least in these situations, make use of the
> > accelerated paths when the instructions are available, and ignore them when
> > they're not, at run time.
> >
> > What would be ideal, would be an alternative type macro, like the linux kernel
> > employs, but implementing that would require some pretty significant work and
> > testing.  This seems like a much simpler approach.
> >
>
> Ok, I understand where you are coming from indeed. However, within that,
> I'd like to see us reduce the amount of code that's needed for
> maintenance.
>
> What we should really aim for, is to have common code, with perhaps some
> small ifdefs or __builtins, and then compile that code multiple times
> for multiple different architectures. So in this case, it would be nice
> to use the __builtin, and then compile that code up with and without SSE
> and select at runtime the code path to be used. Ideally, this could be
> done at the driver level.
>
> However, once you get down this path, you are dealing with more than
> just SSE. If I compile up the PMD on my system, which has a chip based
> on Sandy Bridge uarch, I find that there are multiple instructions
> starting with "vp" which means that they are actually AVX instructions.
> Even though the code is written using intrinsics which correspond to SSE
> operations, the compiler is free to use AVX instructions where necessary
> to improve performance.
> Therefore, if we go down this road, we need to
> look to compile up the code for all microarchitectures, rather than just
> assuming that we will get equivalent performance to "native" by turning
> on the instruction set indicated by the primitives in the code. This is
> where having one codepath recompiled multiple times will work far better
> than having multiple code paths.

Using your example - as long as we specify '-mavx', the compiler can (and does) use AVX instructions
even for 'scalar' code (code without any SIMD intrinsics).
And yes, that probably affects performance.
So, as I understand your suggestion, we'll then need to divide our code into:
- generic code - compiled to run on all supported platforms
- performance-critical code - recompiled for each supported platform.
Then the generic code would have to decide at run time which particular version of the recompiled code to use.
And that would be needed for each PMD and all other performance-critical DPDK libraries.
Looks like too much hassle to me.
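
To make the run-time decision above concrete, here is a rough sketch of what the generic code would have to do for every such routine.
It is only an illustration: it assumes GCC (or a compatible compiler) on x86 for __builtin_cpu_supports() and the 'target' function attribute,
and the names sum_generic, sum_avx and select_sum are hypothetical, not part of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine type used only for this sketch; not a DPDK API. */
typedef uint32_t (*sum_fn_t)(const uint32_t *v, size_t n);

/* Generic version: built with the baseline flags, runs on every platform. */
static uint32_t sum_generic(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/*
 * Same source, but the compiler is allowed to emit AVX instructions for
 * this one function only (assumption: the 'target' attribute is available).
 */
__attribute__((target("avx")))
static uint32_t sum_avx(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Run-time decision made by the generic code: pick the best usable version. */
static sum_fn_t select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
#endif
    return sum_generic;
}
```

The caller would invoke select_sum() once at init time and keep the returned pointer. Multiply that by every microarchitecture and every performance-critical routine, and the maintenance cost becomes visible.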
After all - if someone needs a package with binaries optimised for different architectures,
he can provide multiple DPDK binaries (built for different architectures) and a small install script
that would decide which binary is more appropriate for the given platform.

Konstantin