From: "Ananyev, Konstantin"
To: "Richardson, Bruce", Neil Horman
Cc: "dev@dpdk.org"
Date: Thu, 31 Jul 2014 11:36:34 +0000
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features
Message-ID: <2601191342CEEE43887BDE71AB97725821345B53@IRSMSX105.ger.corp.intel.com>
In-Reply-To: <20140730210920.GB6420@localhost.localdomain>
References: <1406665466-29654-1-git-send-email-nhorman@tuxdriver.com>
 <20140730185902.GA6420@localhost.localdomain>
 <20140730192844.GB3296@hmsreliant.think-freely.org>
 <20140730210920.GB6420@localhost.localdomain>
List-Id: patches and discussions about DPDK

Hi Bruce,

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, July 30, 2014 10:09 PM
> To: Neil Horman
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features
>
> On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > Hey all-
> > > > 	I've been trying to update the fedora dpdk package to support VFIO
> > > > enabled drivers and ran into a problem in which ixgbe didn't compile because the
> > > > rxtx_vec code uses sse4.2 instruction intrinsics, which aren't supported in the
> > > > default config I have.  I tried to remedy this by replacing the intrinsics with
> > > > the __builtin macros, but it was pointed out (correctly), that this doesn't work
> > > > properly.  So this is my second attempt, which I actually like a bit better.  I
> > > > noted that code that uses intrinsics (ixgbe and the acl library), don't need to
> > > > have those instructions turned on build-wide.  Rather, we can just enable the
> > > > instructions in the specific code we want to build with support for that, and
> > > > test for instruction support dynamically at run time.  This allows me to build
> > > > the dpdk for a generic platform, but in such a way that some optimizations can
> > > > be used if the executing cpu supports them at run time.
> > > >
> > > > Signed-off-by: Neil Horman
> > > > CC: Thomas Monjalon
> > > >
> > > I'd prefer if a solution could be found based off your original patch
> > > set, as it gives us more chance to deprecate the older code paths in
> > > future.
> > > Looking at the Intel Intrinsics Guide site online, it shows that
> > > the _mm_shuffle_epi8 intrinsic came in with SSSE3, rather than SSE4.x,
> > > and so should be available on all 64-bit systems, I believe. The
> > > popcount intrinsic is newer, but it's a much more basic instruction so
> > > hopefully the __builtin should work for that.
> > >
> > Yes, but as I look at it, thats somewhat counter to my goal, which is to offer
> > accelerated code paths on systems that can make use of it at run time.  If We
> > use the __builtin compiler functions, we will either:
> >
> > 1) Build those code paths with advanced instructions that won't work on older
> > systems (i.e. crash)
> >
> > 2) Build those code paths with less advanced instructions, meaning that we won't
> > speedup execution on systems that are capable of using the more advanced
> > instructions.
> >
> > Using this run time check, we can, at least in these situations, make use of the
> > accelerated paths when the instructions are available, and ignore them when
> > they're not, at run time.
> >
> > What would be ideal, would be an alternative type macro, like the linux kernel
> > employs, but implementing that would require some pretty significant work and
> > testing.  This seems like a much simpler approach.
> >
>
> Ok, I understand where you are coming from indeed. However, within that,
> I'd like to see us reduce the amount of code that's needed for
> maintenance.
>
> What we should really aim for, is to have common code, with perhaps some
> small ifdefs or __builtins, and then compile that code multiple times
> for multiple different architectures. So in this case, it would be nice
> to use the __builtin, and then compile that code up with and without SSE
> and select at runtime the code path to be used. Ideally, this could be
> done at the driver level.
>
> However, once you get down this path, you are dealing with more than
> just SSE.
> If I compile up the PMD on my system, which has a chip based
> on Sandy Bridge uarch, I find that there are multiple instructions
> starting with "vp" which means that they are actually AVX instructions.
> Even though the code is written using intrinsics which correspond to SSE
> operations, the compiler is free to use AVX instructions where necessary
> to improve performance.
> Therefore, if we go down this road, we need to
> look to compile up the code for all microarchitectures, rather than just
> assuming that we will get equivalent performance to "native" by turning
> on the instruction set indicated by the primitives in the code. This is
> where having one codepath recompiled multiple times will work far better
> than having multiple code paths.

Using your example - as long as we specify '-mavx', the compiler can (and does) use AVX instructions
even for 'scalar' code (code without any SIMD intrinsics).
And yes, that probably affects performance.
So, as I understand your suggestion, we'll then need to divide our code into:
- generic code, compiled to run on all supported platforms;
- performance-critical code, recompiled for each supported platform.
Then the generic code would have to decide at run time which particular version of the recompiled code to use.
And that for each PMD and all other performance-critical DPDK libraries.
Looks like too much hassle to me.
After all - if someone needs a package with binaries optimised for different architectures,
he can provide multiple DPDK binaries (built for different architectures) and a small install script
that would decide which binary is more appropriate for the given platform.

Konstantin
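
For reference, the run-time selection discussed in this thread (generic code choosing between a portable path and one built with extra instructions enabled) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch set or from DPDK: the names popcount_generic, popcount_hw, popcount_fn and select_popcount are made up for the example, and it assumes GCC or Clang, relying on the target function attribute and the __builtin_cpu_supports() builtin on x86.

```c
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not DPDK code. */

/* Portable path: builds and runs on any CPU the package targets. */
unsigned popcount_generic(uint64_t x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

#if defined(__x86_64__) || defined(__i386__)
/*
 * Accelerated path: POPCNT is enabled for this one function only,
 * so the rest of the file is still built for the generic baseline.
 * Must not be called unless the CPU reports POPCNT support.
 */
__attribute__((target("popcnt")))
unsigned popcount_hw(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}
#endif

typedef unsigned (*popcount_fn)(uint64_t);

/* Generic code decides once, at run time, which version to use. */
popcount_fn select_popcount(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
#endif
    return popcount_generic;
}
```

A caller would typically store the result of select_popcount() in a function pointer at init time and call through it afterwards. Note that this covers only the instructions named explicitly; it does not address the point above that the compiler may pick different instructions for the surrounding code depending on the -m flags in use.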