From: Neil Horman <nhorman@tuxdriver.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features
Date: Thu, 31 Jul 2014 15:01:17 -0400
Message-ID: <20140731190117.GD20718@hmsreliant.think-freely.org>
In-Reply-To: <20140731183631.GC6420@localhost.localdomain>
On Thu, Jul 31, 2014 at 11:36:32AM -0700, Bruce Richardson wrote:
> On Thu, Jul 31, 2014 at 02:10:32PM -0400, Neil Horman wrote:
> > On Thu, Jul 31, 2014 at 10:32:28AM -0400, Neil Horman wrote:
> > > On Thu, Jul 31, 2014 at 03:26:45PM +0200, Thomas Monjalon wrote:
> > > > 2014-07-31 09:13, Neil Horman:
> > > > > On Wed, Jul 30, 2014 at 02:09:20PM -0700, Bruce Richardson wrote:
> > > > > > On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > > > > > > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > > > > > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > > > > > > Hey all-
> > > > > > > > > I've been trying to update the fedora dpdk package to support VFIO
> > > > > > > > > enabled drivers and ran into a problem in which ixgbe didn't compile because the
> > > > > > > > > rxtx_vec code uses sse4.2 instruction intrinsics, which aren't supported in the
> > > > > > > > > default config I have. I tried to remedy this by replacing the intrinsics with
> > > > > > > > > the __builtin macros, but it was pointed out (correctly), that this doesn't work
> > > > > > > > > properly. So this is my second attempt, which I actually like a bit better. I
> > > > > > > > > noted that code that uses intrinsics (ixgbe and the acl library), don't need to
> > > > > > > > > have those instructions turned on build-wide. Rather, we can just enable the
> > > > > > > > > instructions in the specific code we want to build with support for that, and
> > > > > > > > > test for instruction support dynamically at run time. This allows me to build
> > > > > > > > > the dpdk for a generic platform, but in such a way that some optimizations can
> > > > > > > > > be used if the executing cpu supports them at run time.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > > > > > > > CC: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > > > > > >
> > > > > > > > I'd prefer if a solution could be found based off your original patch
> > > > > > > > set, as it gives us more chance to deprecate the older code paths in
> > > > > > > > future. Looking at the Intel Intrinsics Guide site online, it shows that
> > > > > > > > the _mm_shuffle_epi8 intrinsic came in with SSSE3, rather than SSE4.x,
> > > > > > > > and so should be available on all 64-bit systems, I believe. The
> > > > > > > > popcount intrinsic is newer, but it's a much more basic instruction so
> > > > > > > > hopefully the __builtin should work for that.
> > > > > > > >
> > > > > > > Yes, but as I look at it, that's somewhat counter to my goal, which is to offer
> > > > > > > accelerated code paths on systems that can make use of them at run time. If we
> > > > > > > use the __builtin compiler functions, we will either:
> > > > > > >
> > > > > > > 1) Build those code paths with advanced instructions that won't work on older
> > > > > > > systems (i.e. crash)
> > > > > > >
> > > > > > > 2) Build those code paths with less advanced instructions, meaning that we won't
> > > > > > > speed up execution on systems that are capable of using the more advanced
> > > > > > > instructions.
> > > > > > >
> > > > > > > Using this run time check, we can, at least in these situations, make use of the
> > > > > > > accelerated paths when the instructions are available, and ignore them when
> > > > > > > they're not, at run time.
> > > > > > >
> > > > > > > What would be ideal is an alternatives-style macro, like the one the Linux kernel
> > > > > > > employs, but implementing that would require some pretty significant work and
> > > > > > > testing. This seems like a much simpler approach.
> > > >
> > > > [...]
> > > >
> > > > > Now, a macro that selected an instruction-optimized or generic path is fine, as
> > > > > long as it can happen at run time. The Linux kernel has such a feature, called
> > > > > alternatives. But it's a complex subsystem that does run time replacement of
> > > > > instructions based on cpu feature flags. It would be great to have in the DPDK,
> > > > > but it's a significant code base and difficult to maintain, which goes against
> > > > > your desire to reduce code.
> > > >
> > > > [...]
> > > >
> > > > > > Even though the code is written using intrinsics which correspond to SSE
> > > > > > operations, the compiler is free to use AVX instructions where necessary
> > > > > Not if you use the default machine target.
> > > > >
> > > > > > to improve performance. Therefore, if we go down this road, we need to
> > > > > > look to compile up the code for all microarchitectures, rather than just
> > > > > > assuming that we will get equivalent performance to "native" by turning
> > > > > > on the instruction set indicated by the primitives in the code. This is
> > > > > No, you compile for the least common denominator system, and enable more
> > > > > performant paths opportunistically as run time checks allow.
> > > > >
> > > > > > where having one codepath recompiled multiple times will work far better
> > > > > > than having multiple code paths.
> > > > > Only if your only concern is performance. As noted above, my goal is more
> > > > > than just performance, it's compatibility across systems. Multiple builds for
> > > > > multiple cpu flag availability is simply a non-starter for a generic
> > > > > distribution.
> > > >
> > > > Neil, we are mixing 2 different problems here.
> > > > 1) we have to fix default build (without SSE-4.2)
> > > That's nothing to fix, that's a configuration issue. Just build for a lesser
> > > machine. I've already done that in the fedora build, using the default machine
> > > target. What exactly is missing from that?
> > >
> > Re-reading this, I'm wondering if I missed what you were trying to say; if so, I
> > apologize. Were you trying to assert that the right thing to do here is to
> > adjust the ixgbe and acl code paths to not use the sse4.2 intrinsics so that
> > they are buildable on the default platform? If so, I agree, that's a nice idea,
> > and am supportive of it, though I don't think that fully solves the problem. In
> > the case of the ixgbe pmd, what we have is 2 code paths, a generic code path,
> > and an optimized code path using sse4.2 intrinsics. In this case, I don't think
> > there's anything to fix, in that I'm fine with the optimized path needing sse4.2
> > to execute. There I just want to be able to do a run time check and use the
> > optimized path if the cpu supports it, and just use the default path otherwise.
> > In effect we already have exactly what you are looking for there.
> >
> > As far as the ACL library goes, yes, that's more complex. The use of sse4.2
> > intrinsics there is done throughout the code, so there's no easy way to select a
> > path. We're just left with either using the code or returning an error at run
> > time, as my patch does. Certainly we can build some macros that either use the
> > intrinsics for sse4.2 or code up some C-level variants of those instructions
> > based on generic code, and build for the least common denominator, or compile the
> > code twice (once without sse4.2 support, and once with), and do a runtime
> > selection between the two. Either way, that's going to be a useful, though
> > significant effort.
>
> I think a good first step here that I can't see anyone objecting to is
> to enable the ixgbe driver to use the vector code path for a generic
> x86_64 build. I've run a quick test here, and changing "_mm_popcnt_u64"
> to "__builtin_popcountll" [and the include from nmmintrin to tmmintrin]
> allows a compile for machine type default, and testpmd can still forward
> packets at a good rate (roughly perf down about 10% vs native compile on
> SNB).
> The ACL is a tougher nut to crack, but anyone see any issues with that
> two-line change to ixgbe_rxtx_vec.c? [Neil, since you started the patch
> set thread, do you want to submit an official patch here, or would you prefer I
> do so?]
>
I'm happy to do so, though a 10% performance degradation vs. using the sse4.2
instructions in that path seems significant, doesn't it? Given that performance
delta, it seems like it would still be preferable to have a path that used the
sse4.2 instructions when they're available. Or am I misreading what you mean
when you say down 10%?
Neil
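
For illustration only, here is a minimal sketch of the kind of run-time
selection discussed above. It is not taken from the patch set. It assumes
GCC 4.9 or later (or Clang) on x86_64, where intrinsics can be called from a
function tagged with a target attribute without building the whole file with
-msse4.2. The names popcount_generic, popcount_sse42, popcount_fn and
select_popcount are hypothetical.

```c
#include <stdint.h>
#include <nmmintrin.h>	/* _mm_popcnt_u64 */

/* Generic path: builds for the default machine target. Without POPCNT
 * enabled build-wide, the compiler emits a software bit count here. */
static int popcount_generic(uint64_t v)
{
	return __builtin_popcountll(v);
}

/* Optimized path: SSE4.2 and POPCNT are enabled for this function only,
 * so the rest of the file still targets the generic machine. */
__attribute__((target("sse4.2,popcnt")))
static int popcount_sse42(uint64_t v)
{
	return (int)_mm_popcnt_u64(v);
}

typedef int (*popcount_fn)(uint64_t);

/* Pick a path once, based on what the executing cpu reports. */
static popcount_fn select_popcount(void)
{
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt"))
		return popcount_sse42;
	return popcount_generic;
}
```

A caller would store the result of select_popcount() once at init time and
call through the pointer afterwards, so the cpu check stays out of the fast
path. Whether the indirect call costs less than the 10% quoted above would
need to be measured.
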
> >
> > > > 2) we could try to have performance with default build
> > > >
> > > Yes, we can, that's what this patch does. It doesn't address every code path,
> > > no, but it addresses two paths that are low hanging fruit for doing so, and we
> > > can incrementally build on that.
> > >
> > > > Please, let's focus on the first item and we could discuss performance
> > > > later. Having different code paths chosen at runtime is a big rework and
> > > > implies changing the compilation model (RFC welcome).
> > > >
> > Even if I misinterpreted your statement above, I'm still not sure why you're
> > asserting this. Fixing the build to work with the default target machine is
> > good, and should be undertaken, and I'll happily do so, but why reject the
> > solution in front of you to wait for it? Even if I write macros to fix up the
> > ACL library, I'd still like to be able to do a run time check and select the
> > optimized version or the generic version based on cpu support. Just doing a
> > compile time check to determine if sse4.2 is available really isn't going to cut
> > it for me, as I don't want the fedora dpdk to have pessimal performance if it
> > doesn't have to.
> >
> > Regards
> > Neil
> >
>
> With regards to the general approach for runtime detection of software
> functions, I wonder if something like this can be handled by the
> packaging system? Is it possible to ship out a set of shared libs
> compiled up for different instruction sets, and then at rpm install
> time, symlink the appropriate library? This would push the whole issue
> of detection of code paths outside of code, work across all our
> libraries and ensure each user got the best performance they could get
> from a binary?
> Has something like this been done before? The building of all the
> libraries could be scripted easily enough, just do multiple builds using
> different EXTRA_CFLAGS each time, and move and rename the .so's after
> each run.
>
> /Bruce
>
Thread overview: 45+ messages
2014-07-29 20:24 Neil Horman
2014-07-29 20:24 ` [dpdk-dev] [PATCH 1/2] ixgbe: test sse4.2 support at runtime for vectorized receive operations Neil Horman
2014-07-29 20:24 ` [dpdk-dev] [PATCH 2/2] acl: Preform dynamic sse4.2 support check Neil Horman
2014-07-30 12:07 ` [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features Ananyev, Konstantin
2014-07-30 13:01 ` Neil Horman
2014-07-30 13:44 ` Ananyev, Konstantin
2014-07-30 14:49 ` [dpdk-dev] [PATCH v2 " Neil Horman
2014-07-30 14:49 ` [dpdk-dev] [PATCH v2 1/2] ixgbe: test sse4.2 support at runtime for vectorized receive operations Neil Horman
2014-07-30 14:49 ` [dpdk-dev] [PATCH v2 2/2] acl: Preform dynamic sse4.2 support check Neil Horman
2014-07-30 15:36 ` [dpdk-dev] [PATCH v2 0/2] dpdk: Allow for dynamic enablement of some isolated features Ananyev, Konstantin
2014-07-30 19:03 ` Venky Venkatesan
2014-07-30 19:17 ` Neil Horman
2014-07-30 19:34 ` Neil Horman
2014-07-30 18:59 ` [dpdk-dev] [PATCH " Bruce Richardson
2014-07-30 19:28 ` Neil Horman
2014-07-30 21:09 ` Bruce Richardson
2014-07-31 9:30 ` Thomas Monjalon
2014-07-31 11:36 ` Ananyev, Konstantin
2014-07-31 13:13 ` Neil Horman
2014-07-31 13:26 ` Thomas Monjalon
2014-07-31 14:32 ` Neil Horman
2014-07-31 18:10 ` Neil Horman
2014-07-31 18:36 ` Bruce Richardson
2014-07-31 19:01 ` Neil Horman [this message]
2014-07-31 20:19 ` Bruce Richardson
2014-08-01 13:36 ` Neil Horman
2014-08-01 13:56 ` Ananyev, Konstantin
2014-08-01 14:26 ` Venkatesan, Venky
2014-08-01 14:27 ` Neil Horman
2014-07-31 19:58 ` John W. Linville
2014-07-31 20:20 ` Bruce Richardson
2014-07-31 20:32 ` John W. Linville
2014-08-01 8:46 ` Vincent JARDIN
2014-08-01 14:06 ` Neil Horman
2014-08-01 14:57 ` Vincent JARDIN
2014-08-01 15:19 ` Neil Horman
2014-07-31 20:10 ` Neil Horman
2014-07-31 20:25 ` Bruce Richardson
2014-08-01 15:06 ` Neil Horman
2014-08-01 19:22 ` Bruce Richardson
2014-08-01 20:43 ` Neil Horman
2014-08-01 21:08 ` Bruce Richardson
2014-08-02 12:56 ` Neil Horman
2014-07-31 21:53 ` Thomas Monjalon
2014-07-31 21:25 ` Thomas Monjalon