From: Neil Horman <nhorman@tuxdriver.com>
To: Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features
Date: Thu, 31 Jul 2014 14:10:32 -0400
Message-ID: <20140731181032.GC20718@hmsreliant.think-freely.org>
In-Reply-To: <20140731143228.GB20718@hmsreliant.think-freely.org>
On Thu, Jul 31, 2014 at 10:32:28AM -0400, Neil Horman wrote:
> On Thu, Jul 31, 2014 at 03:26:45PM +0200, Thomas Monjalon wrote:
> > 2014-07-31 09:13, Neil Horman:
> > > On Wed, Jul 30, 2014 at 02:09:20PM -0700, Bruce Richardson wrote:
> > > > On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > > > > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > > > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > > > > Hey all-
> > > > > > > I've been trying to update the fedora dpdk package to support VFIO
> > > > > > > enabled drivers and ran into a problem in which ixgbe didn't compile because the
> > > > > > > rxtx_vec code uses sse4.2 instruction intrinsics, which aren't supported in the
> > > > > > > default config I have. I tried to remedy this by replacing the intrinsics with
> > > > > > > the __builtin macros, but it was pointed out (correctly), that this doesn't work
> > > > > > > properly. So this is my second attempt, which I actually like a bit better. I
> > > > > > > > noted that code that uses intrinsics (ixgbe and the acl library) doesn't need to
> > > > > > > have those instructions turned on build-wide. Rather, we can just enable the
> > > > > > > instructions in the specific code we want to build with support for that, and
> > > > > > > test for instruction support dynamically at run time. This allows me to build
> > > > > > > the dpdk for a generic platform, but in such a way that some optimizations can
> > > > > > > be used if the executing cpu supports them at run time.
> > > > > > >
> > > > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > > > > > CC: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > > > >
> > > > > > I'd prefer if a solution could be found based off your original patch
> > > > > > set, as it gives us more chance to deprecate the older code paths in
> > > > > > future. Looking at the Intel Intrinsics Guide site online, it shows that
> > > > > > the _mm_shuffle_epi8 intrinsic came in with SSSE3, rather than SSE4.x,
> > > > > > and so should be available on all 64-bit systems, I believe. The
> > > > > > popcount intrinsic is newer, but it's a much more basic instruction so
> > > > > > hopefully the __builtin should work for that.
> > > > > >
> > > > > > Yes, but as I look at it, that's somewhat counter to my goal, which is to offer
> > > > > > accelerated code paths on systems that can make use of it at run time. If we
> > > > > use the __builtin compiler functions, we will either:
> > > > >
> > > > > 1) Build those code paths with advanced instructions that won't work on older
> > > > > systems (i.e. crash)
> > > > >
> > > > > 2) Build those code paths with less advanced instructions, meaning that we won't
> > > > > speedup execution on systems that are capable of using the more advanced
> > > > > instructions.
> > > > >
> > > > > Using this run time check, we can, at least in these situations, make use of the
> > > > > accelerated paths when the instructions are available, and ignore them when
> > > > > they're not, at run time.
> > > > >
> > > > > What would be ideal, would be an alternative type macro, like the linux kernel
> > > > > employs, but implementing that would require some pretty significant work and
> > > > > testing. This seems like a much simpler approach.
> >
> > [...]
> >
> > > Now, a macro that selected an instruction optimized or generic path is fine, as
> > > long as it can happen at run time. The Linux kernel has such a feature, called
> > > > alternatives. But it's a complex subsystem that does run time replacement of
> > > > instructions based on cpu feature flags. It would be great to have in the DPDK,
> > > > but it's a significant code base and difficult to maintain, which goes against
> > > > your desire to reduce code.
> >
> > [...]
> >
> > > > Even though the code is written using intrinsics which correspond to SSE
> > > > operations, the compiler is free to use AVX instructions where necessary
> > > Not if you use the default machine target.
> > >
> > > > to improve performance. Therefore, if we go down this road, we need to
> > > > look to compile up the code for all microarchitectures, rather than just
> > > > assuming that we will get equivalent performance to "native" by turning
> > > > on the instruction set indicated by the primitives in the code. This is
> > > > No, you compile for the least common denominator system, and enable more
> > > performant paths opportunistically as run time checks allow.
> > >
> > > > where having one codepath recompiled multiple times will work far better
> > > > than having multiple code paths.
> > > > Only if your only concern is performance. As noted above, my goal is more
> > > > than just performance, it's compatibility across systems. Multiple builds for
> > > multiple cpu flag availability is simply a non-starter for a generic
> > > distribution.
> >
> > Neil, we are mixing 2 different problems here.
> > 1) we have to fix default build (without SSE-4.2)
> That's nothing to fix, that's a configuration issue. Just build for a lesser
> machine. I've already done that in the fedora build, using the default machine
> target. What exactly is missing from that?
>
Re-reading this, I'm wondering if I missed what you were trying to say; if so, I
apologize. Were you trying to assert that the right thing to do here is to
adjust the ixgbe and acl code paths to not use the sse4.2 intrinsics so that
they are buildable on the default platform? If so, I agree, that's a nice idea,
and I'm supportive of it, though I don't think that fully solves the problem. In
the case of the ixgbe pmd, what we have is 2 code paths: a generic code path
and an optimized code path using sse4.2 intrinsics. In this case, I don't think
there's anything to fix, in that I'm fine with the optimized path needing sse4.2
to execute. There I just want to be able to do a run time check and use the
optimized path if the cpu supports it, and just use the default path otherwise.
In effect we already have exactly what you are looking for there.
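
As a rough sketch of that kind of run time check (this is not the ixgbe patch
itself): the function names below are hypothetical, and POPCNT stands in for
the vectorized receive path purely to keep the example short. It assumes gcc or
clang on x86, where the target attribute and __builtin_cpu_supports() are
available; on any other compiler or architecture it simply falls back to the
generic path.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by both implementations (hypothetical name). */
typedef uint32_t (*count_bits_fn)(const uint64_t *words, size_t n);

/* Generic path: plain C, runs on any CPU the default target supports. */
static uint32_t count_bits_generic(const uint64_t *words, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t x = words[i];

        while (x != 0) {
            x &= x - 1; /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_POPCNT_PATH 1
/* Optimized path: only this one function is built with POPCNT enabled. */
static __attribute__((target("popcnt")))
uint32_t count_bits_popcnt(const uint64_t *words, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += (uint32_t)__builtin_popcountll(words[i]);
    return total;
}
#endif

/* Choose an implementation once, based on what the executing CPU reports. */
static count_bits_fn select_count_bits(void)
{
#ifdef HAVE_POPCNT_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return count_bits_popcnt;
#endif
    return count_bits_generic;
}
```

A caller would resolve the pointer once at init time, for example
count_bits_fn fn = select_count_bits();, and call through fn afterwards, so the
rest of the build can stay on the default machine target.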

As far as the ACL library goes, yes, that's more complex. The sse4.2
intrinsics are used throughout the code there, so there's no easy way to select a
path. We're just left with either using the code or returning an error at run
time, as my patch does. Certainly we can build some macros that either use the
sse4.2 intrinsics or fall back to C-level variants of those instructions
based on generic code, and build for the least common denominator, or compile the
code twice (once without sse4.2 support, and once with) and do a runtime
selection between the two. Either way, that's going to be a useful, though
significant, effort.
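
To illustrate the macro option only: the sketch below wraps one sse4.2
instruction (crc32) behind a macro that expands to the intrinsic when the
compiler was told sse4.2 is available, and to a C-level equivalent otherwise.
The macro and function names are hypothetical, and crc32 is chosen just because
it is easy to verify, not because it is what the ACL code uses.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE4_2__
#include <nmmintrin.h>
/* Built with sse4.2 enabled: use the hardware crc32 instruction. */
#define CRC32C_U8(crc, byte) _mm_crc32_u8((crc), (byte))
#else
/* C-level variant of the same instruction: one CRC-32C step over one byte. */
static inline uint32_t crc32c_u8_generic(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}
#define CRC32C_U8(crc, byte) crc32c_u8_generic((crc), (byte))
#endif

/* Callers use the macro and never see which variant was selected. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = CRC32C_U8(crc, buf[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Note that this selection happens at compile time, so on its own it only makes
the code buildable for the default target; getting the fast variant on capable
cpus would still need the code built twice plus a run time selection.
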
> > 2) we could try to have performance with default build
> >
> Yes, we can, that's what this patch does. It doesn't address every code path,
> no, but it addresses two paths that are low hanging fruit for doing so, and we
> can incrementally build on that.
>
> > Please, let's focus on the first item and we could discuss performance
> > later. Having a different code path chosen at runtime is a big rework and
> > implies changing the compilation model (RFC welcome).
> >
Even if I misinterpreted your statement above, I'm still not sure why you're
asserting this. Fixing the build to work with the default target machine is
good, and should be undertaken, and I'll happily do so, but why reject the
solution in front of you to wait for it? Even if I write macros to fix up the
ACL library, I'd still like to be able to do a run time check and select the
optimized version or the generic version based on cpu support. Just doing a
compile time check to determine if sse4.2 is available really isn't going to cut
it for me, as I don't want the fedora dpdk to have pessimal performance if it
doesn't have to.

Regards
Neil