From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
Date: Wed, 6 Aug 2014 11:39:22 +0000 [thread overview]
Message-ID: <2601191342CEEE43887BDE71AB9772582134FC92@IRSMSX105.ger.corp.intel.com> (raw)
In-Reply-To: <20140805182035.GB20550@hmsreliant.think-freely.org>
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, August 05, 2014 7:21 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; Thomas Monjalon; Richardson, Bruce
> Subject: Re: [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
>
> On Tue, Aug 05, 2014 at 03:26:27PM +0000, Ananyev, Konstantin wrote:
> > Hi Neil,
> >
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Monday, August 04, 2014 4:36 PM
> > > To: dev@dpdk.org
> > > Cc: Neil Horman; Thomas Monjalon; Ananyev, Konstantin; Richardson, Bruce
> > > Subject: [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
> > >
> > > The ACL library makes extensive use of some SSE4.2 instructions, which means the
> > > default build can't compile this library. Work around the problem by testing
> > > the __SSE42__ definition in the acl_vects.h file and defining the macros there
> > > as intrinsics or c-level equivalants. Note this is a minimal patch, adjusting
> > > only the definitions that are currently used in the ACL library.
> > >
> >
> > My comments about actual implementations of c-level equivalents below.
> > None of them look correct to me.
> > Of course it could be fixed.
> > Though I am not sure that is a right way to proceed:
> > At first I really doubt that these equivalents will provide similar performance.
> > As you probably note - we do have a scalar version of rte_acl_classify(): rte_acl_classify_scalar().
> > So I think it might be faster than vector one with 'emulated' instrincts.
> > Unfortunately it is all mixed right now into one file and 'scalar' version could use few sse4 instrincts through resolve_priority().
> > Another thing - we consider to add another version of rte_acl_classify() that will use avx2 instrincts.
> > If we go the way you suggest - I am afraid will soon have to provide scalar equivalents for several AVX2 instrincts too.
> > So in summary that way (providing our own scalar equivalents of SIMD instrincts) seems to me slow, hard to maintain and error
> prone.
> >
> > What porbably can be done instead: rework acl_run.c a bit, so it provide:
> > rte_acl_classify_scalar() - could be build and used on all systems.
> > rte_acl_classify_sse() - could be build and used only on systems with sse4.2 and upper, return ENOTSUP on lower arch.
> > In future: rte_acl_classify_avx2 - could be build and used only on systems with avx2 and upper, return ENOTSUP on lower arch.
> >
> > I am looking at rte_acl right now anyway.
> > So will try to come up with something workable.
> >
> So, this is exactly the opposite of what Bruce and I just spent several days and
> a huge email thread that you clearly are aware of discussing run time versus
> compile time selection of paths. At this point I'm done ping ponging between
> your opposing viewpoints. If you want to implement something that does run time
> checking, I'm fine with it, but I'm not going back and forth until you two come
> to an agreement on this.
Right now, I am not talking about 'run time vs compile time selection'.
Whatever way we choose, I think the implementation need to be:
1) correct
2) allow easily add(/modify) code path optimised for particular architecture.
Without need to modify/re-test what you call 'least common denominator' code path.
And visa-versa, if someone find a way to optimise common code path - no need to
touch/retest architecture specific ones.
>
> > > Only compile tested so far, but I wanted to post it for early review so that
> > > others could aid in unit testing.
> >
> > Could you please stop submitting untested patches.
> What part of "Compile tested only" was hard to understand? I posted it early
> specficially because I expected comments like you've provided above. After I
> posted my first idea, you naked it, and after I started fixing it up to make
> run-time selections, Bruce came down in favor of common compile time
> functionality. Given our previous conversation, I figured you would be opposed
> to this approach, so whats the point in testing this if you're just going to
> kill the idea?
>
> > It is common (and good) practice to do at least some minimal testing before submitting your code changes.
> Never intended it to be accepted just tested/reviewed. Hence my comment
> "compile tested only". I admit I was unaware of the unit test below
> > At the end, it will save your own and other people time.
> > For ACL there is a simple UT, that could be run as:
> > ./x86_64-native-linuxapp-gcc/app/test ...
> > RTE>>acl_autotest
> > It takes just few seconds to run.
>
> It doesn't seem to work properly, at least not on any of my systems. With a
> system allocation 100 pages to hugepage memory:
>
> [root@hmsreliant app]# ./test -c 0x3 -n 2
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 0 on socket 0
> EAL: Detected lcore 5 as core 1 on socket 0
> EAL: Detected lcore 6 as core 2 on socket 0
> EAL: Detected lcore 7 as core 3 on socket 0
> EAL: Support maximum 64 logical core(s) by configuration.
> EAL: Detected 8 lcore(s)
> EAL: cannot open VFIO container, error 2 (No such file or directory)
> EAL: VFIO support could not be initialized
> EAL: Setting up memory...
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7fef07000000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7fef06c00000 (size = 0x200000)
> EAL: Ask a virtual area of 0x400000 bytes
> EAL: Virtual area found at 0x7fef06600000 (size = 0x400000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7fef06200000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7fef05e00000 (size = 0x200000)
> EAL: Ask a virtual area of 0x600000 bytes
> EAL: Virtual area found at 0x7fef05600000 (size = 0x600000)
> EAL: Ask a virtual area of 0x600000 bytes
> EAL: Virtual area found at 0x7fef04e00000 (size = 0x600000)
> EAL: Ask a virtual area of 0x800000 bytes
> EAL: Virtual area found at 0x7fef04400000 (size = 0x800000)
> EAL: Ask a virtual area of 0x400000 bytes
> EAL: Virtual area found at 0x7fef03e00000 (size = 0x400000)
> EAL: Ask a virtual area of 0xa00000 bytes
> EAL: Virtual area found at 0x7fef03200000 (size = 0xa00000)
> EAL: Ask a virtual area of 0x400000 bytes
> EAL: Virtual area found at 0x7fef02c00000 (size = 0x400000)
> EAL: Ask a virtual area of 0x400000 bytes
> EAL: Virtual area found at 0x7fef02600000 (size = 0x400000)
> EAL: Ask a virtual area of 0x1800000 bytes
> EAL: Virtual area found at 0x7fef00c00000 (size = 0x1800000)
> EAL: Ask a virtual area of 0x1200000 bytes
> EAL: Virtual area found at 0x7feeff800000 (size = 0x1200000)
> EAL: Ask a virtual area of 0x2600000 bytes
> EAL: Virtual area found at 0x7feefd000000 (size = 0x2600000)
> EAL: Ask a virtual area of 0xc00000 bytes
> EAL: Virtual area found at 0x7feefc200000 (size = 0xc00000)
> EAL: Ask a virtual area of 0x2200000 bytes
> EAL: Virtual area found at 0x7feef9e00000 (size = 0x2200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef9a00000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef9600000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef9200000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef8e00000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef8a00000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef8600000 (size = 0x200000)
> EAL: Ask a virtual area of 0x200000 bytes
> EAL: Virtual area found at 0x7feef8200000 (size = 0x200000)
> EAL: Ask a virtual area of 0x600000 bytes
> EAL: Virtual area found at 0x7feef7a00000 (size = 0x600000)
> EAL: Requesting 100 pages of size 2MB from socket 0
> EAL: TSC frequency is ~3392297 KHz
> EAL: Master core 0 is ready (tid=73cf880)
> EAL: Core 1 is ready (tid=f71fe700)
> APP: HPET is not enabled, using TSC as default timer
> RTE>>acl_autotest
> ACL: allocation of 25166720 bytes on socket 9 for ACL_acl_ctx failed
>
> This hangs forever (well, at least 30 minutes, but thats sufficiently
> close to forever for me to wait).
>
> > Also if you plan further development for ACL, we probably can provide you with extra test-cases that we use for functional testing.
> That would be good please.
>
> Thanks
> Neil
>
next prev parent reply other threads:[~2014-08-06 11:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-04 15:35 Neil Horman
2014-08-05 15:26 ` Ananyev, Konstantin
2014-08-05 18:20 ` Neil Horman
2014-08-06 10:52 ` Ananyev, Konstantin
2014-08-06 12:12 ` Neil Horman
2014-08-06 12:23 ` Ananyev, Konstantin
2014-08-06 13:35 ` Neil Horman
2014-08-06 11:39 ` Ananyev, Konstantin [this message]
2014-08-06 12:18 ` Neil Horman
2014-08-06 12:26 ` Ananyev, Konstantin
2014-08-06 16:59 ` Richardson, Bruce
2014-08-06 17:27 ` Neil Horman
2014-08-12 23:19 ` Thomas Monjalon
2014-08-13 12:33 ` Neil Horman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2601191342CEEE43887BDE71AB9772582134FC92@IRSMSX105.ger.corp.intel.com \
--to=konstantin.ananyev@intel.com \
--cc=dev@dpdk.org \
--cc=nhorman@tuxdriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).