DPDK patches and discussions
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@tuxdriver.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
Date: Wed, 6 Aug 2014 08:18:59 -0400	[thread overview]
Message-ID: <20140806121859.GB26562@localhost.localdomain> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB9772582134FC92@IRSMSX105.ger.corp.intel.com>

On Wed, Aug 06, 2014 at 11:39:22AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Tuesday, August 05, 2014 7:21 PM
> > To: Ananyev, Konstantin
> > Cc: dev@dpdk.org; Thomas Monjalon; Richardson, Bruce
> > Subject: Re: [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
> > 
> > On Tue, Aug 05, 2014 at 03:26:27PM +0000, Ananyev, Konstantin wrote:
> > > Hi Neil,
> > >
> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > Sent: Monday, August 04, 2014 4:36 PM
> > > > To: dev@dpdk.org
> > > > Cc: Neil Horman; Thomas Monjalon; Ananyev, Konstantin; Richardson, Bruce
> > > > Subject: [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code
> > > >
> > > > The ACL library makes extensive use of some SSE4.2 instructions, which means the
> > > > default build can't compile this library.  Work around the problem by testing
> > > > the __SSE42__ definition in the acl_vects.h file and defining the macros there
> > > > as intrinsics or c-level equivalants.  Note this is a minimal patch, adjusting
> > > > only the definitions that are currently used in the ACL library.
> > > >
> > >
> > > My comments about actual implementations of c-level equivalents below.
> > > None of them look correct to me.
> > > Of course it could be fixed.
> > > Though I am not sure that is a right way to proceed:
> > > At first I really doubt that these equivalents will provide similar performance.
> > > As you probably note - we do have a scalar version of  rte_acl_classify():  rte_acl_classify_scalar().
> > > So I think it might be faster than vector one with 'emulated' instrincts.
> > > Unfortunately it is all mixed right now into one file and 'scalar' version could use few sse4 instrincts through resolve_priority().
> > > Another thing - we consider to add another version of rte_acl_classify() that will use avx2 instrincts.
> > > If we go the way you suggest - I am afraid will soon have to provide scalar equivalents for several AVX2 instrincts too.
> > > So in summary that way (providing our own scalar equivalents of SIMD instrincts) seems to me slow, hard to maintain and error
> > prone.
> > >
> > > What porbably can be done instead: rework acl_run.c a bit, so it provide:
> > > rte_acl_classify_scalar() - could be build and used on all systems.
> > > rte_acl_classify_sse() - could be build and used only on systems with sse4.2 and upper, return ENOTSUP on lower arch.
> > > In future: rte_acl_classify_avx2 - could be build and used only on systems with avx2 and upper, return ENOTSUP on lower arch.
> > >
> > > I am looking at rte_acl right now anyway.
> > > So will try to come up with something workable.
> > >
> > So, this is exactly the opposite of what Bruce and I just spent several days and
> > a huge email thread that you clearly are aware of discussing run time versus
> > compile time selection of paths.  At this point I'm done ping ponging between
> > your opposing viewpoints.  If you want to implement something that does run time
> > checking, I'm fine with it, but I'm not going back and forth until you two come
> > to an agreement on this.
> 
> Right now, I am not talking about 'run time vs compile time selection'.
But you are talking about exactly that, allbeit implicitly.  To implement what
you recommend above (that being multiple functional paths that return a not
supported error code at run time), we need to make run time tests for what the
cpu supports.  While I'm actually ok with doing that (I think it makes alot of
sense), Bruce and I just spent several days and dozens of emails debating that,
so you can understand why I don't want to write yet another version of this
patch that requires doing the exact thing we just argued about, especially if it
means he's going to pipe back up and say no, driving me back to a common single
implementation that compiles and runs for all platforms.  I'm not going to keep
re-writing this boucing back and forth between your opposing viewpoints.  We
need to agree on a direction before I make another pass at this.

> Whatever way we choose, I think the implementation need to be:
> 1) correct
Obviously.

> 2) allow easily add(/modify) code path optimised for particular architecture.
> Without need to modify/re-test what you call  'least common denominator' code path.
> And visa-versa, if someone find a way to optimise common code path - no need to
> touch/retest architecture specific ones.
So I'm fine with this, but it is anathema to what Bruce advocated for when I did
this latest iteration.  Bruce advocated for a single common path that compiled
in all cases.  Bruce, do you want to comment here?  I'd really like to get this
settled before I go try this again. 

Neil

> 

  reply	other threads:[~2014-08-06 12:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-04 15:35 Neil Horman
2014-08-05 15:26 ` Ananyev, Konstantin
2014-08-05 18:20   ` Neil Horman
2014-08-06 10:52     ` Ananyev, Konstantin
2014-08-06 12:12       ` Neil Horman
2014-08-06 12:23         ` Ananyev, Konstantin
2014-08-06 13:35           ` Neil Horman
2014-08-06 11:39     ` Ananyev, Konstantin
2014-08-06 12:18       ` Neil Horman [this message]
2014-08-06 12:26         ` Ananyev, Konstantin
2014-08-06 16:59         ` Richardson, Bruce
2014-08-06 17:27           ` Neil Horman
2014-08-12 23:19             ` Thomas Monjalon
2014-08-13 12:33               ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140806121859.GB26562@localhost.localdomain \
    --to=nhorman@tuxdriver.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).