DPDK patches and discussions
 help / color / mirror / Atom feed
From: Neil Horman <nhorman@tuxdriver.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
Date: Wed, 17 Dec 2014 15:27:43 -0500	[thread overview]
Message-ID: <20141217202743.GA10240@hmsreliant.think-freely.org> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258213C1AB4@IRSMSX105.ger.corp.intel.com>

On Wed, Dec 17, 2014 at 07:22:06PM +0000, Ananyev, Konstantin wrote:
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, December 17, 2014 3:33 PM
> > To: Ananyev, Konstantin
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > 
> > On Tue, Dec 16, 2014 at 04:16:48PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > Sent: Monday, December 15, 2014 8:21 PM
> > > > To: Ananyev, Konstantin
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > > >
> > > > On Mon, Dec 15, 2014 at 04:33:47PM +0000, Ananyev, Konstantin wrote:
> > > > > Hi Neil,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > > Sent: Monday, December 15, 2014 4:00 PM
> > > > > > To: Ananyev, Konstantin
> > > > > > Cc: dev@dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > > > > >
> > > > > > On Sun, Dec 14, 2014 at 06:10:52PM +0000, Konstantin Ananyev wrote:
> > > > > > > Introduce new classify() method that uses AVX2 instructions.
> > > > > > > From my measurements:
> > > > > > > On HSW boards when processing >= 16 packets per call,
> > > > > > > AVX2 method outperforms it's SSE counterpart by 10-25%,
> > > > > > > (depending on the ruleset).
> > > > > > > At runtime, this method is selected as default one on HW that supports AVX2.
> > > > > > >
> > > > > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > > > > ---
> > > > > > >  lib/librte_acl/Makefile       |   9 +
> > > > > > >  lib/librte_acl/acl.h          |   4 +
> > > > > > >  lib/librte_acl/acl_run.h      |   2 +-
> > > > > > >  lib/librte_acl/acl_run_avx2.c |  58 +++++
> > > > > > >  lib/librte_acl/acl_run_avx2.h | 305 ++++++++++++++++++++++++
> > > > > > >  lib/librte_acl/acl_run_sse.c  | 537 +-----------------------------------------
> > > > > > >  lib/librte_acl/acl_run_sse.h  | 533 +++++++++++++++++++++++++++++++++++++++++
> > > > > > >  lib/librte_acl/rte_acl.c      |   5 +-
> > > > > > >  lib/librte_acl/rte_acl.h      |   2 +
> > > > > > >  9 files changed, 917 insertions(+), 538 deletions(-)
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_avx2.c
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_avx2.h
> > > > > > >  create mode 100644 lib/librte_acl/acl_run_sse.h
> > > > > > >
> > > > > > > diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> > > > > > > index 65e566d..223ec31 100644
> > > > > > > --- a/lib/librte_acl/Makefile
> > > > > > > +++ b/lib/librte_acl/Makefile
> > > > > > > @@ -45,8 +45,17 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
> > > > > > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
> > > > > > > +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_avx2.c
> > > > > > >
> > > > > > >  CFLAGS_acl_run_sse.o += -msse4.1
> > > > > > > +ifeq ($(CC), icc)
> > > > > > > +CFLAGS_acl_run_avx2.o += -march=core-avx2
> > > > > > > +else ifneq ($(shell \
> > > > > > > +test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 6 && echo 1), 1)
> > > > > > > +CFLAGS_acl_run_avx2.o += -mavx2
> > > > > > > +else
> > > > > > > +CFLAGS_acl_run_avx2.o += -msse4.1
> > > > > > > +endif
> > > > > > >
> > > > > > This seems broken.  You've unilaterally included acl_run_avx2.c in the build
> > > > > > list above, but only enable -mavx2 if the compiler is at least gcc 4.6.
> > > > >
> > > > > Actually 4.7 (before that version, as I know,  gcc doesn't support avx2)
> > > > >
> > > > > >  Unless
> > > > > > you want to make gcc 4.6 a requirement for building,
> > > > >
> > > > > I believe DPDK is required to be buildable by gcc 4.6
> > > > > As I remember, we have to support it all way down to gcc 4.3.
> > > > >
> > > > > > you need to also exclude
> > > > > > the file above from the build list.
> > > > >
> > > > > That means that for  gcc 4.6 and below rte_acl_classify_avx2() would not be defined.
> > > > > And then at runtime, I have to check for that somehow and (re)populate classify_fns[].
> > > > > Doesn't seems like a good way to me.
> > > > There are plenty of ways around that.
> > > >
> > > > At a minimum you could make the classify_fns array the one place that you need
> > > > to add an ifdef __AVX__ call.
> > > >
> > > > You could also create a secondary definition of rte_acl_classify_avx2, and mark
> > > > it as a weak symbol, which only returns -EOPNOTSUPP.  That would be good, since
> > > > the right thing will just automatically happen then if you don't build the
> > > > actual avx2 classification code
> > > >
> > > > > Instead, I prefer to always build acl_run_avx2.c,
> > >
> > >
> > > > But you can't do that.  You just said above that you need to support down to gcc
> > > > 4.3.  I see you've worked around that with some additional ifdef __AVX__
> > > > instructions, but in so doing you ignore the possibiity that sse isn't
> > > > supported, so you need to add __SSE__ checks now as well.  ifdeffing that much
> > > > just isn't scalable.
> > >
> > > We don't need to worry about compiler without SSE4.1 support.
> > > I believe that all compilers that DDPDK has to build with, do support SSE4.1.
> > > So for SSE4.1 we only has to worry about situation when target CPU doesn't support it
> > > We manage it by runtime selection.
> > > For AVX2 - situation is a bit different: it could be both compiler and target CPU that don't support it.
> > >
> > > >  And for your effort, you get an AVX2 classification path
> > > > that potentially doesn't actually do vectorized classification.
> > > >
> > > > It really seems better to me to not build the code if the compiler doesn't
> > > > support the instruction set it was meant to enable, and change the
> > > > classification function pointer to something that informs the user of the lack
> > > > of support at run time.
> > > >
> > > > > but for old compilers that don't support AVX2 -
> > > > > rte_acl_classify_avx2() would simply be identical to rte_acl_classify_sse().
> > > > >
> > > > That doesn't make sense to me, for two reasons:
> > > >
> > > > 1) What if the machine being targeted doesn't support sse either?
> > > >
> > >
> > > Exactly the same what is happening now on the machine with now SSE4.1 support.
> > > There is absolutely no difference here.
> > >
> > > > 2) If an application selects an AVX2 classifier, I as a developer expect to
> > > > either get AVX2 based classification, or an error indicating that I can't do
> > > > AVX2 classification, not a silent performance degradation down to scalar
> > > > classification.
> > >
> > > In fact I was considering both variants for compilers not supporting AVX2:
> > > 1. silently degrade to SSE method.
> > > 2. create  a dummy function rte_acl_classify_error() and put it  into classify_fns[RTE_ACL_CLASSIFY_AVX2].
> > >
> > > I choose #1 because it seems like a less distraction for the user -
> > > all would keep working as before, user just wouldn't see any improvement comparing to SSE method.
> > > Again didn't want to spread "ifdef __AVX2__" into rte_acl.c
> > > Though I don't have any strong opinion here.
> > > So if you can provide some good reason why #2 is preferable, I am ok to switch to #2.
> > >
> > Because 2 doesn't require any ifdeffing.  As you note above the problem here is
> > that AVX2 support is both compiler and machine dependent.  If you make a weak
> > symbol version of rte_acl_classify_avx2 that always gets built, then you've
> > reduced the problem to just being compiler support, which you can check in the
> > makefile.
> 
> I don't think we'll get rid of ifdefing with #2.
> We'll  remove 2 ifdefs in acl_run_avx2.h, but then we have to introduce 2 new in rte_acl.c instead.
> From my understanding, we we'll need something like that:
> 
> static const rte_acl_classify_t classify_fns[] = {
>         [RTE_ACL_CLASSIFY_DEFAULT] = rte_acl_classify_scalar,
>         [RTE_ACL_CLASSIFY_SCALAR] = rte_acl_classify_scalar,
>         [RTE_ACL_CLASSIFY_SSE] = rte_acl_classify_sse,
> +#if (defined __GNUC__ &&  __GNUC__ <= 4 && __GNUC_MINOR__ < 7)
> +      [RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_error,
> +#else  
>       [RTE_ACL_CLASSIFY_AVX2] = rte_acl_classify_avx2,
> +#endif


You don't need to do this, you need to use a weak symbol:
static int rte_acl_classify_avx2(...) __attributes__(weak)
{
	return -EOPNOTSUP
}


Then in the rte_acl_avx2.c file define it again without the weak symbol

That way, you do conditional compilation, and when you do the "real" symbol
overrides the weak one.

> };
> 
> static void __attribute__((constructor))
> rte_acl_init(void)
> {
>         enum rte_acl_classify_alg alg = RTE_ACL_CLASSIFY_DEFAULT;
> 
> +#if (defined __GNUC__ &&  __GNUC__ <= 4 && __GNUC_MINOR__ < 7)
>         if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
>                 alg = RTE_ACL_CLASSIFY_AVX2;
>         else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> +#else
> +      if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
>                 alg = RTE_ACL_CLASSIFY_SSE;
> +#endif
>         rte_acl_set_default_classify(alg);
> }
> 
Why would you do this, this cpu feature flag definitions aren't matched to
compiler support, it should always be defined.  You should still be able to
check for AVX2 support in code that doesn't support emitting the instruction.

Neil

> Correct?
> Konstantin
> 
> > 
> > > >
> > > > > >  That in turn I think allows you to remove a
> > > > > > bunch of the ifdeffing that you've done in some of the avx2 specific files.
> > > > >
> > > > > Actually there are not many of them.
> > > > > One in acl_run_avx2.h and another in acl_run_avx2.c.
> > > > >
> > > > 2 in acl_run_avx2.h and 1 in rte_acl_osdep_alone.h, which is really 3 more than
> > > > you need if you just do an intellegent weak classifier function defintion.
> > >
> > > grep -n __AVX2__ lib/librte_acl/*.[c,h] | grep -v endif
> > > lib/librte_acl/acl_run_avx2.c:45:#ifdef __AVX2__
> > > lib/librte_acl/acl_run_avx2.h:36:#ifdef __AVX2__
> > >
> > > rte_acl_osdep_alone.h - is a different story.
> > > It needs to be there anyway, as in rte_common_vect.h.
> > > In fact  rte_acl_osdep_alone.h is only needed for cases when RTE_LIBRTE_ACL_STANDALONE=y.
> > > That comes from the old days, when we had to to support building librte_acl library without the rest of DPDK.
> > > I think we don't need it anymore and plan to remove it.
> > > Just thought it should  be in a separate patch.
> > > Konstantin
> > >
> > > >
> > > > Neil
> > >
> 

  reply	other threads:[~2014-12-17 20:27 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-14 18:10 [dpdk-dev] [PATCH 00/17] ACL: New AVX2 classify method and several other enhancements Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 01/17] app/test: few small fixes fot test_acl.c Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 02/17] librte_acl: make data_indexes long enough to survive idle transitions Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 03/17] librte_acl: remove build phase heuristsic with negative perfomance effect Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 04/17] librte_acl: fix a bug at build phase that can cause matches beeing overwirtten Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 05/17] librte_acl: introduce DFA nodes compression (group64) for identical entries Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 06/17] librte_acl: build/gen phase - simplify the way match nodes are allocated Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 07/17] librte_acl: make scalar RT code to be more similar to vector one Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 08/17] librte_acl: a bit of RT code deduplication Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 09/17] EAL: introduce rte_ymm and relatives in rte_common_vect.h Konstantin Ananyev
2014-12-15 15:56   ` Neil Horman
2014-12-14 18:10 ` [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method Konstantin Ananyev
2014-12-15 16:00   ` Neil Horman
2014-12-15 16:33     ` Ananyev, Konstantin
2014-12-15 20:20       ` Neil Horman
2014-12-16 16:16         ` Ananyev, Konstantin
2014-12-17 15:32           ` Neil Horman
2014-12-17 19:22             ` Ananyev, Konstantin
2014-12-17 20:27               ` Neil Horman [this message]
2014-12-18 15:01                 ` Ananyev, Konstantin
2015-01-06  9:57                   ` Ananyev, Konstantin
2015-01-06 12:40                     ` Neil Horman
2014-12-17  0:38         ` Ananyev, Konstantin
2014-12-14 18:10 ` [dpdk-dev] [PATCH 11/17] test-acl: add ability to manually select RT method Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 12/17] librte_acl: Remove search_sse_2 and relatives Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 13/17] libter_acl: move lo/hi dwords shuffle out from calc_addr Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 14/17] libte_acl: make calc_addr a define to deduplicate the code Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 15/17] libte_acl: introduce max_size into rte_acl_config Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 16/17] libte_acl: remove unused macros Konstantin Ananyev
2014-12-14 18:10 ` [dpdk-dev] [PATCH 17/17] libte_acl: fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y Konstantin Ananyev
2014-12-16 13:51   ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141217202743.GA10240@hmsreliant.think-freely.org \
    --to=nhorman@tuxdriver.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).