From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id 7823C8045 for ; Mon, 15 Dec 2014 21:20:53 +0100 (CET) Received: from hmsreliant.think-freely.org ([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1Y0c8P-00043J-3I; Mon, 15 Dec 2014 15:20:52 -0500 Date: Mon, 15 Dec 2014 15:20:43 -0500 From: Neil Horman To: "Ananyev, Konstantin" Message-ID: <20141215202043.GD3803@hmsreliant.think-freely.org> References: <1418580659-12595-1-git-send-email-konstantin.ananyev@intel.com> <1418580659-12595-11-git-send-email-konstantin.ananyev@intel.com> <20141215160009.GC3803@hmsreliant.think-freely.org> <2601191342CEEE43887BDE71AB977258213C0D9C@IRSMSX105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB977258213C0D9C@IRSMSX105.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Dec 2014 20:20:53 -0000 On Mon, Dec 15, 2014 at 04:33:47PM +0000, Ananyev, Konstantin wrote: > Hi Neil, > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Monday, December 15, 2014 4:00 PM > > To: Ananyev, Konstantin > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method > > > > On Sun, Dec 14, 2014 at 06:10:52PM +0000, Konstantin Ananyev wrote: > > > Introduce new classify() method that uses AVX2 instructions. > > > From my measurements: > > > On HSW boards when processing >= 16 packets per call, > > > AVX2 method outperforms it's SSE counterpart by 10-25%, > > > (depending on the ruleset). > > > At runtime, this method is selected as default one on HW that supports AVX2. > > > > > > Signed-off-by: Konstantin Ananyev > > > --- > > > lib/librte_acl/Makefile | 9 + > > > lib/librte_acl/acl.h | 4 + > > > lib/librte_acl/acl_run.h | 2 +- > > > lib/librte_acl/acl_run_avx2.c | 58 +++++ > > > lib/librte_acl/acl_run_avx2.h | 305 ++++++++++++++++++++++++ > > > lib/librte_acl/acl_run_sse.c | 537 +----------------------------------------- > > > lib/librte_acl/acl_run_sse.h | 533 +++++++++++++++++++++++++++++++++++++++++ > > > lib/librte_acl/rte_acl.c | 5 +- > > > lib/librte_acl/rte_acl.h | 2 + > > > 9 files changed, 917 insertions(+), 538 deletions(-) > > > create mode 100644 lib/librte_acl/acl_run_avx2.c > > > create mode 100644 lib/librte_acl/acl_run_avx2.h > > > create mode 100644 lib/librte_acl/acl_run_sse.h > > > > > > diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile > > > index 65e566d..223ec31 100644 > > > --- a/lib/librte_acl/Makefile > > > +++ b/lib/librte_acl/Makefile > > > @@ -45,8 +45,17 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c > > > SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c > > > SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c > > > SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c > > > +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_avx2.c > > > > > > CFLAGS_acl_run_sse.o += -msse4.1 > > > +ifeq ($(CC), icc) > > > +CFLAGS_acl_run_avx2.o += -march=core-avx2 > > > +else ifneq ($(shell \ > > > +test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 6 && echo 1), 1) > > > +CFLAGS_acl_run_avx2.o += -mavx2 > > > +else > > > +CFLAGS_acl_run_avx2.o += -msse4.1 > > > +endif > > > > > This seems broken. You've unilaterally included acl_run_avx2.c in the build > > list above, but only enable -mavx2 if the compiler is at least gcc 4.6. > > Actually 4.7 (before that version, as I know, gcc doesn't support avx2) > > > Unless > > you want to make gcc 4.6 a requirement for building, > > I believe DPDK is required to be buildable by gcc 4.6 > As I remember, we have to support it all way down to gcc 4.3. > > > you need to also exclude > > the file above from the build list. > > That means that for gcc 4.6 and below rte_acl_classify_avx2() would not be defined. > And then at runtime, I have to check for that somehow and (re)populate classify_fns[]. > Doesn't seems like a good way to me. There are plenty of ways around that. At a minimum you could make the classify_fns array the one place that you need to add an ifdef __AVX__ call. You could also create a secondary definition of rte_acl_classify_avx2, and mark it as a weak symbol, which only returns -EOPNOTSUPP. That would be good, since the right thing will just automatically happen then if you don't build the actual avx2 classification code > Instead, I prefer to always build acl_run_avx2.c, But you can't do that. You just said above that you need to support down to gcc 4.3. I see you've worked around that with some additional ifdef __AVX__ instructions, but in so doing you ignore the possibiity that sse isn't supported, so you need to add __SSE__ checks now as well. ifdeffing that much just isn't scalable. And for your effort, you get an AVX2 classification path that potentially doesn't actually do vectorized classification. It really seems better to me to not build the code if the compiler doesn't support the instruction set it was meant to enable, and change the classification function pointer to something that informs the user of the lack of support at run time. > but for old compilers that don't support AVX2 - > rte_acl_classify_avx2() would simply be identical to rte_acl_classify_sse(). > That doesn't make sense to me, for two reasons: 1) What if the machine being targeted doesn't support sse either? 2) If an application selects an AVX2 classifier, I as a developer expect to either get AVX2 based classification, or an error indicating that I can't do AVX2 classification, not a silent performance degradation down to scalar classifcation. > > That in turn I think allows you to remove a > > bunch of the ifdeffing that you've done in some of the avx2 specific files. > > Actually there are not many of them. > One in acl_run_avx2.h and another in acl_run_avx2.c. > 2 in acl_run_avx2.h and 1 in rte_acl_osdep_alone.h, which is really 3 more than you need if you just do an intellegent weak classifier function defintion. Neil