From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from NAM03-CO1-obe.outbound.protection.outlook.com (mail-co1nam03on0060.outbound.protection.outlook.com [104.47.40.60]) by dpdk.org (Postfix) with ESMTP id C6C5E567E for ; Fri, 28 Apr 2017 12:19:22 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=CAVIUMNETWORKS.onmicrosoft.com; s=selector1-cavium-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=1GpEJkYMhcUqTrNjURyJYULI71NjyfQZjzNRqR+yclk=; b=fD/dED7tI24zvexuSlQCBgS374ZhpOWBZD6AgtMSVyFEIf3ZIYW16zU4OD4ftvxVwiJuaxQ+oWV/fKesF8BOXZcdp0vvJBnuQadb2DAfSB+WmY3N2aE+C21oZ80U4g9XA9H+GCYUrq8AALJinWQE/FTr4g0rcpLftxF1SgVLufU= Received: from BY2PR07MB2421.namprd07.prod.outlook.com (10.166.115.13) by CY1PR0701MB1727.namprd07.prod.outlook.com (10.163.21.141) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1047.13; Fri, 28 Apr 2017 10:19:20 +0000 Received: from BY2PR07MB2421.namprd07.prod.outlook.com ([10.166.115.13]) by BY2PR07MB2421.namprd07.prod.outlook.com ([10.166.115.13]) with mapi id 15.01.1061.016; Fri, 28 Apr 2017 10:19:20 +0000 From: "Sekhar, Ashwin" To: Jan Viktorin CC: "thomas@monjalon.net" , "jasvinder.singh@intel.com" , "Jacob, Jerin" , "jianbo.liu@linaro.org" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH 1/2] net: add arm64 neon version of CRC compute APIs Thread-Index: AQHSv2AGSaz4GnbOcUGH4m59Pa+ODQ== Date: Fri, 28 Apr 2017 10:19:20 +0000 Message-ID: References: <20170427141021.18767-1-ashwin.sekhar@caviumnetworks.com> <20170428115544.52d3388d@jvn> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: rehivetech.com; dkim=none (message not signed) header.d=none;rehivetech.com; dmarc=none action=none header.from=cavium.com; x-originating-ip: [111.93.218.67] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; CY1PR0701MB1727; 7:SVxckFR0qwxnrGy7PamO2b+OWtwTmX9pD8G83wQZLk+WNB3JHcTTLyyiNIBrJl2rNjD8qSMIknBKUtuutM4ooyHMLKiJnEy4sArKDzd41lkHAeTUOHAFUb0Xk2szpMZgZggZP7Wtr2R5mB72isjaVLSCa4Qb7kXxDeq7ZHs1iiJIFOVT3XUqPo1Xj9/6rxpEk1mAF/bXIEZnslNBb1bz7shGC4BDyi/RHw5WMHoIYWeHSU88grUlzXhkRrNzUihavS6sTZPEng7TapTj39OtO1k6uFWy3ziBGdqYIU3AwJ7t2uFHXVt4YlURChTCG9ZjnopPE6d6lBrvo0NWM3625w== x-forefront-antispam-report: SFV:SKI; SCL:-1SFV:NSPM; SFS:(10009020)(6009001)(39450400003)(39410400002)(39400400002)(39840400002)(51914003)(377454003)(24454002)(2906002)(66066001)(3280700002)(33656002)(3660700001)(189998001)(110136004)(38730400002)(4326008)(6246003)(54356999)(25786009)(102836003)(6116002)(3846002)(54906002)(50986999)(76176999)(99286003)(229853002)(55016002)(9686003)(53936002)(6306002)(6436002)(6916009)(5660300001)(8676002)(6506006)(77096006)(7696004)(8936002)(81166006)(2900100001)(86362001)(7736002)(122556002)(305945005)(74316002); DIR:OUT; SFP:1101; SCL:1; SRVR:CY1PR0701MB1727; H:BY2PR07MB2421.namprd07.prod.outlook.com; FPR:; SPF:None; MLV:sfv; LANG:en; x-ms-office365-filtering-correlation-id: 73c94619-8f6f-4d58-60dc-08d48e2008bc x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(2017030254075)(201703131423075)(201703031133081)(201702281549075); SRVR:CY1PR0701MB1727; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(22074186197030)(183786458502308); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(6040450)(601004)(2401047)(5005006)(8121501046)(93006095)(93001095)(3002001)(10201501046)(6041248)(201703131423075)(201702281528075)(201703061421075)(20161123555025)(20161123560025)(20161123564025)(20161123562025)(6072148); SRVR:CY1PR0701MB1727; BCL:0; PCL:0; RULEID:; SRVR:CY1PR0701MB1727; x-forefront-prvs: 029174C036 spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: cavium.com X-MS-Exchange-CrossTenant-originalarrivaltime: 28 Apr 2017 10:19:20.1180 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 711e4ccf-2e9b-4bcf-a551-4094005b6194 X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY1PR0701MB1727 Subject: Re: [dpdk-dev] [PATCH 1/2] net: add arm64 neon version of CRC compute APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 10:19:23 -0000 Hi Jan,=0A= Thanks for the comments. Please see my responses inline.=0A= =0A= On Friday 28 April 2017 03:25 PM, Jan Viktorin wrote:=0A= > Hello Ashwin Sekhar,=0A= >=0A= > some comments below...=0A= >=0A= > On Thu, 27 Apr 2017 07:10:20 -0700=0A= > Ashwin Sekhar T K wrote:=0A= >=0A= >> * Added CRC compute APIs for arm64 utilizing the pmull capability=0A= >> * Added new file net_crc_neon.h to hold the arm64 pmull CRC=0A= >> implementation=0A= >> * Added crypto capability in compilation of generic armv8 and=0A= >> thunderx targets=0A= >> * pmull CRC version is used only after checking the pmull capability=0A= >> at runtime=0A= >> * Verified the changes with crc_autotest unit test case=0A= >>=0A= >> Signed-off-by: Ashwin Sekhar T K =0A= >> ---=0A= >> MAINTAINERS | 1 +=0A= >> lib/librte_eal/common/include/arch/arm/rte_vect.h | 45 +++=0A= >> lib/librte_net/net_crc_neon.h | 357 +++++++++++++++= +++++++=0A= >> lib/librte_net/rte_net_crc.c | 32 +-=0A= >> lib/librte_net/rte_net_crc.h | 2 +=0A= >> mk/machine/armv8a/rte.vars.mk | 2 +-=0A= >> mk/machine/thunderx/rte.vars.mk | 2 +-=0A= >> mk/rte.cpuflags.mk | 3 +=0A= >> mk/toolchain/gcc/rte.toolchain-compat.mk | 1 +=0A= >> 9 files changed, 438 insertions(+), 7 deletions(-)=0A= >> create mode 100644 lib/librte_net/net_crc_neon.h=0A= >>=0A= >> diff --git a/MAINTAINERS b/MAINTAINERS=0A= >> index 576d60a..283743e 100644=0A= >> --- a/MAINTAINERS=0A= >> +++ b/MAINTAINERS=0A= >> @@ -149,6 +149,7 @@ F: lib/librte_lpm/rte_lpm_neon.h=0A= >> F: lib/librte_hash/rte*_arm64.h=0A= >> F: lib/librte_efd/rte*_arm64.h=0A= >> F: lib/librte_table/rte*_arm64.h=0A= >> +F: lib/librte_net/net_crc_neon.h=0A= >> F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c=0A= >> F: drivers/net/i40e/i40e_rxtx_vec_neon.c=0A= >> F: drivers/net/virtio/virtio_rxtx_simple_neon.c=0A= >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/lib= rte_eal/common/include/arch/arm/rte_vect.h=0A= >> index 4107c99..9a3dfdf 100644=0A= >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h=0A= >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h=0A= >> @@ -34,9 +34,18 @@=0A= >> #define _RTE_VECT_ARM_H_=0A= >>=0A= >> #include =0A= >> +#include =0A= >> +=0A= >> #include "generic/rte_vect.h"=0A= >> #include "arm_neon.h"=0A= >>=0A= >> +#ifdef GCC_VERSION=0A= >> +#undef GCC_VERSION=0A= >> +#endif=0A= >=0A= > Why are you doing this? What is wrong with GCC_VERSION?=0A= >=0A= This is just to avoid multiple definitions of GCC_VERSION. Not required =0A= really. Can be removed.=0A= =0A= >> +=0A= >> +#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 \=0A= >> + + __GNUC_PATCHLEVEL__)=0A= >> +=0A= >=0A= > If you have any specific requirements for testing GCC version then it=0A= > should be done in a more elegant way. However, I do not understand your= =0A= > intention.=0A= >=0A= GCC version is checked so as to define wrappers for some neon intrinsics = =0A= which are not available in GCC versions < 7.=0A= =0A= Similar checks of GCC_VERSION done in ./lib/librte_table/rte_lru.h. =0A= Followed the same template here.=0A= Also, this is the suggested approach by GCC. Please see below link.=0A= https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html=0A= =0A= Please advise on more elegant ways of gcc version detection.=0A= >> #ifdef __cplusplus=0A= >> extern "C" {=0A= >> #endif=0A= >> @@ -78,6 +87,42 @@ vqtbl1q_u8(uint8x16_t a, uint8x16_t b)=0A= >> }=0A= >> #endif=0A= >>=0A= >> +#if (GCC_VERSION < 70000)=0A= >=0A= > Is this code is gcc-specific? In such case there should be check for=0A= > GCC compiler. We can also build e.g. by clang.=0A= >=0A= Yes, the code is GCC specific. Currently there are only GCC targets for = =0A= arm and arm64. So no checks are done for other types of compilers.=0A= >> +/*=0A= >> + * NEON intrinsic vreinterpretq_u64_p128() is not supported=0A= >> + * in GCC versions < 7=0A= >> + */=0A= >=0A= > I'd be positive about those comments, like:=0A= >=0A= > NEON intrinsic vreinterpretq_u64_p128() is supported since GCC 7.=0A= >=0A= Thanks. Will make the comments positive.=0A= =0A= >> +static inline uint64x2_t=0A= >> +vreinterpretq_u64_p128(poly128_t x)=0A= >> +{=0A= >> + return (uint64x2_t)x;=0A= >> +}=0A= >> +=0A= >> +/*=0A= >> + * NEON intrinsic vreinterpretq_p64_u64() is not supported=0A= >> + * in GCC versions < 7=0A= >> + */=0A= >> +static inline poly64x2_t=0A= >> +vreinterpretq_p64_u64(uint64x2_t x)=0A= >> +{=0A= >> + return (poly64x2_t)x;=0A= >> +}=0A= >> +=0A= >> +/*=0A= >> + * NEON intrinsic vgetq_lane_p64() is not supported=0A= >> + * in GCC versions < 7=0A= >> + */=0A= >> +static inline poly64_t=0A= >> +vgetq_lane_p64(poly64x2_t x, const int lane)=0A= >> +{=0A= >> + assert(lane >=3D 0 && lane <=3D 1);=0A= >> +=0A= >> + poly64_t *p =3D (poly64_t *)&x;=0A= >> +=0A= >> + return p[lane];=0A= >> +}=0A= >> +#endif=0A= >> +=0A= >> #ifdef __cplusplus=0A= >> }=0A= >> #endif=0A= >> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon= .h=0A= >=0A= > [...]=0A= >=0A= >> # CPU_LDFLAGS =3D=0A= >> # CPU_ASFLAGS =3D=0A= >>=0A= >> -MACHINE_CFLAGS +=3D -march=3Darmv8-a+crc=0A= >> +MACHINE_CFLAGS +=3D -march=3Darmv8-a+crc+crypto=0A= >> diff --git a/mk/machine/thunderx/rte.vars.mk b/mk/machine/thunderx/rte.v= ars.mk=0A= >> index ad5a379..6784105 100644=0A= >> --- a/mk/machine/thunderx/rte.vars.mk=0A= >> +++ b/mk/machine/thunderx/rte.vars.mk=0A= >> @@ -55,4 +55,4 @@=0A= >> # CPU_LDFLAGS =3D=0A= >> # CPU_ASFLAGS =3D=0A= >>=0A= >> -MACHINE_CFLAGS +=3D -march=3Darmv8-a+crc -mcpu=3Dthunderx=0A= >> +MACHINE_CFLAGS +=3D -march=3Darmv8-a+crc+crypto -mcpu=3Dthunderx=0A= >> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk=0A= >> index e634abc..6bbd742 100644=0A= >> --- a/mk/rte.cpuflags.mk=0A= >> +++ b/mk/rte.cpuflags.mk=0A= >> @@ -119,6 +119,9 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_FEATURE_CRC32= ),)=0A= >> CPUFLAGS +=3D CRC32=0A= >> endif=0A= >>=0A= >> +ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_FEATURE_CRYPTO),)=0A= >> +CPUFLAGS +=3D PMULL=0A= >> +endif=0A= >>=0A= >> MACHINE_CFLAGS +=3D $(addprefix -DRTE_MACHINE_CPUFLAG_,$(CPUFLAGS))=0A= >>=0A= >> diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc= /rte.toolchain-compat.mk=0A= >> index 280dde2..01ac7e2 100644=0A= >> --- a/mk/toolchain/gcc/rte.toolchain-compat.mk=0A= >> +++ b/mk/toolchain/gcc/rte.toolchain-compat.mk=0A= >> @@ -60,6 +60,7 @@ else=0A= >> #=0A= >> ifeq ($(shell test $(GCC_VERSION) -le 49 && echo 1), 1)=0A= >> MACHINE_CFLAGS :=3D $(patsubst -march=3Darmv8-a+crc,-march=3Darmv8-a+= crc -D__ARM_FEATURE_CRC32=3D1,$(MACHINE_CFLAGS))=0A= >=0A= > The line above is to be dropped, isn't it?=0A= >=0A= No. It is not to be dropped. For targets like xgene1, crypto is not =0A= defined. Above line is required for the substitution to happen in such =0A= targets.=0A= >> + MACHINE_CFLAGS :=3D $(patsubst -march=3Darmv8-a+crc+crypto,-march=3Da= rmv8-a+crc+crypto -D__ARM_FEATURE_CRC32=3D1,$(MACHINE_CFLAGS))=0A= >=0A= > Please, split the "feature-detection" changes into a separate commit and= =0A= > explain it. In the code, you test for GCC 7. Here you are ok with GCC=0A= > 4.9. It's likely to be correct but it is not clear.=0A= Sure. Will split the feature detection changes to separate commit.=0A= >=0A= > Also, please explain why is the "crypto" feature required.=0A= crypto feature is required for using the vmull_p64 intrinsic. More =0A= specifically the PMULL instruction.=0A= Will add this as part of the commit message.=0A= >=0A= > Regards=0A= > Jan=0A= >=0A= >> endif=0A= >> ifeq ($(shell test $(GCC_VERSION) -le 47 && echo 1), 1)=0A= >> MACHINE_CFLAGS :=3D $(patsubst -march=3Dcore-avx-i,-march=3Dcorei7-av= x,$(MACHINE_CFLAGS))=0A= >=0A= Thanks and Regards,=0A= Ashwin=0A= =0A=