From: "Singh, Jasvinder"
To: "O'loingsigh, Mairtin", "Richardson, Bruce", "De Lara Guarch, Pablo", "Ananyev, Konstantin"
CC: "dev@dpdk.org", "Ryan, Brendan", "Coyle, David"
Date: Fri, 9 Oct 2020 16:24:52 +0000
References: <20201006162319.7981-1-mairtin.oloingsigh@intel.com> <20201009135045.8505-1-mairtin.oloingsigh@intel.com> <20201009135045.8505-3-mairtin.oloingsigh@intel.com>
In-Reply-To: <20201009135045.8505-3-mairtin.oloingsigh@intel.com>
Subject: Re: [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: O'loingsigh, Mairtin
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder; Richardson, Bruce; De Lara Guarch, Pablo;
> Ananyev, Konstantin
> Cc: dev@dpdk.org; Ryan, Brendan; O'loingsigh, Mairtin; Coyle, David
> Subject: [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
>
> This patch enables the optimized calculation of CRC32-Ethernet and
> CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
> implementation is built if the compiler supports the required instruction
> sets. It is selected at run-time if the host CPU, again, supports the
> required instruction sets.
>
> Signed-off-by: Mairtin o Loingsigh
> Signed-off-by: David Coyle
> Acked-by: Konstantin Ananyev
> ---
>  app/test/test_crc.c                    |  11 +-
>  config/x86/meson.build                 |   6 +-
>  doc/guides/rel_notes/release_20_11.rst |   2 +
>  lib/librte_net/meson.build             |  55 +++++
>  lib/librte_net/net_crc.h               |  11 +
>  lib/librte_net/net_crc_avx512.c        | 423 +++++++++++++++++++++++++++++++++
>  lib/librte_net/rte_net_crc.c           |  46 ++++
>  lib/librte_net/rte_net_crc.h           |   4 +-
>  8 files changed, 554 insertions(+), 4 deletions(-)
>  create mode 100644 lib/librte_net/net_crc_avx512.c
>
> diff --git a/app/test/test_crc.c b/app/test/test_crc.c
> index f8a74e04e..bf1d34435 100644
> --- a/app/test/test_crc.c
> +++ b/app/test/test_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
>
>  #include "test.h"
> @@ -149,6 +149,15 @@ test_crc(void)
>  		return ret;
>  	}
>
> +	/* set CRC avx512 mode */
> +	rte_net_crc_set_alg(RTE_NET_CRC_AVX512);
> +
> +	ret = test_crc_calc();
> +	if (ret < 0) {
> +		printf("test crc (x86_64 AVX512): failed (%d)\n", ret);
> +		return ret;
> +	}
> +
>  	/* set CRC neon mode */
>  	rte_net_crc_set_alg(RTE_NET_CRC_NEON);
>
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index fea4d5403..172b72b72 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017-2019 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
>
>  # get binutils version for the workaround of Bug 97
>  if not is_windows
> @@ -23,7 +23,9 @@ endforeach
>
>  optional_flags = ['AES', 'PCLMUL',
>  		'AVX', 'AVX2', 'AVX512F',
> -		'RDRND', 'RDSEED']
> +		'RDRND', 'RDSEED',
> +		'AVX512BW', 'AVX512DQ',
> +		'AVX512VL', 'VPCLMULQDQ']
>  foreach f:optional_flags
>  	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
>  		if f == 'PCLMUL' # special case flags with different defines
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index b77297f7e..5eda680d5 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -58,6 +58,8 @@ New Features
>  * **Updated CRC modules of rte_net library.**
>
>    * Added run-time selection of the optimal architecture-specific CRC path.
> +  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
> +    using the AVX512 and VPCLMULQDQ instruction sets.
>
>  * **Updated Broadcom bnxt driver.**
>
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index fa439b9e5..6c96b361a 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -24,18 +24,62 @@ deps += ['mbuf']
>  if dpdk_conf.has('RTE_ARCH_X86_64')
>  	net_crc_sse42_cpu_support = (
>  		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_avx512_cpu_support = (
> +		cc.get_define('__AVX512F__', args: machine_args) != '' and
> +		cc.get_define('__AVX512BW__', args: machine_args) != '' and
> +		cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> +		cc.get_define('__AVX512VL__', args: machine_args) != '' and
> +		cc.get_define('__VPCLMULQDQ__', args: machine_args) != '')
> +
>  	net_crc_sse42_cc_support = (
>  		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +	net_crc_avx512_cc_support = (
> +		not machine_args.contains('-mno-avx512f') and
> +		cc.has_argument('-mavx512f') and
> +		cc.has_argument('-mavx512bw') and
> +		cc.has_argument('-mavx512dq') and
> +		cc.has_argument('-mavx512vl') and
> +		cc.has_argument('-mvpclmulqdq') and
> +		cc.has_argument('-mavx2') and
> +		cc.has_argument('-mavx'))
>
>  	build_static_net_crc_sse42_lib = 0
> +	build_static_net_crc_avx512_lib = 0
>
>  	if net_crc_sse42_cpu_support == true
>  		sources += files('net_crc_sse.c')
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cpu_support == true
> +			sources += files('net_crc_avx512.c')
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		elif net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	elif net_crc_sse42_cc_support == true
>  		build_static_net_crc_sse42_lib = 1
>  		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mpclmul',
> +							'-maes',
> +							'-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	endif
>
>  	if build_static_net_crc_sse42_lib == 1
> @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
>  				net_crc_sse42_lib_cflags])
>  		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
>  	endif
> +
> +	if build_static_net_crc_avx512_lib == 1
> +		net_crc_avx512_lib = static_library(
> +			'net_crc_avx512_lib',
> +			'net_crc_avx512.c',
> +			dependencies: static_rte_eal,
> +			c_args: [cflags,
> +				net_crc_avx512_lib_cflags])
> +		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
> +	endif
> +
>  elif (dpdk_conf.has('RTE_ARCH_ARM64') and
>  		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
>  	sources += files('net_crc_neon.c')
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> index a1578a56c..7a74d5406 100644
> --- a/lib/librte_net/net_crc.h
> +++ b/lib/librte_net/net_crc.h
> @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
>  uint32_t
>  rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
>
> +/* AVX512 */
> +
> +void
> +rte_net_crc_avx512_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
>  /* NEON */
>
>  void
> diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
> new file mode 100644
> index 000000000..3740fe3c9
> --- /dev/null
> +++ b/lib/librte_net/net_crc_avx512.c
> @@ -0,0 +1,423 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include
> +
> +#include
> +#include
> +#include
> +
> +#include "net_crc.h"
> +
> +#include
> +
> +/* VPCLMULQDQ CRC computation context structure */
> +struct crc_vpclmulqdq_ctx {
> +	__m512i rk1_rk2;
> +	__m512i rk3_rk4;
> +	__m512i fold_7x128b;
> +	__m512i fold_3x128b;
> +	__m128i rk5_rk6;
> +	__m128i rk7_rk8;
> +	__m128i fold_1x128b;
> +};
> +
> +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
> +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
> +
> +static uint16_t byte_len_to_mask_table[] = {
> +	0x0000, 0x0001, 0x0003, 0x0007,
> +	0x000f, 0x001f, 0x003f, 0x007f,
> +	0x00ff, 0x01ff, 0x03ff, 0x07ff,
> +	0x0fff, 0x1fff, 0x3fff, 0x7fff,
> +	0xffff};
> +
> +static const uint8_t shf_table[32] __rte_aligned(16) = {
> +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> +	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> +};
> +
> +static const uint32_t mask[4] __rte_aligned(16) = {
> +	0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> +};
> +
> +static const uint32_t mask2[4] __rte_aligned(16) = {
> +	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> +};
> +
> +static __rte_always_inline __m512i
> +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
> +{
> +	__m512i tmp0, tmp1;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
> +
> +	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
> +}
> +
> +static __rte_always_inline __m128i
> +crc32_fold_128(__m512i fold0, __m512i fold1,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, res2;
> +	__m256i a;
> +	__m512i tmp0, tmp1, tmp2, tmp3;
> +	__m512i tmp4;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
> +
> +	res = _mm512_extracti64x2_epi64(fold1, 3);
> +	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
> +
> +	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
> +	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
> +
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
> +
> +	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
> +
> +	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
> +	res = _mm256_extracti64x2_epi64(a, 1);
> +	res2 = _mm_xor_si128(res, *(__m128i *)&a);
> +
> +	return res2;
> +}
> +
> +static __rte_always_inline __m128i
> +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	uint32_t offset;
> +	__m128i res2, res3, res4, pshufb_shf;
> +
> +	const uint32_t mask3[4] __rte_aligned(16) = {
> +		0x80808080, 0x80808080, 0x80808080, 0x80808080
> +	};
> +
> +	res2 = res;
> +	offset = data_len - n;
> +	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
> +
> +	pshufb_shf = _mm_loadu_si128((const __m128i *)
> +			(shf_table + (data_len-n)));
> +
> +	res = _mm_shuffle_epi8(res, pshufb_shf);
> +	pshufb_shf = _mm_xor_si128(pshufb_shf,
> +			_mm_load_si128((const __m128i *) mask3));
> +	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
> +
> +	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
> +
> +	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
> +	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
> +	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline __m128i
> +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res1;
> +
> +	res1 = res;
> +
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
> +	res1 = _mm_srli_si128(res1, 8);
> +	res = _mm_xor_si128(res, res1);
> +
> +	res1 = res;
> +	res = _mm_slli_si128(res, 4);
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
> +	res = _mm_xor_si128(res, res1);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline uint32_t
> +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp0, tmp1;
> +
> +	data64 = _mm_and_si128(data64, *(const __m128i *)mask2);
> +	tmp0 = data64;
> +	tmp1 = data64;
> +
> +	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
> +			0x28);
> +
> +	tmp1 = data64;
> +	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
> +
> +	return _mm_extract_epi32(data64, 2);
> +}
> +
> +static __rte_always_inline void
> +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp, tmp1;
> +
> +	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
> +	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
> +	*fold = _mm_xor_si128(*fold, tmp);
> +	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
> +	*fold = _mm_xor_si128(*fold, tmp1);
> +	*n += 16;
> +	*len -= 16;
> +}
> +
> +static __rte_always_inline uint32_t
> +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, d, b;
> +	__m512i temp, k;
> +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> +	__m512i fold0, fold1, fold2, fold3;
> +	__mmask16 mask;
> +	uint32_t n = 0;
> +	int reduction = 0;
> +
> +	/* Get CRC init value */
> +	b = _mm_cvtsi32_si128(crc);
> +	temp = _mm512_castsi128_si512(b);
> +
> +	if (data_len > 255) {
> +		fold0 = _mm512_loadu_si512((const __m512i *)data);
> +		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
> +		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
> +		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
> +		fold0 = _mm512_xor_si512(fold0, temp);
> +
> +		/* Main folding loop */
> +		k = params->rk1_rk2;
> +		for (n = 256; (n + 256) <= data_len; n += 256) {
> +			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
> +			qw1 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+64]));
> +			qw2 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+128]));
> +			qw3 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+192]));
> +			fold0 = crcr32_folding_round(qw0, k, fold0);
> +			fold1 = crcr32_folding_round(qw1, k, fold1);
> +			fold2 = crcr32_folding_round(qw2, k, fold2);
> +			fold3 = crcr32_folding_round(qw3, k, fold3);
> +		}
> +
> +		/* 256 to 128 fold */
> +		k = params->rk3_rk4;
> +		fold0 = crcr32_folding_round(fold2, k, fold0);
> +		fold1 = crcr32_folding_round(fold3, k, fold1);
> +
> +		res = crc32_fold_128(fold0, fold1, params);
> +
> +		reduction = 240 - ((n+256)-data_len);
> +
> +		while (reduction > 0)
> +			reduction_loop(&res, &reduction, data, &n,
> +					params);
> +
> +		reduction += 16;
> +
> +		if (n != data_len)
> +			res = last_two_xmm(data, data_len, n, res,
> +					params);
> +	} else {
> +		if (data_len > 31) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			reduction = 240 - ((n+256)-data_len);
> +
> +			while (reduction > 0)
> +				reduction_loop(&res, &reduction, data, &n,
> +						params);
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len > 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len == 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +		} else {
> +			res = _mm_cvtsi32_si128(crc);
> +			mask = byte_len_to_mask_table[data_len];
> +			d = _mm_maskz_loadu_epi8(mask, data);
> +			res = _mm_xor_si128(res, d);
> +
> +			if (data_len > 3) {
> +				d = _mm_loadu_si128((const __m128i *)
> +						&shf_table[data_len]);
> +				res = _mm_shuffle_epi8(res, d);
> +			} else if (data_len > 2) {
> +				res = _mm_slli_si128(res, 5);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 1) {
> +				res = _mm_slli_si128(res, 6);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 0) {
> +				res = _mm_slli_si128(res, 7);
> +				goto do_barrett_reduction;
> +			} else {
> +				/* zero length case */
> +				return crc;
> +			}
> +		}
> +	}
> +
> +	res = done_128(res, params);
> +
> +do_barrett_reduction:
> +	n = barrett_reduction(res, params);
> +
> +	return n;
> +}
> +
> +static void
> +crc32_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x00000000e95c1271;
> +	uint64_t c1 = 0x00000000ce3371cb;
> +	uint64_t c2 = 0x00000000910eeec1;
> +	uint64_t c3 = 0x0000000033fff533;
> +	uint64_t c4 = 0x000000000cbec0ed;
> +	uint64_t c5 = 0x0000000031f8303f;
> +	uint64_t c6 = 0x0000000057c54819;
> +	uint64_t c7 = 0x00000000df068dc2;
> +	uint64_t c8 = 0x00000000ae0b5394;
> +	uint64_t c9 = 0x000000001c279815;
> +	uint64_t c10 = 0x000000001d9513d7;
> +	uint64_t c11 = 0x000000008f352d95;
> +	uint64_t c12 = 0x00000000af449247;
> +	uint64_t c13 = 0x000000003db1ecdc;
> +	uint64_t c14 = 0x0000000081256527;
> +	uint64_t c15 = 0x00000000f1da05aa;
> +	uint64_t c16 = 0x00000000ccaa009e;
> +	uint64_t c17 = 0x00000000ae689191;
> +	uint64_t c18 = 0x00000000ccaa009e;
> +	uint64_t c19 = 0x00000000b8bc6765;
> +	uint64_t c20 = 0x00000001f7011640;
> +	uint64_t c21 = 0x00000001db710640;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +static void
> +crc16_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x0000000000009a19;
> +	uint64_t c1 = 0x0000000000002df8;
> +	uint64_t c2 = 0x00000000000068af;
> +	uint64_t c3 = 0x000000000000b6c9;
> +	uint64_t c4 = 0x000000000000c64f;
> +	uint64_t c5 = 0x000000000000cd95;
> +	uint64_t c6 = 0x000000000000d341;
> +	uint64_t c7 = 0x000000000000b8f2;
> +	uint64_t c8 = 0x0000000000000842;
> +	uint64_t c9 = 0x000000000000b072;
> +	uint64_t c10 = 0x00000000000047e3;
> +	uint64_t c11 = 0x000000000000922d;
> +	uint64_t c12 = 0x0000000000000e3a;
> +	uint64_t c13 = 0x0000000000004d7a;
> +	uint64_t c14 = 0x0000000000005b44;
> +	uint64_t c15 = 0x0000000000007762;
> +	uint64_t c16 = 0x00000000000081bf;
> +	uint64_t c17 = 0x0000000000008e10;
> +	uint64_t c18 = 0x00000000000081bf;
> +	uint64_t c19 = 0x0000000000001cbb;
> +	uint64_t c20 = 0x000000011c581910;
> +	uint64_t c21 = 0x0000000000010810;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +void
> +rte_net_crc_avx512_init(void)
> +{
> +	crc32_load_init_constants();
> +	crc16_load_init_constants();
> +
> +	/*
> +	 * Reset the register as following calculation may
> +	 * use other data types such as float, double, etc.
> +	 */
> +	_mm_empty();
> +}
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 16-bit CRC value */
> +	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffff,
> +		&crc16_ccitt);
> +}
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 32-bit CRC value */
> +	return ~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffffffffUL,
> +		&crc32_eth);
> +}
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index d271d5205..32a366590 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -37,6 +37,12 @@ static const rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +static const rte_net_crc_handler handlers_avx512[] = {
> +	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
> +	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
> +};
> +#endif
>  #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>  static const rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
> @@ -134,6 +140,39 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
>
> +/* AVX512/VPCLMULQDQ handling */
> +
> +#define AVX512_VPCLMULQDQ_CPU_SUPPORTED ( \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ) \
> +)
> +
> +static const rte_net_crc_handler *
> +avx512_vpclmulqdq_get_handlers(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED)
> +		return handlers_avx512;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +avx512_vpclmulqdq_init(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) {
> +		rte_net_crc_avx512_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
>  /* SSE4.2/PCLMULQDQ handling */
>
>  #define SSE42_PCLMULQDQ_CPU_SUPPORTED \
> @@ -196,6 +235,11 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  	handlers = NULL;
>
>  	switch (alg) {
> +	case RTE_NET_CRC_AVX512:
> +		handlers = avx512_vpclmulqdq_get_handlers();
> +		if (handlers != NULL)
> +			break;
> +		/* fall-through */
>  	case RTE_NET_CRC_SSE42:
>  		handlers = sse42_pclmulqdq_get_handlers();
>  		break; /* for x86, always break here */
> @@ -235,6 +279,8 @@ RTE_INIT(rte_net_crc_init)
>
>  	if (sse42_pclmulqdq_init())
>  		alg = RTE_NET_CRC_SSE42;
> +	if (avx512_vpclmulqdq_init())
> +		alg = RTE_NET_CRC_AVX512;
>  	if (neon_pmull_init())
>  		alg = RTE_NET_CRC_NEON;
>
> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> index 16e85ca97..72d3e10ff 100644
> --- a/lib/librte_net/rte_net_crc.h
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
>
>  #ifndef _RTE_NET_CRC_H_
> @@ -23,6 +23,7 @@ enum rte_net_crc_alg {
>  	RTE_NET_CRC_SCALAR = 0,
>  	RTE_NET_CRC_SSE42,
>  	RTE_NET_CRC_NEON,
> +	RTE_NET_CRC_AVX512,
>  };
>
>  /**
> @@ -35,6 +36,7 @@ enum rte_net_crc_alg {
>   *   - RTE_NET_CRC_SCALAR
>   *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
>   *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
> + *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
>   */
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> --
> 2.12.3

Reviewed-by: Jasvinder Singh
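
For anyone cross-checking the new handlers outside the unit test, a bit-at-a-time reference can be sketched as below. It is not part of the patch, and the names crc_reflected_bitwise, ref_crc32_eth, ref_crc16_ccitt and ref_crc_selftest are hypothetical. It assumes the reflected polynomials 0xEDB88320 (CRC32-Ethernet) and 0x8408 (CRC16-CCITT); doubling each gives the c21 constants loaded into rk7_rk8 in the patch (0x1db710640 and 0x10810). The initial values and the final inversion follow the two AVX512 handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time reflected CRC. 'poly' is the bit-reversed polynomial,
 * 'crc' the initial value. Slow; intended only as a reference.
 */
static uint32_t
crc_reflected_bitwise(const uint8_t *data, size_t len, uint32_t poly,
	uint32_t crc)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			/* all-ones when the low bit is set, else zero */
			uint32_t lsb_mask = 0u - (crc & 1u);

			crc = (crc >> 1) ^ (poly & lsb_mask);
		}
	}
	return crc;
}

/* Same init (0xffffffff) and final inversion as the AVX512 handler. */
uint32_t
ref_crc32_eth(const uint8_t *data, size_t len)
{
	return ~crc_reflected_bitwise(data, len, 0xEDB88320u, 0xffffffffu);
}

/* Same init (0xffff), inversion and 16-bit truncation as the handler. */
uint32_t
ref_crc16_ccitt(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_reflected_bitwise(data, len, 0x8408u, 0xffffu);
}

/* Checks the references against the standard "123456789" check values. */
void
ref_crc_selftest(void)
{
	static const uint8_t msg[] = "123456789";

	assert(ref_crc32_eth(msg, 9) == 0xCBF43926u);
	assert(ref_crc16_ccitt(msg, 9) == 0x906Eu);
	/* zero-length input: inverted init value, i.e. 0 */
	assert(ref_crc32_eth(msg, 0) == 0u);
}
```

Comparing rte_crc32_eth_avx512_handler() against ref_crc32_eth() for lengths just below and above 16, 32 and 256 bytes would exercise each branch of crc32_eth_calc_vpclmulqdq().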