From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C3092A0C43; Thu, 7 Oct 2021 20:23:56 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8322B411DB; Thu, 7 Oct 2021 20:23:56 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 60259410EB for ; Thu, 7 Oct 2021 20:23:54 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10130"; a="206451474" X-IronPort-AV: E=Sophos;i="5.85,355,1624345200"; d="scan'208";a="206451474" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Oct 2021 11:23:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,355,1624345200"; d="scan'208";a="484638089" Received: from fmsmsx603.amr.corp.intel.com ([10.18.126.83]) by fmsmga007.fm.intel.com with ESMTP; 07 Oct 2021 11:23:53 -0700 Received: from fmsmsx603.amr.corp.intel.com (10.18.126.83) by fmsmsx603.amr.corp.intel.com (10.18.126.83) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12; Thu, 7 Oct 2021 11:23:52 -0700 Received: from fmsedg602.ED.cps.intel.com (10.1.192.136) by fmsmsx603.amr.corp.intel.com (10.18.126.83) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12 via Frontend Transport; Thu, 7 Oct 2021 11:23:52 -0700 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (104.47.55.101) by edgegateway.intel.com (192.55.55.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2242.12; Thu, 7 Oct 2021 11:23:52 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=komIg3n0hArt9+DWcchCkrgoFbcqh9bQVH5KtVkT2w8LS3pXBUXhbE4BWNDPz6Dq28/hE43NX2gM5477NQ/18mrL0mJVcKwr1H4Jjg3gqF3d9YjS7njcAe0FK3qwb4dKpKU6LSpItncunow2Kb/rucjL3OnD5vss6uAJdSHcGGGnvPmpwtC8jloACDp/t/rIamqjLzNTTsZGqijwVpZvyhcIl5Ny2X0kXfEKqghgSrHWJqynMuHCPWtCfA1TefEFYRsOW0HAYrbnXfDnrnYvCOgzENIm36SNxjobzbMV/ax+3WblmiwdvIuoqScjfAz5aiC693Az5NYRpEO2XfeDPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=6yqpYzr2YlDPgauvB8SP+bmkt3/Y3jEA3Xvy563MUfE=; b=P38ZjseNVcUQJTQENNMqCAupeVDecqwTm/CQeC+8glAaiy9oUIoz3V0eOz4Q577ACOjaOVMGpqNHKsRa60xBrQ7jKWfNPE5AVrA1E7iQBY8LqEx2HZUaHqBnLjBl71WJZwMyYne7kRLuWkeYff+EO9JCkeWchnIXnGWhJ2IanZoytL2cDgWY4woM4Lb6XbQZOIzDaT8sM077LEJiuaC2Bfy1L8g6txDZEKOsYV7qbSUimfSnLw9W4C99v4AyUIeIxWDsE4eK1rcbiUlSoYXsV4NC/fohplT95Vsd6zS/fw9Og1cGg87ttQ3mL/p0NqswtwGpyFykV/9oiZksqSV4Ww== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; s=selector2-intel-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=6yqpYzr2YlDPgauvB8SP+bmkt3/Y3jEA3Xvy563MUfE=; b=gbCteEOUD5PuxDHumyEFr15MRWHvmYEu4/tIirdBoVRxtfdSMiKW5/jAfDEeH+6nAERNVXRbpN5Q/ch5yWnrVB01ssRt/jVoqy5xQa4qOXLZih/GC1xNdzXaSawkRWJksdDDVKBKs40Ye3p/ZQjUbmSMwe9UkuYtZxn6ibeZs+w= Received: from DM6PR11MB4491.namprd11.prod.outlook.com (2603:10b6:5:204::19) by DM5PR11MB1625.namprd11.prod.outlook.com (2603:10b6:4:b::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4566.20; Thu, 7 Oct 2021 18:23:49 +0000 Received: from DM6PR11MB4491.namprd11.prod.outlook.com ([fe80::740e:126e:c785:c8fd]) by DM6PR11MB4491.namprd11.prod.outlook.com ([fe80::740e:126e:c785:c8fd%4]) with mapi id 15.20.4587.019; Thu, 7 Oct 2021 18:23:49 +0000 From: "Ananyev, Konstantin" To: "Medvedkin, Vladimir" , "dev@dpdk.org" CC: "Chilikin, Andrey" , "Wang, Yipeng1" , "Gobriel, Sameh" , "Richardson, Bruce" , "Mcnamara, John" Thread-Topic: [PATCH 1/5] hash: add new toeplitz hash implementation Thread-Index: AQHXozjeXpSedpKkskK1iuEvx5T/hKvIAo3g Date: Thu, 7 Oct 2021 18:23:49 +0000 Message-ID: References: <1630944239-363648-1-git-send-email-vladimir.medvedkin@intel.com> <1630944239-363648-2-git-send-email-vladimir.medvedkin@intel.com> In-Reply-To: <1630944239-363648-2-git-send-email-vladimir.medvedkin@intel.com> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-reaction: no-action dlp-version: 11.6.200.16 authentication-results: intel.com; dkim=none (message not signed) header.d=none;intel.com; dmarc=none action=none header.from=intel.com; x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 47289451-0ec3-461a-e322-08d989bf9bdd x-ms-traffictypediagnostic: DM5PR11MB1625: x-ld-processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:257; x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: CdHzO1jECTf3sirHUiBiMRhEn6rmDi/BHAhrNoql8Ju1YqN/Mr+8+3oN9oGPSGyD/tw3gJ8XxZfEY1i/ZWZUUwKD9UIPEvGsab61BUTibtyqB1EotTsMIg46hd7ubodvUA1MR1m4985JbibLbbqWkadpiE/M/rTa0e/lpjZK875E1qsYFEF4vEA4GnVJNs4Ribu56nq1H3JsXCli6EtWgGvHeVAbrFIvLYgQDSQstm/bnZH2pDEBB0xa1uf9MyxfbBL+aKeu+CjG4xlUcQpMPUMS9duhcbVNtHuM29mpexKjlXNXRzLeP81YOxFLo33fummVuALrK5it8iGLrpW41/NnIO4w9mrNEID87jxI3Zcgd4Rb8PWgXJvpWn30lus/HdB2/FCD7HzDn84E/TMxpYiZzEatGS7Zt5zr6Gwjw1+FrbjCCFV/98eBeoz4AVZPXbhGIvTOj7K8KI35xFxhbVQFA74vMvF5v5KJtc9ADQAE82+RUFGoX5JZ88igu9X0HYwTrsHe9MFgEyH9Nrvp3KdIgCOeG1Ef8qwosZ7s86nvVqkkCAOy3y1J86Zz0dO8kaXjwBI2lg5eMaQ4SrTOoxt26sQD2liNIQ2yuZMlC9uKtrm7HAU5HJAAoTG/qM5/nYiVc2et6HbLRnXLiNluST40mq7PO4b4I3Uny0inwMSmJZfzggoVaI0qk47yUg1n78BNR1/xEngP4JVrbtp9kQ== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DM6PR11MB4491.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(30864003)(26005)(38070700005)(86362001)(122000001)(316002)(7696005)(6506007)(38100700002)(186003)(33656002)(71200400001)(55236004)(2906002)(66946007)(107886003)(52536014)(5660300002)(64756008)(66476007)(66556008)(9686003)(83380400001)(55016002)(4326008)(76116006)(110136005)(54906003)(8676002)(66446008)(508600001)(8936002); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?fVd4ZL3qSEN85OaxrxrQ4dzzVddoeLiLSpF6Z4MnBK/VLmV0H/l9ou7Ovgwt?= =?us-ascii?Q?1V1CrsKDg2LXmUBgeA/7tXDvcMojel3OugbsboPo8j85OlZhLnKrDm3Abk7+?= =?us-ascii?Q?VQ1CResG0uASyWqDORuW/2uev8xyjbw5gCOeELSshCJl7qy58oZ1SGQv2Wfr?= =?us-ascii?Q?Cz8o6ULaI8ch0CaPQQG3H3CnhFTHYQb3nxJ9efevu7RMXDtYRAVwjo4Ccy4G?= =?us-ascii?Q?XpBi7uDjvlTP31Qnkbaxq5Q8uLIoyG+og1uAnEQ8cFahnr6I0RmI51lNrmZ4?= =?us-ascii?Q?U6sWBYvJEM2/qgCjM0tDU/SIZvPeRAolHZzKe+3KTKAL1wlJzNKOrLJiU3l/?= =?us-ascii?Q?19/Q3ZgHPu2GY224hquUWVBOA4C50LJ5XpKfnDTsZtTw17ifB8iM3t+XpD3F?= =?us-ascii?Q?hDnVPQxTJNciZzfwvdfnOY5qO5LedlOhQI5pDGGlPUwltdIpz9ifQ7PPT5ju?= =?us-ascii?Q?tAxcAxOJ0aaj+VHeARIJV//hs5gAqEtUnrACXEJbDDFgXjuG26hgCBLgFy/I?= =?us-ascii?Q?3v8TwbIvmDQy4TMA9t0Rp8If3k1e0Udwojqasc+leR75pVJvjkt7NAlDZcMw?= =?us-ascii?Q?GvS7F44Tw7NgKHe5wlmmdkLHhQqPrm/DOlwTuaOu5/SyHJyp+/XRmacd6BJ5?= =?us-ascii?Q?A/3lM0g/aZafHx5H5BfCkbqERNxqV02XfCyJclcVl6doIfLcdmNNU6iHEncK?= =?us-ascii?Q?cDMGr8SmnaxjIcGGizZf1mZO/T6YNO39jomDH9TLGYf2aoEfu0j7A+9R8DEG?= =?us-ascii?Q?3oczMOpaVCFfLzBlrK/UpyJ2KcY0FSdN8OA8sstm3XRElBAehR+B43uA5nqb?= =?us-ascii?Q?S19zfFQN03HBc9+yR4LIaTzPVc9rd3AZgN5K2O7G1pUCnD1z9bv11Lo4+MiL?= =?us-ascii?Q?VrS3UZQf/FwT7tDWFPDYZQ8qi0eZa1YG5rZOY7e+P9UI0QwRln+NrDgNG98k?= =?us-ascii?Q?TTOfoecBpezDyVRDOFwKxdMuKhhXOOgZmJhXfYGI45luxAws9p/MG7bO8kZJ?= =?us-ascii?Q?kVTBbTIesb6HTivdVhYJYj9wirfItfg2zGXXAB/dUicul8X8rHNKcIiJo/v6?= =?us-ascii?Q?sh1mY5GbBay73BpaEIc+gmiB43xBafKN61yFzhPeHasQ/ZFB3kW23afGR9Sl?= =?us-ascii?Q?WIDzvvPMAGtJgn6uwyJ2BRrCkDXu4+okNBUbnMBK3PbgY5W9rGEuKZmN3wfD?= =?us-ascii?Q?r6MlruEf0ibs/DOuzWE/yzGq/eGs/4SDhvxRke1TPMoHu7UrQNch3cHorGqY?= =?us-ascii?Q?hqVnhibldyXOeioEs1HtV+9xw4Pn4LgGV3zsMgyDXfDK3YRu3wOU6jvZsh2O?= =?us-ascii?Q?2B9PHBoTEV1t+cteasDnQcDZ?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: DM6PR11MB4491.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 47289451-0ec3-461a-e322-08d989bf9bdd X-MS-Exchange-CrossTenant-originalarrivaltime: 07 Oct 2021 18:23:49.5335 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: sD6PDEI22O/VwYA/Fl8TyAWMq5K2oVCMrEKUvfmL3Gux2QHVWi9IwRFaSb5eRofISd5i0S8cSHYwhaqgTp2te+jjdgR+sDUxN6QT1/FBv3I= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR11MB1625 X-OriginatorOrg: intel.com Subject: Re: [dpdk-dev] [PATCH 1/5] hash: add new toeplitz hash implementation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > This patch add a new Toeplitz hash implementation using > Galios Fields New Instructions (GFNI). >=20 > Signed-off-by: Vladimir Medvedkin > --- > doc/api/doxy-api-index.md | 1 + > lib/hash/meson.build | 1 + > lib/hash/rte_thash.c | 26 ++++++ > lib/hash/rte_thash.h | 22 +++++ > lib/hash/rte_thash_gfni.h | 229 ++++++++++++++++++++++++++++++++++++++++= ++++++ > lib/hash/version.map | 2 + > 6 files changed, 281 insertions(+) > create mode 100644 lib/hash/rte_thash_gfni.h >=20 > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md > index 1992107..7549477 100644 > --- a/doc/api/doxy-api-index.md > +++ b/doc/api/doxy-api-index.md > @@ -139,6 +139,7 @@ The public API headers are grouped by topics: > [hash] (@ref rte_hash.h), > [jhash] (@ref rte_jhash.h), > [thash] (@ref rte_thash.h), > + [thash_gfni] (@ref rte_thash_gfni.h), > [FBK hash] (@ref rte_fbk_hash.h), > [CRC hash] (@ref rte_hash_crc.h) >=20 > diff --git a/lib/hash/meson.build b/lib/hash/meson.build > index 9bc5ef9..40444ac 100644 > --- a/lib/hash/meson.build > +++ b/lib/hash/meson.build > @@ -7,6 +7,7 @@ headers =3D files( > 'rte_hash.h', > 'rte_jhash.h', > 'rte_thash.h', > + 'rte_thash_gfni.h', > ) > indirect_headers +=3D files('rte_crc_arm64.h') >=20 > diff --git a/lib/hash/rte_thash.c b/lib/hash/rte_thash.c > index d5a95a6..07447f7 100644 > --- a/lib/hash/rte_thash.c > +++ b/lib/hash/rte_thash.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include >=20 > #define THASH_NAME_LEN 64 > #define TOEPLITZ_HASH_LEN 32 > @@ -88,6 +89,23 @@ struct rte_thash_ctx { > uint8_t hash_key[0]; > }; >=20 > +uint8_t rte_thash_gfni_supported; .. =3D 0; ? > + > +void > +rte_thash_complete_matrix(uint64_t *matrixes, uint8_t *rss_key, int size= ) > +{ > + int i, j; > + uint8_t *m =3D (uint8_t *)matrixes; > + > + for (i =3D 0; i < size; i++) { > + for (j =3D 0; j < 8; j++) { > + m[i * 8 + j] =3D (rss_key[i] << j)| > + (uint8_t)((uint16_t)(rss_key[i + 1]) >> > + (8 - j)); > + } > + } > +} > + > static inline uint32_t > get_bit_lfsr(struct thash_lfsr *lfsr) > { > @@ -759,3 +777,11 @@ rte_thash_adjust_tuple(struct rte_thash_ctx *ctx, >=20 > return ret; > } > + > +RTE_INIT(rte_thash_gfni_init) > +{ > +#ifdef __GFNI__ > + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_GFNI)) > + rte_thash_gfni_supported =3D 1; > +#endif > +} > diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h > index 76109fc..e3f1fc6 100644 > --- a/lib/hash/rte_thash.h > +++ b/lib/hash/rte_thash.h > @@ -28,6 +28,7 @@ extern "C" { > #include > #include > #include > +#include >=20 > #if defined(RTE_ARCH_X86) || defined(__ARM_NEON) > #include > @@ -113,6 +114,8 @@ union rte_thash_tuple { > }; > #endif >=20 > +extern uint8_t rte_thash_gfni_supported; > + > /** > * Prepare special converted key to use with rte_softrss_be() > * @param orig > @@ -223,6 +226,25 @@ rte_softrss_be(uint32_t *input_tuple, uint32_t input= _len, > return ret; > } >=20 > +/** > + * Converts Toeplitz hash key (RSS key) into matrixes required > + * for GFNI implementation > + * > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice. > + * > + * @param matrixes > + * pointer to the memory where matrixes will be writen. > + * Note: the size of this memory must be equal to size * 8 > + * @param rss_key > + * pointer to the Toeplitz hash key > + * @param size > + * Size of the rss_key in bytes. > + */ > +__rte_experimental > +void > +rte_thash_complete_matrix(uint64_t *matrixes, uint8_t *rss_key, int size= ); > + > /** @internal Logarithm of minimum size of the RSS ReTa */ > #define RTE_THASH_RETA_SZ_MIN 2U > /** @internal Logarithm of maximum size of the RSS ReTa */ > diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h > new file mode 100644 > index 0000000..8f89d7d > --- /dev/null > +++ b/lib/hash/rte_thash_gfni.h > @@ -0,0 +1,229 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2021 Intel Corporation > + */ > + > +#ifndef _RTE_THASH_GFNI_H_ > +#define _RTE_THASH_GFNI_H_ > + > +/** > + * @file > + * > + * Optimized Toeplitz hash functions implementation > + * using Galois Fields New Instructions. > + */ > + > +#include > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#ifdef __GFNI__ > + > +#define RTE_THASH_FIRST_ITER_MSK 0x0f0f0f0f0f0e0c08 > +#define RTE_THASH_PERM_MSK 0x0f0f0f0f0f0f0f0f > +#define RTE_THASH_FIRST_ITER_MSK_2 0xf0f0f0f0f0e0c080 > +#define RTE_THASH_PERM_MSK_2 0xf0f0f0f0f0f0f0f0 > +#define RTE_THASH_REWIND_MSK 0x0000000000113377 > + > +__rte_internal > +static inline void > +__rte_thash_xor_reduce(__m512i xor_acc, uint32_t *val_1, uint32_t *val_2= ) > +{ > + __m256i tmp_256_1, tmp_256_2; > + __m128i tmp128_1, tmp128_2; > + uint64_t tmp_1, tmp_2; > + > + tmp_256_1 =3D _mm512_castsi512_si256(xor_acc); > + tmp_256_2 =3D _mm512_extracti32x8_epi32(xor_acc, 1); > + tmp_256_1 =3D _mm256_xor_si256(tmp_256_1, tmp_256_2); > + > + tmp128_1 =3D _mm256_castsi256_si128(tmp_256_1); > + tmp128_2 =3D _mm256_extracti32x4_epi32(tmp_256_1, 1); > + tmp128_1 =3D _mm_xor_si128(tmp128_1, tmp128_2); > + > + tmp_1 =3D _mm_extract_epi64(tmp128_1, 0); > + tmp_2 =3D _mm_extract_epi64(tmp128_1, 1); > + tmp_1 ^=3D tmp_2; > + > + *val_1 =3D (uint32_t)tmp_1; > + *val_2 =3D (uint32_t)(tmp_1 >> 32); > +} > + > +__rte_internal > +static inline __m512i > +__rte_thash_gfni(uint64_t *mtrx, uint8_t *tuple, uint8_t *secondary_tupl= e, > + int len) Here and in other fast-path functions: const uint64_t *mtrx > +{ > + __m512i permute_idx =3D _mm512_set_epi8(7, 6, 5, 4, 7, 6, 5, 4, > + 6, 5, 4, 3, 6, 5, 4, 3, > + 5, 4, 3, 2, 5, 4, 3, 2, > + 4, 3, 2, 1, 4, 3, 2, 1, > + 3, 2, 1, 0, 3, 2, 1, 0, > + 2, 1, 0, -1, 2, 1, 0, -1, > + 1, 0, -1, -2, 1, 0, -1, -2, > + 0, -1, -2, -3, 0, -1, -2, -3); > + > + const __m512i rewind_idx =3D _mm512_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 59, 0, 0, 0, 59, > + 0, 0, 59, 58, 0, 0, 59, 58, > + 0, 59, 58, 57, 0, 59, 58, 57); > + const __mmask64 rewind_mask =3D RTE_THASH_REWIND_MSK; > + const __m512i shift_8 =3D _mm512_set1_epi8(8); > + __m512i xor_acc =3D _mm512_setzero_si512(); > + __m512i perm_bytes =3D _mm512_setzero_si512(); > + __m512i vals, matrixes, tuple_bytes, tuple_bytes_2; > + __mmask64 load_mask, permute_mask, permute_mask_2; > + int chunk_len =3D 0, i =3D 0; > + uint8_t mtrx_msk; > + const int prepend =3D 3; > + > + for (; len > 0; len -=3D 64, tuple +=3D 64) { What will happen if len < 64? > + if (i =3D=3D 8) > + perm_bytes =3D _mm512_maskz_permutexvar_epi8(rewind_mask, > + rewind_idx, perm_bytes); > + > + permute_mask =3D RTE_THASH_FIRST_ITER_MSK; > + load_mask =3D (len >=3D 64) ? UINT64_MAX : ((1ULL << len) - 1); > + tuple_bytes =3D _mm512_maskz_loadu_epi8(load_mask, tuple); > + if (secondary_tuple) { > + permute_mask_2 =3D RTE_THASH_FIRST_ITER_MSK_2; > + tuple_bytes_2 =3D _mm512_maskz_loadu_epi8(load_mask, > + secondary_tuple); > + } > + > + chunk_len =3D __builtin_popcountll(load_mask); > + for (i =3D 0; i < ((chunk_len + prepend) / 8); i++, mtrx +=3D 8) { > + perm_bytes =3D _mm512_mask_permutexvar_epi8(perm_bytes, > + permute_mask, permute_idx, tuple_bytes); > + > + if (secondary_tuple) > + perm_bytes =3D > + _mm512_mask_permutexvar_epi8(perm_bytes, > + permute_mask_2, permute_idx, > + tuple_bytes_2); > + > + matrixes =3D _mm512_maskz_loadu_epi64(UINT8_MAX, mtrx); > + vals =3D _mm512_gf2p8affine_epi64_epi8(perm_bytes, > + matrixes, 0); > + > + xor_acc =3D _mm512_xor_si512(xor_acc, vals); > + permute_idx =3D _mm512_add_epi8(permute_idx, shift_8); > + permute_mask =3D RTE_THASH_PERM_MSK; > + if (secondary_tuple) > + permute_mask_2 =3D RTE_THASH_PERM_MSK_2; > + } > + } > + > + int rest_len =3D (chunk_len + prepend) % 8; > + if (rest_len !=3D 0) { > + mtrx_msk =3D (1 << (rest_len % 8)) - 1; > + matrixes =3D _mm512_maskz_loadu_epi64(mtrx_msk, mtrx); > + if (i =3D=3D 8) { > + perm_bytes =3D _mm512_maskz_permutexvar_epi8(rewind_mask, > + rewind_idx, perm_bytes); > + } else { > + perm_bytes =3D _mm512_mask_permutexvar_epi8(perm_bytes, > + permute_mask, permute_idx, tuple_bytes); > + > + if (secondary_tuple) > + perm_bytes =3D > + _mm512_mask_permutexvar_epi8( > + perm_bytes, permute_mask_2, > + permute_idx, tuple_bytes_2); > + } > + > + vals =3D _mm512_gf2p8affine_epi64_epi8(perm_bytes, matrixes, 0); > + xor_acc =3D _mm512_xor_si512(xor_acc, vals); > + } > + > + return xor_acc; > +} > + > +/** > + * Calculate Toeplitz hash. > + * > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice. > + * > + * @param m > + * Pointer to the matrices generated from the corresponding > + * RSS hash key using rte_thash_complete_matrix(). > + * @param tuple > + * Pointer to the data to be hashed. Data must be in network byte order= . > + * @param len > + * Length of the data to be hashed. > + * @return > + * Calculated Toeplitz hash value. > + */ > +__rte_experimental > +static inline uint32_t > +rte_thash_gfni(uint64_t *m, uint8_t *tuple, int len) > +{ > + uint32_t val, val_zero; > + > + __m512i xor_acc =3D __rte_thash_gfni(m, tuple, NULL, len); > + __rte_thash_xor_reduce(xor_acc, &val, &val_zero); > + > + return val; > +} > + > +/** > + * Calculate Toeplitz hash for two independent data buffers. > + * > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice. > + * > + * @param m > + * Pointer to the matrices generated from the corresponding > + * RSS hash key using rte_thash_complete_matrix(). > + * @param tuple_1 > + * Pointer to the data to be hashed. Data must be in network byte order= . > + * @param tuple_2 > + * Pointer to the data to be hashed. Data must be in network byte order= . > + * @param len > + * Length of the largest data buffer to be hashed. > + * @param val_1 > + * Pointer to uint32_t where to put calculated Toeplitz hash value for > + * the first tuple. > + * @param val_2 > + * Pointer to uint32_t where to put calculated Toeplitz hash value for > + * the second tuple. > + */ > +__rte_experimental > +static inline void > +rte_thash_gfni_x2(uint64_t *mtrx, uint8_t *tuple_1, uint8_t *tuple_2, in= t len, > + uint32_t *val_1, uint32_t *val_2) Why just two? Why not uint8_t *tuple[] ? > +{ > + __m512i xor_acc =3D __rte_thash_gfni(mtrx, tuple_1, tuple_2, len); > + __rte_thash_xor_reduce(xor_acc, val_1, val_2); > +} > + > +#else /* __GFNI__ */ > + > +static inline uint32_t > +rte_thash_gfni(uint64_t *mtrx __rte_unused, uint8_t *key __rte_unused, > + int len __rte_unused) > +{ > + return 0; > +} > + > +static inline void > +rte_thash_gfni_x2(uint64_t *mtrx __rte_unused, uint8_t *tuple_1 __rte_un= used, > + uint8_t *tuple_2 __rte_unused, int len __rte_unused, > + uint32_t *val_1 __rte_unused, uint32_t *val_2 __rte_unused) > +{ > + That seems inconsistent with dummy rte_thash_gfni() above. Should be: *val_1 =3D 0; *val_2 =3D 0;=20 I think. > +} > + > +#endif > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_THASH_GFNI_H_ */ > diff --git a/lib/hash/version.map b/lib/hash/version.map > index ce4309a..cecf922 100644 > --- a/lib/hash/version.map > +++ b/lib/hash/version.map > @@ -39,10 +39,12 @@ EXPERIMENTAL { > rte_hash_rcu_qsbr_add; > rte_thash_add_helper; > rte_thash_adjust_tuple; > + rte_thash_complete_matrix; > rte_thash_find_existing; > rte_thash_free_ctx; > rte_thash_get_complement; > rte_thash_get_helper; > rte_thash_get_key; > + rte_thash_gfni_supported; > rte_thash_init_ctx; > }; > -- > 2.7.4