From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8D403A04FD; Mon, 1 Aug 2022 00:11:05 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 2FD544067B; Mon, 1 Aug 2022 00:11:05 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by mails.dpdk.org (Postfix) with ESMTP id F346B4021E for ; Mon, 1 Aug 2022 00:11:02 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1659305463; x=1690841463; h=from:to:cc:subject:date:message-id:references: in-reply-to:content-transfer-encoding:mime-version; bh=2AMHphZNMWHvJxliRe6iq2rXiQlhNY6T6aikRf81ENc=; b=S31aSs6s7ml/Ft9A22mOuMNdeOAU8FUvIg4o2TT543fp2K4TvmDl6L9p WPJv6WbTKERRLrIHvISwAg8G64hzRYw4cRSaJiq6jsud+b492ciNRaM94 svYXXPEFNZZqKi8/bvEP7+1YlOUhHQD30+OR7Fi63DF3c8s97uYTyMZMt a7EmOJQ54ve3TmOXZd0Ybx4J0kREKVz1KjPpyx1+GqvccMgIWCGRnjo20 gM+OvV0Gyua5pTQnjY/PNkZPWLtU6F0mMYrLbUTSeL8mqXjxS9HqXWMJi 2mG09v4tzFk6cS2v9Tpt887o1NGJKMacMMWyhvqBRqHY8hfOv+oF9/peO A==; X-IronPort-AV: E=McAfee;i="6400,9594,10425"; a="375334000" X-IronPort-AV: E=Sophos;i="5.93,206,1654585200"; d="scan'208";a="375334000" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jul 2022 15:11:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.93,206,1654585200"; d="scan'208";a="630013595" Received: from orsmsx602.amr.corp.intel.com ([10.22.229.15]) by orsmga008.jf.intel.com with ESMTP; 31 Jul 2022 15:11:01 -0700 Received: from orsmsx608.amr.corp.intel.com (10.22.229.21) by ORSMSX602.amr.corp.intel.com (10.22.229.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Sun, 31 Jul 2022 15:11:00 -0700 Received: from orsmsx606.amr.corp.intel.com (10.22.229.19) by ORSMSX608.amr.corp.intel.com (10.22.229.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Sun, 31 Jul 2022 15:10:59 -0700 Received: from ORSEDG602.ED.cps.intel.com (10.7.248.7) by orsmsx606.amr.corp.intel.com (10.22.229.19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28 via Frontend Transport; Sun, 31 Jul 2022 15:10:59 -0700 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (104.47.66.43) by edgegateway.intel.com (134.134.137.103) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2375.28; Sun, 31 Jul 2022 15:10:59 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OUU6ExtZggy8gvpdkcKGeVnHPD51etzKrc6p2TREG+2EERxVFESEATaxlwkbzfeXIDtjDTDe64bXOHvOxcq0d4fYN1bQ3k3S6DkFOrOIhJpFjOR+FoOdlIEFJ9+AMp1NKtG6XJZVJLhU6sDXZ/9ytbxrgpaoZYh7YD/56exfF2kDunlghkdmlgBJrBGragRXz+8KlMElt4kq+vgHfDLyilWkU+kt0DC+PDOBl64Sa92yHSPMXtvgUWopDfetgohOtLxIEn0eENhqUpa/iRQZ9jvwHh5r3BFVl03aoo4Hpy9J1zOmY/4/yJ17opjtZuzJzb1eBvCxtHCiW08zxgujBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=sZVBIY27vwVG74z0iiGSebo2ITgOiWjscoW9MufDWPc=; b=SzB4qDSy56JPc2F90nHGE+QPX5sJsbth9ohnCUciTgWNMRfjuW5H8nAPsnssv8DHySWZD78Pbf6JzV7L3ZJJ5Af4c2JSNHvKE4qfb50+XbzmjpHfSaAIf2BQMrrYfDcdtsZzLflLRDSLa1eWudnKuhqsNoGGEsmLJrnOshknsVM0odibfdfZBGCI4Iwsntxu+ZNc6x5G/0hDG2YqqzVq1EVM0mbK76Ny6Ru58gXbla4MIN5en7wgyniJ0gw3ZEMdyrNiewXXu4ILtIbd9XxHzuQyPDWtJTfZeuA1RaV2v7bqToMUA+LbkdJFYaexE799mcslMguni9cyvdQ2Tfv8Mg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Received: from CO1PR11MB5172.namprd11.prod.outlook.com (2603:10b6:303:6c::10) by BN8PR11MB3812.namprd11.prod.outlook.com (2603:10b6:408:90::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5482.14; Sun, 31 Jul 2022 22:10:56 +0000 Received: from CO1PR11MB5172.namprd11.prod.outlook.com ([fe80::157:f41c:f124:af3a]) by CO1PR11MB5172.namprd11.prod.outlook.com ([fe80::157:f41c:f124:af3a%4]) with mapi id 15.20.5482.016; Sun, 31 Jul 2022 22:10:55 +0000 From: "Wang, Yipeng1" To: "Rong, Leyi" , "Wang, Yipeng1" , "zaoxingliu@gmail.com" , "Gobriel, Sameh" CC: "dev@dpdk.org" Subject: RE: [RFC,1/2] member: implement NitroSketch mode Thread-Topic: [RFC,1/2] member: implement NitroSketch mode Thread-Index: AQHYdZDEa7f9W0mRGEqYWKidHAJLRa2ZVIQA Date: Sun, 31 Jul 2022 22:10:55 +0000 Message-ID: References: <20220601082228.10158-1-leyi.rong@intel.com> <20220601082228.10158-2-leyi.rong@intel.com> In-Reply-To: <20220601082228.10158-2-leyi.rong@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-reaction: no-action dlp-version: 11.6.500.17 dlp-product: dlpe-windows authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 9d8a0366-1953-4d2d-659d-08da73418a6f x-ms-traffictypediagnostic: BN8PR11MB3812:EE_ x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: OwTQ3SOz5xHq0fevpOvEegvIK5ycUTTY8nAD1NFb6pp4/ufCu3WVkpW8f4pr+a3iFbrA3L++cVWxcXCPzzL+gFbAfyHGw67M54mFuihfVQyAqPJDdc/kp0ogojSzEVHeDOYJrXwddcL1G1cMS6QsvC6DVeRq/zcBcfqANggrYE9jBF0ultINDxFiUyujRfZUEi9S/hLCqBonhsTwIMKjsHQxQfNWzET6/M1q2awyaTLCDcJXD+WbTh0Wc2arkG3/k3PCFdutFaVOK81Ca9xPzdbCLh2hOxgDEg6P3ncKfVBr+QZ+sfEhZJ6aM2su9IJmhwl20rkw5Z9lR+fegTCmC6EZyEwi0Lg0f6ACh8XloK8BXiZbP98+pwV49MdFonXWr7iP/r0VHaR74WOHCdCYoLpKir0Gh1/GAqUQd/ejc6jgyO4Iq3Z7Sg2PDC5P0dt6G25LG0Sbi53VDNQku/P440hDRfcAHqeWEGrP03BuucQSf7Ulp6Xp3S55LUg1Tej0mXupEGwC9w5RywbCyTnk1ly9GUkg5Awai1YzuMx2+psyllNZcvRREdE2Hv/QhbytaZytS1KUICKo4DpS5PHnEmbzEms0qzU3523dvUqndUSDiHGnQCEYM+RNSE/lKZd7rlkHnBUtzXZP6LpXU2HTGfbQHgWo6+oxH3/fOk7dZnYJXQB3/ML1rNpIYdhhuqZg0/ZRz9i7NQCT5G5JwKlaTU8q8H8gZpCit9laHyeo4CdpGhpgmR5kej2/4FTm7xLWn2+F8dGedBoEJcMGUDFPUv5Dn0m8hFasKUazE01cN52gOc8VgrUaKOMzRnE2sLAO x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:CO1PR11MB5172.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230016)(376002)(396003)(346002)(136003)(39860400002)(366004)(316002)(110136005)(6636002)(53546011)(41300700001)(71200400001)(26005)(6506007)(9686003)(7696005)(478600001)(2906002)(86362001)(33656002)(55016003)(8676002)(66446008)(66476007)(66556008)(64756008)(66946007)(76116006)(4326008)(30864003)(8936002)(5660300002)(52536014)(38070700005)(186003)(38100700002)(83380400001)(82960400001)(122000001)(559001)(579004); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?ra3nyEb3HOqJ4LdRFV8gLW7NTRzZwk+atBpDsR/a7cuhkaO9pb4D+jX9r8jb?= =?us-ascii?Q?MdgLKFa42OuUE5+2xh9M0MFdMtYNwH/RlSI32UYy50rwnyXTGEJ8MUm4sNZe?= =?us-ascii?Q?ArYBEENIrLWj5pMsLLJiw1OALNt0ScCPKJo+eRg50wg0dcZGx9SJFmrgQVGj?= =?us-ascii?Q?a72Y9aKJipM82JjmZurGyHwRBMxNESdiJAApa8iyc3EB/uhAnIVFA6H8I9jy?= =?us-ascii?Q?92YsvbhgQc87ZU5SFKXNBPmT3pC+a8SKvgxG+THzNOSELKB4UD2I/02Cj8/D?= =?us-ascii?Q?6CdbXfFpiXVggXTDpPCazlrBq1Cqipy3J1uKYAefXnRyRsvcbnt/Nq7KvZIU?= =?us-ascii?Q?4DHqTaxnsm26Nhw8iU9XXnMYFzTJFGd/D579MkWGDApR0gJD2ySjDCaGmxl5?= =?us-ascii?Q?TNT1A4JYm6s5ZECT7QDphSn1K9KFfSD/i4a1W35m52RvWHPISSsEAh27Va0Y?= =?us-ascii?Q?RVaLJsCTv9nbriinc1tCE1oJgMlFe1tNt5ASfaikiPWp1mTFbHzvmZCiiHk+?= =?us-ascii?Q?knsI3unmRCRztFLMvacGGn8mXb699oj2zrAJPGUzaJtcgCqt8jCXc2E/Ba+i?= =?us-ascii?Q?q5I/oLjWEhmv/TtSGS+H18QuMRGYe+OEnNIu/EF2e/RHwPj0pRoYCnj0rYFm?= =?us-ascii?Q?DsF/YinFbh/9CqdF+txTeDP3kqlxQigjEVagE4ocUtS9lvQGZf0klbZ2JJ1b?= =?us-ascii?Q?AM0xPwG6Up3I6/HwZdxRnHm3x5Y45yEU5YSubsipqPbHJyLkBMdhSX4r6Gh5?= =?us-ascii?Q?A/WjQo/dIHZEoKm6QJydax/U5tykvXfV0jpOo1IJTHw2KvgMf8QHmfeRQcFg?= =?us-ascii?Q?IyuSQlnLXHswtUdtnFXBGO/++42V3mnZHVRnbWEopTnMGEqO6YJFn4BffeLf?= =?us-ascii?Q?r4aRo6V6HOjCwCRnJRfD4Q8TTvzl6ru5/DEXTWsWd1vRnZGHUOMSqaaYSG/s?= =?us-ascii?Q?3nfqUYjFHOHz3360FZbERk+ib3HW6lnyYx8yCZ+EgcSbq0yNgK9Vfl7P+/HV?= =?us-ascii?Q?gZqXib0pdqQ3nBoE9LOu5AgoGVbZ9+kMakTsQAWUGtPG8BO62MLwRqZXtQKP?= =?us-ascii?Q?yqn2fLQBx/ZLDZEyzPYuwCu1+Muy94e+dvfybftz8jvNNfdxg1lftI47mWxs?= =?us-ascii?Q?FBisruP/1HxC/uMZNMIMm5eWTKHP90xWLBVJe/OcoXovQeqBzC2YTNjLtHff?= =?us-ascii?Q?rISl8orcrAuY3VddT6sCPVd9NC1b+zatHWB61BsT/t9lZVxPpn37tu2c9GHC?= =?us-ascii?Q?vdtniz377qfEzowD2sHVX4+JuqmXvvgVOYu75FtsA3gM0VYkpLSYdLwqOPNx?= =?us-ascii?Q?mwwVwSjeu6J+TaB+Vuvn4FECAYo1fwhwzkvClTN4wbX+IdlmxvMAvLairrhn?= =?us-ascii?Q?Hk6E/CCfSwj+EVlxFesXta8dJ5z/TapKWhs/vOkQwt5U3KbgE22XNYSqbgmG?= =?us-ascii?Q?o09MxIOdGf2oSd/KVidxAhvr1RmR/MT7P+gyjVo0r/ThJIQzvH2VXhZ4I5AE?= =?us-ascii?Q?kOdEagxfhAcrtFwfEOzNlSyvhTxGOv9kNbC0ezaHloA66+3XrxcrfS19YOQl?= =?us-ascii?Q?pJxileiuxq+KWN5143oDHECOv2u5NXEkd6cH3i8p?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: CO1PR11MB5172.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 9d8a0366-1953-4d2d-659d-08da73418a6f X-MS-Exchange-CrossTenant-originalarrivaltime: 31 Jul 2022 22:10:55.8815 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: k4A68JPtfwwN8rnWxUt9yRcoyA6pYb3LklsL37k4DUuQWgA+oh0h/2lDQC2FMGR9ZwKzixQcfaxTEyllG52Xww== X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN8PR11MB3812 X-OriginatorOrg: intel.com X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org > -----Original Message----- > From: Rong, Leyi > Sent: Wednesday, June 1, 2022 1:22 AM > To: Wang, Yipeng1 ; zaoxingliu@gmail.com; Gobriel= , > Sameh > Cc: dev@dpdk.org; Rong, Leyi > Subject: [RFC,1/2] member: implement NitroSketch mode >=20 > Sketching algorithm provide high-fidelity approximate measurements and > appears as a promising alternative to traditional approches such as > packet sampling. >=20 > NitroSketch [1] is a software sketching framework that optimizes > performance, provides accuracy guarantees, and supports a variety of > sketches. >=20 > This commit adds a new data structure called sketch into > membership library. This new data structure is an efficient > way to profile the traffic for heavy hitters. Also use min-heap > structure to maintain the top-k flow keys. >=20 > [1] Zaoxing Liu, Ran Ben-Basat, Gil Einziger, Yaron Kassner, Vladimir > Braverman, Roy Friedman, Vyas Sekar, "NitroSketch: Robust and General > Sketch-based Monitoring in Software Switches", in ACM SIGCOMM 2019. >=20 > Signed-off-by: Alan Liu > Signed-off-by: Yipeng Wang > Signed-off-by: Leyi Rong > --- > lib/member/meson.build | 35 +- > lib/member/rte_member.c | 75 ++++ > lib/member/rte_member.h | 141 ++++++- > lib/member/rte_member_heap.h | 449 ++++++++++++++++++++ > lib/member/rte_member_sketch.c | 584 ++++++++++++++++++++++++++ > lib/member/rte_member_sketch.h | 96 +++++ > lib/member/rte_member_sketch_avx512.c | 69 +++ > lib/member/rte_member_sketch_avx512.h | 36 ++ > lib/member/rte_xxh64_avx512.h | 117 ++++++ > 9 files changed, 1598 insertions(+), 4 deletions(-) > create mode 100644 lib/member/rte_member_heap.h > create mode 100644 lib/member/rte_member_sketch.c > create mode 100644 lib/member/rte_member_sketch.h > create mode 100644 lib/member/rte_member_sketch_avx512.c > create mode 100644 lib/member/rte_member_sketch_avx512.h > create mode 100644 lib/member/rte_xxh64_avx512.h >=20 > diff --git a/lib/member/meson.build b/lib/member/meson.build > index e06fddc240..426c9891c2 100644 > --- a/lib/member/meson.build > +++ b/lib/member/meson.build > @@ -7,6 +7,39 @@ if is_windows > subdir_done() > endif >=20 > -sources =3D files('rte_member.c', 'rte_member_ht.c', 'rte_member_vbf.c') > +sources =3D files('rte_member.c', 'rte_member_ht.c', 'rte_member_vbf.c', > 'rte_member_sketch.c') > headers =3D files('rte_member.h') > deps +=3D ['hash'] > + > +# compile AVX512 version if: > +# we are building 64-bit binary AND binutils can generate proper code > +if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok > + # compile AVX512 version if either: > + # a. we have AVX512 supported in minimum instruction set > + # baseline > + # b. it's not minimum instruction set, but supported by > + # compiler > + # > + # in former case, just add avx512 C file to files list > + # in latter case, compile c file to static lib, using correct > + # compiler flags, and then have the .o file from static lib > + # linked into main lib. > + sketch_avx512_cpu_support =3D ( > + cc.get_define('__AVX512F__', args: machine_args) !=3D '' > + ) > + > + if sketch_avx512_cpu_support =3D=3D true > + cflags +=3D ['-DCC_AVX512_SUPPORT'] > + if cc.has_argument('-mavx512f') > + cflags +=3D ['-mavx', '-mavx2', '-mavx512f', '-mavx512ifma', '- > march=3Dicelake-server'] > + endif > + sources +=3D files('rte_member_sketch_avx512.c') > + elif cc.has_argument('-mavx512f') > + cflags +=3D '-DCC_AVX512_SUPPORT' > + sketch_avx512_tmp =3D static_library('sketch_avx512_tmp', > + 'rte_member_sketch_avx512.c', > + dependencies: static_rte_eal, > + c_args: cflags + ['-mavx512f']) > + objs +=3D > sketch_avx512_tmp.extract_objects('rte_member_sketch_avx512.c') > + endif > +endif > diff --git a/lib/member/rte_member.c b/lib/member/rte_member.c > index 7e1632e6b5..8f859f7fbd 100644 > --- a/lib/member/rte_member.c > +++ b/lib/member/rte_member.c > @@ -9,10 +9,12 @@ > #include > #include > #include > +#include >=20 > #include "rte_member.h" > #include "rte_member_ht.h" > #include "rte_member_vbf.h" > +#include "rte_member_sketch.h" >=20 > TAILQ_HEAD(rte_member_list, rte_tailq_entry); > static struct rte_tailq_elem rte_member_tailq =3D { > @@ -72,6 +74,9 @@ rte_member_free(struct rte_member_setsum *setsum) > case RTE_MEMBER_TYPE_VBF: > rte_member_free_vbf(setsum); > break; > + case RTE_MEMBER_TYPE_SKETCH: > + rte_member_free_sketch(setsum); > + break; > default: > break; > } > @@ -86,6 +91,8 @@ rte_member_create(const struct rte_member_parameters > *params) > struct rte_member_list *member_list; > struct rte_member_setsum *setsum; > int ret; > + char ring_name[RTE_RING_NAMESIZE]; > + struct rte_ring *sketch_key_ring =3D NULL; >=20 > if (params =3D=3D NULL) { > rte_errno =3D EINVAL; > @@ -100,6 +107,16 @@ rte_member_create(const struct > rte_member_parameters *params) > return NULL; > } >=20 > + if (params->type =3D=3D RTE_MEMBER_TYPE_SKETCH) { > + snprintf(ring_name, sizeof(ring_name), "SK_%s", params- > >name); > + sketch_key_ring =3D rte_ring_create_elem(ring_name, > sizeof(uint32_t), > + rte_align32pow2(params->top_k), params- > >socket_id, 0); > + if (sketch_key_ring =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Ring Memory > allocation failed\n"); > + return NULL; > + } > + } > + > member_list =3D RTE_TAILQ_CAST(rte_member_tailq.head, > rte_member_list); >=20 > rte_mcfg_tailq_write_lock(); > @@ -145,6 +162,9 @@ rte_member_create(const struct > rte_member_parameters *params) > case RTE_MEMBER_TYPE_VBF: > ret =3D rte_member_create_vbf(setsum, params); > break; > + case RTE_MEMBER_TYPE_SKETCH: > + ret =3D rte_member_create_sketch(setsum, params, > sketch_key_ring); > + break; > default: > goto error_unlock_exit; > } > @@ -162,6 +182,7 @@ rte_member_create(const struct > rte_member_parameters *params) > error_unlock_exit: > rte_free(te); > rte_free(setsum); > + rte_ring_free(sketch_key_ring); > rte_mcfg_tailq_write_unlock(); > return NULL; > } > @@ -178,6 +199,23 @@ rte_member_add(const struct rte_member_setsum > *setsum, const void *key, > return rte_member_add_ht(setsum, key, set_id); > case RTE_MEMBER_TYPE_VBF: > return rte_member_add_vbf(setsum, key, set_id); > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_add_sketch(setsum, key, set_id); > + default: > + return -EINVAL; > + } > +} > + > +int > +rte_member_add_byte_count(const struct rte_member_setsum *setsum, > + const void *key, uint32_t byte_count) > +{ > + if (setsum =3D=3D NULL || key =3D=3D NULL || byte_count =3D=3D 0) > + return -EINVAL; > + > + switch (setsum->type) { > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_add_sketch_byte_count(setsum, key, > byte_count); > default: > return -EINVAL; > } > @@ -195,6 +233,8 @@ rte_member_lookup(const struct rte_member_setsum > *setsum, const void *key, > return rte_member_lookup_ht(setsum, key, set_id); > case RTE_MEMBER_TYPE_VBF: > return rte_member_lookup_vbf(setsum, key, set_id); > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_lookup_sketch(setsum, key, set_id); > default: > return -EINVAL; > } > @@ -261,6 +301,36 @@ rte_member_lookup_multi_bulk(const struct > rte_member_setsum *setsum, > } > } >=20 > +int > +rte_member_query_count(const struct rte_member_setsum *setsum, > + const void *key, uint64_t *output) > +{ > + if (setsum =3D=3D NULL || key =3D=3D NULL || output =3D=3D NULL) > + return -EINVAL; > + > + switch (setsum->type) { > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_query_sketch(setsum, key, output); > + default: > + return -EINVAL; > + } > +} > + > +int > +rte_member_report_heavyhitter(const struct rte_member_setsum *setsum, > + void **key, uint64_t *count) > +{ > + if (setsum =3D=3D NULL || key =3D=3D NULL || count =3D=3D NULL) > + return -EINVAL; > + > + switch (setsum->type) { > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_report_heavyhitter_sketch(setsum, key, > count); > + default: > + return -EINVAL; > + } > +} > + > int > rte_member_delete(const struct rte_member_setsum *setsum, const void *ke= y, > member_set_t set_id) > @@ -272,6 +342,8 @@ rte_member_delete(const struct rte_member_setsum > *setsum, const void *key, > case RTE_MEMBER_TYPE_HT: > return rte_member_delete_ht(setsum, key, set_id); > /* current vBF implementation does not support delete function */ > + case RTE_MEMBER_TYPE_SKETCH: > + return rte_member_delete_sketch(setsum, key); > case RTE_MEMBER_TYPE_VBF: > default: > return -EINVAL; > @@ -290,6 +362,9 @@ rte_member_reset(const struct rte_member_setsum > *setsum) > case RTE_MEMBER_TYPE_VBF: > rte_member_reset_vbf(setsum); > return; > + case RTE_MEMBER_TYPE_SKETCH: > + rte_member_reset_sketch(setsum); > + return; > default: > return; > } > diff --git a/lib/member/rte_member.h b/lib/member/rte_member.h > index 567ee0c84b..6afeda542f 100644 > --- a/lib/member/rte_member.h > +++ b/lib/member/rte_member.h > @@ -39,6 +39,18 @@ > * | | | not overwrite | = | > * | | | existing key. | = | > * +----------+---------------------+----------------+------------------= -------+ > + * +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > + * | type | sketch | > + * +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ > + * |structure | counting bloom filter array | > + * +----------+-----------------------------+ > + * |set id | 1: heavy set, 0: light set | > + * | | | > + * +----------+-----------------------------+ > + * |usages & | count size of a flow, | > + * |properties| used for heavy hitter | > + * | | detection. | > + * +----------+-----------------------------+ > * --> > */ >=20 > @@ -50,6 +62,7 @@ extern "C" { > #endif >=20 > #include > +#include >=20 > #include >=20 > @@ -66,6 +79,11 @@ typedef uint16_t member_set_t; > /** Maximum number of characters in setsum name. */ > #define RTE_MEMBER_NAMESIZE 32 >=20 > +/** For sketch, use the flag if prefer always bounded mode. */ [Wang, Yipeng] Maybe a bit more information on what is always-bounded, I do= n't see any explanation of this parameter. > +#define RTE_MEMBER_SKETCH_ALWAYS_BOUNDED 0x01 > +/** For sketch, use the flag if to count packet size instead of packet c= ount */ > +#define RTE_MEMBER_SKETCH_COUNT_BYTE 0x02 > + > /** @internal Hash function used by membership library. */ > #if defined(RTE_ARCH_X86) || defined(__ARM_FEATURE_CRC32) > #include > @@ -104,6 +122,7 @@ struct rte_member_parameters; > enum rte_member_setsum_type { > RTE_MEMBER_TYPE_HT =3D 0, /**< Hash table based set summary. */ > RTE_MEMBER_TYPE_VBF, /**< Vector of bloom filters. */ > + RTE_MEMBER_TYPE_SKETCH, > RTE_MEMBER_NUM_TYPE > }; >=20 > @@ -114,6 +133,19 @@ enum rte_member_sig_compare_function { > RTE_MEMBER_COMPARE_NUM > }; >=20 > +/* sketch update function with different implementations. */ > +typedef void (*sketch_update_fn_t)(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t count); > + > +/* sketch lookup function with different implementations. */ > +typedef uint64_t (*sketch_lookup_fn_t)(const struct rte_member_setsum *s= s, > + const void *key); > + > +/* sketch delete function with different implementations. */ > +typedef void (*sketch_delete_fn_t)(const struct rte_member_setsum *ss, > + const void *key); > + > /** @internal setsummary structure. */ > struct rte_member_setsum { > enum rte_member_setsum_type type; /* Type of the set summary. */ > @@ -134,6 +166,21 @@ struct rte_member_setsum { > uint32_t bit_mask; /* Bit mask to get bit location in bf. */ > uint32_t num_hashes; /* Number of hash values to index bf. */ >=20 > + /* Parameters for sketch */ > + float error_rate; > + float sample_rate; > + uint32_t num_col; > + uint32_t num_row; > + int always_bounded; > + double converge_thresh; > + uint32_t topk; > + uint32_t count_byte; > + uint64_t *hash_seeds; > + sketch_update_fn_t sketch_update; /* Pointer to the sketch update > function */ > + sketch_lookup_fn_t sketch_lookup; /* Pointer to the sketch lookup > function */ > + sketch_delete_fn_t sketch_delete; /* Pointer to the sketch delete > function */ > + > + void *runtime_var; > uint32_t mul_shift; /* vbf internal variable used during bit test. */ > uint32_t div_shift; /* vbf internal variable used during bit test. */ >=20 > @@ -143,6 +190,9 @@ struct rte_member_setsum { > /* Second cache line should start here. */ > uint32_t socket_id; /* NUMA Socket ID for memory. */ > char name[RTE_MEMBER_NAMESIZE]; /* Name of this set summary. */ > +#ifdef RTE_ARCH_X86 > + bool use_avx512; > +#endif > } __rte_cache_aligned; >=20 > /** > @@ -261,8 +311,33 @@ struct rte_member_parameters { > */ > uint32_t sec_hash_seed; >=20 > + /** > + * For count(min) sketch data structure, error rate defines the accurac= y > + * required by the user. Higher accuracy leads to more memory usage, > but > + * the flow size is estimated more accurately. > + */ > + float error_rate; > + > + /** > + * Sampling rate means the internal sample rate of the rows of the > count > + * min sketches. Lower sampling rate can reduce CPU overhead, but the > + * data structure will require more time to converge statistically. > + */ > + float sample_rate; > + > + /** > + * How many top heavy hitter to be reported. The library will internall= y > + * keep the keys of heavy hitters for final report. > + */ > + uint32_t top_k; > + > + /** > + * Extra flags that may passed in by user > + */ > + uint32_t extra_flag; > + > int socket_id; /**< NUMA Socket ID for memory. */ > -}; > +} __rte_cache_aligned; >=20 > /** > * @warning > @@ -418,7 +493,7 @@ rte_member_lookup_multi_bulk(const struct > rte_member_setsum *setsum, > * RTE_MEMBER_NO_MATCH by default is set as 0. > * For HT mode, the set_id has range as [1, 0x7FFF], MSB is reserved. > * For vBF mode the set id is limited by the num_set parameter when cr= eate > - * the set-summary. > + * the set-summary. For sketch mode, this id is ignored. > * @return > * HT (cache mode) and vBF should never fail unless the set_id is not = in the > * valid range. In such case -EINVAL is returned. > @@ -429,12 +504,72 @@ rte_member_lookup_multi_bulk(const struct > rte_member_setsum *setsum, > * Return 0 for HT (cache mode) if the add does not cause > * eviction, return 1 otherwise. Return 0 for non-cache mode if succes= s, > * -ENOSPC for full, and 1 if cuckoo eviction happens. > - * Always returns 0 for vBF mode. > + * Always returns 0 for vBF mode and sketch. > */ > int > rte_member_add(const struct rte_member_setsum *setsum, const void *key, > member_set_t set_id); >=20 > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Add the packet byte size into the sketch > + * > + * @param setsum > + * Pointer of a set-summary. > + * @param key > + * Pointer of the key to be added. > + * @param byte_count > + * Add the byte count of the packet into the sketch. > + * @return > + * Return -EINVAL for invalid parameters, otherwise return 0. > + */ > +int > +rte_member_add_byte_count(const struct rte_member_setsum *setsum, > + const void *key, uint32_t byte_count); > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * query packet count [Wang, Yipeng] Query packet count for a certain flow-key. > + * > + * @param setsum > + * Pointer of a set-summary. > + * @param key > + * Pointer of the key to be added. > + * @param count > + * The output packet count or byte count. > + * @return > + * Return -EINVAL for invalid parameters. > + */ > +int > +rte_member_query_count(const struct rte_member_setsum *setsum, > + const void *key, uint64_t *count); > + > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Insert key into set-summary (SS). [Wang, Yipeng] This API is for reporting. The comment for this API is wron= g. > + * > + * @param setsum > + * Pointer of a set-summary. > + * @param key > + * Pointer of the output top-k key array. > + * @param count > + * Pointer of the output packet count or byte count array of the top-k= keys. > + * @return > + * Return -EINVAL for invalid parameters. Return a positive integer in= dicate > + * how many heavy hitters are reported. > + */ > +int > +rte_member_report_heavyhitter(const struct rte_member_setsum *setsum, > + void **keys, uint64_t *counts); > + > + > /** > * @warning > * @b EXPERIMENTAL: this API may change without prior notice > diff --git a/lib/member/rte_member_heap.h b/lib/member/rte_member_heap.h [Wang, Yipeng] This is a software heap enhanced by a hash table. I think there should be more comment describing the feature of this file, a= nd the algorithm. > new file mode 100644 > index 0000000000..3d265c793f > --- /dev/null > +++ b/lib/member/rte_member_heap.h > @@ -0,0 +1,449 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + * Copyright(c) 2020, Alan Liu > + */ > + > +#ifndef _RTE_MEMBER_HEAP_H_ > +#define _RTE_MEMBER_HEAP_H_ > + > +#include > +#include "rte_member.h" > + > +#define LCHILD(x) (2 * x + 1) > +#define RCHILD(x) (2 * x + 2) > +#define PARENT(x) ((x - 1) / 2) > + > +#define HASH_BKT_SIZE 16 > +#define HASH_HP_MULTI 4 > +#define HASH_RESIZE_MULTI 2 > + > +#define MINHEAP_OPTIMIZED_HT > + > +struct hash_bkt { > + uint16_t sig[HASH_BKT_SIZE]; > + uint16_t idx[HASH_BKT_SIZE]; > +}; > + > +struct hash { > + uint16_t bkt_cnt; > + uint16_t num_item; > + uint32_t seed; > + struct hash_bkt buckets[0]; > +}; > + > +struct node { > + void *key; > + uint64_t count; > +}; > + > +struct minheap { > + uint32_t key_len; > + uint32_t size; > + uint32_t socket; > + struct hash *hashtable; > + struct node *elem; > +}; > + > +#ifdef MINHEAP_OPTIMIZED_HT [Wang, Yipeng] Is this macro defined somewhere? Do we need both optimized a= nd non-optimized code? > +static int > +hash_table_insert(const void *key, int value, int key_len, struct hash *= table) > +{ > + uint32_t hash =3D MEMBER_HASH_FUNC(key, key_len, table->seed); > + uint16_t idx =3D hash % table->bkt_cnt; > + uint16_t sig =3D hash >> 16; > + > + for (int i =3D 0; i < HASH_BKT_SIZE; i++) { > + if (table->buckets[idx].idx[i] =3D=3D 0) { > + table->buckets[idx].idx[i] =3D value; > + table->buckets[idx].sig[i] =3D sig; > + table->num_item++; > + return 0; > + } > + } > + > + return -ENOMEM; > +} > + > +static int > +hash_table_update(const void *key, int old_value, int value, int key_len= , struct > hash *table) > +{ > + uint32_t hash =3D MEMBER_HASH_FUNC(key, key_len, table->seed); > + uint16_t idx =3D hash % table->bkt_cnt; > + uint16_t sig =3D hash >> 16; > + > + for (int i =3D 0; i < HASH_BKT_SIZE; i++) { > + if (table->buckets[idx].sig[i] =3D=3D sig && table->buckets[idx].idx[i= ] > =3D=3D old_value) { > + table->buckets[idx].idx[i] =3D value; > + return 0; > + } > + } > + > + return -1; > +} > + > +static int > +hash_table_del(const void *key, uint16_t value, int key_len, struct hash= *table) > +{ > + uint32_t hash =3D MEMBER_HASH_FUNC(key, key_len, table->seed); > + uint16_t idx =3D hash % table->bkt_cnt; > + uint16_t sig =3D hash >> 16; > + > + for (int i =3D 0; i < HASH_BKT_SIZE; i++) { > + if (table->buckets[idx].sig[i] =3D=3D sig && table->buckets[idx].idx[i= ] > =3D=3D value) { > + table->buckets[idx].idx[i] =3D 0; > + table->num_item--; > + return 0; > + } > + } > + > + return -1; > +} > + > +static int > +hash_table_lookup(const void *key, int key_len, struct minheap *hp) > +{ > + struct hash *table =3D hp->hashtable; > + uint32_t hash =3D MEMBER_HASH_FUNC(key, key_len, table->seed); > + uint16_t idx =3D hash % table->bkt_cnt; > + uint16_t sig =3D hash >> 16; > + > + for (int i =3D 0; i < HASH_BKT_SIZE; i++) { > + if (table->buckets[idx].sig[i] =3D=3D sig && table- > >buckets[idx].idx[i] !=3D 0) { > + uint32_t hp_idx =3D table->buckets[idx].idx[i] - 1; > + > + if (memcmp(hp->elem[hp_idx].key, key, hp->key_len) > =3D=3D 0) > + return hp_idx; > + } > + } > + > + return -ENOENT; /* key doesn't exist */ > +} > + > +static int > +resize_hash_table(struct minheap *hp) > +{ > + uint32_t i; > + uint32_t new_bkt_cnt; > + > + while (1) { > + new_bkt_cnt =3D hp->hashtable->bkt_cnt * HASH_RESIZE_MULTI; > + > + RTE_MEMBER_LOG(ERR, "Sketch Minheap HT load factor is > [%f]\n", > + hp->hashtable->num_item / ((float)hp->hashtable- > >bkt_cnt * HASH_BKT_SIZE)); > + RTE_MEMBER_LOG(ERR, "Sketch Minheap HT resize > happen!\n"); > + rte_free(hp->hashtable); > + hp->hashtable =3D rte_zmalloc_socket(NULL, sizeof(struct hash) + > + new_bkt_cnt * sizeof(struct > hash_bkt), > + RTE_CACHE_LINE_SIZE, hp- > >socket); > + > + if (hp->hashtable =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Minheap HT > allocation failed\n"); > + return -ENOMEM; > + } > + > + hp->hashtable->bkt_cnt =3D new_bkt_cnt; > + > + for (i =3D 0; i < hp->size; ++i) { > + if (hash_table_insert(hp->elem[i].key, > + i + 1, hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, > + "Sketch Minheap HT resize insert > fail!\n"); > + break; > + } > + } > + if (i =3D=3D hp->size) > + break; > + } > + > + return 0; > +} > +#endif /* MINHEAP_OPTIMIZED_HT */ > + > +/* find the item in the given minheap */ > +static int > +rte_member_minheap_find(struct minheap *hp, const void *key) > +{ > +#ifdef MINHEAP_OPTIMIZED_HT > + int idx =3D hash_table_lookup(key, hp->key_len, hp); > + return idx; > +#else > + uint32_t idx; > + > + for (idx =3D 0; idx < hp->size; ++idx) { > + if (memcmp(hp->elem[idx].key, key, hp->key_len) =3D=3D 0) > + return idx; > + } > + > + return -ENOENT; /* key doesn't exist */ > +#endif > +} > + > +static int > +rte_member_minheap_init(struct minheap *heap, int size, > + uint32_t socket, uint32_t seed) > +{ > + heap->elem =3D rte_zmalloc_socket(NULL, sizeof(struct node) * size, > + RTE_CACHE_LINE_SIZE, socket); > + if (heap->elem =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Minheap elem allocation > failed\n"); > + return -ENOMEM; > + } > + > + uint32_t hash_bkt_cnt =3D rte_align32pow2(size * HASH_HP_MULTI) / > HASH_BKT_SIZE; > + > + if (hash_bkt_cnt =3D=3D 0) > + hash_bkt_cnt =3D 1; > + > + heap->hashtable =3D rte_zmalloc_socket(NULL, sizeof(struct hash) + > + hash_bkt_cnt * sizeof(struct hash_bkt), > + RTE_CACHE_LINE_SIZE, socket); > + > + if (heap->hashtable =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Minheap HT allocation > failed\n"); > + rte_free(heap->elem); > + return -ENOMEM; > + } > + > + heap->hashtable->seed =3D seed; > + heap->hashtable->bkt_cnt =3D hash_bkt_cnt; > + heap->socket =3D socket; > + > + return 0; > +} > + > +/* swap the minheap nodes */ > +static __rte_always_inline void > +rte_member_heap_swap(struct node *n1, struct node *n2) > +{ > + struct node temp =3D *n1; > + *n1 =3D *n2; > + *n2 =3D temp; > +} > + > +/* heapify function */ > +static void > +rte_member_heapify(struct minheap *hp, uint32_t idx, bool update_hash) > +{ > + uint32_t smallest; > + > + if (LCHILD(idx) < hp->size && > + hp->elem[LCHILD(idx)].count < hp->elem[idx].count) > + smallest =3D LCHILD(idx); > + else > + smallest =3D idx; > + > + if (RCHILD(idx) < hp->size && > + hp->elem[RCHILD(idx)].count < hp- > >elem[smallest].count) > + smallest =3D RCHILD(idx); > + > + if (smallest !=3D idx) { > + rte_member_heap_swap(&(hp->elem[idx]), &(hp- > >elem[smallest])); > + > +#ifdef MINHEAP_OPTIMIZED_HT > + if (update_hash) { > + if (hash_table_update(hp->elem[smallest].key, idx + 1, > smallest + 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table > update failed\n"); > + return; > + } > + > + if (hash_table_update(hp->elem[idx].key, smallest + 1, > idx + 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table > update failed\n"); > + return; > + } > + } > +#endif > + rte_member_heapify(hp, smallest, update_hash); > + } > +} > + > +/* insert a node into the minheap */ > +static int > +rte_member_minheap_insert_node(struct minheap *hp, const void *key, > + int counter, void *key_slot, > + struct rte_ring *free_key_slot) > +{ > + struct node nd; > + uint32_t slot_id; > + > + if (rte_ring_sc_dequeue_elem(free_key_slot, &slot_id, > sizeof(uint32_t)) !=3D 0) { > + RTE_MEMBER_LOG(ERR, "Minheap get empty keyslot failed\n"); > + return -1; > + } > + > + nd.count =3D counter; > + nd.key =3D RTE_PTR_ADD(key_slot, slot_id * hp->key_len); > + > + memcpy(nd.key, key, hp->key_len); > + > + uint32_t i =3D (hp->size)++; > + > + while (i && nd.count < hp->elem[PARENT(i)].count) { > + hp->elem[i] =3D hp->elem[PARENT(i)]; > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_update(hp->elem[i].key, PARENT(i) + 1, i + 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table update > failed\n"); > + return -1; > + } > +#endif > + i =3D PARENT(i); > + } > + hp->elem[i] =3D nd; > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_insert(key, i + 1, hp->key_len, hp->hashtable) < 0) { > + if (resize_hash_table(hp) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table resize > failed\n"); > + return -1; > + } > + } > +#endif > + return 0; > +} > + > +/* delete a key from the minheap */ > +static int > +rte_member_minheap_delete_node(struct minheap *hp, const void *key, > + void *key_slot, struct rte_ring *free_key_slot) > +{ > + int idx =3D rte_member_minheap_find(hp, key); > + uint32_t offset =3D RTE_PTR_DIFF(hp->elem[idx].key, key_slot) / hp- > >key_len; > + > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_del(key, idx + 1, hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table delete failed\n"); > + return -1; > + } > +#endif > + rte_ring_sp_enqueue_elem(free_key_slot, &offset, sizeof(uint32_t)); > + > + if (idx =3D=3D (int)(hp->size - 1)) { > + hp->size--; > + return 0; > + } > + > + hp->elem[idx] =3D hp->elem[hp->size - 1]; > + > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_update(hp->elem[idx].key, hp->size, idx + 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table update > failed\n"); > + return -1; > + } > +#endif > + hp->size--; > + rte_member_heapify(hp, idx, true); > + > + return 0; > +} > + > +/* replace a min node with a new key. */ > +static int > +rte_member_minheap_replace_node(struct minheap *hp, > + const void *new_key, > + int new_counter) > +{ > + struct node nd; > + void *recycle_key =3D NULL; > + > + recycle_key =3D hp->elem[0].key; > + > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_del(recycle_key, 1, hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table delete failed\n"); > + return -1; > + } > +#endif > + hp->elem[0] =3D hp->elem[hp->size - 1]; > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_update(hp->elem[0].key, hp->size, 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table update > failed\n"); > + return -1; > + } > +#endif > + hp->size--; > + > + rte_member_heapify(hp, 0, true); > + > + nd.count =3D new_counter; > + nd.key =3D recycle_key; > + > + memcpy(nd.key, new_key, hp->key_len); > + > + uint32_t i =3D (hp->size)++; > + > + while (i && nd.count < hp->elem[PARENT(i)].count) { > + hp->elem[i] =3D hp->elem[PARENT(i)]; > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_update(hp->elem[i].key, PARENT(i) + 1, i + 1, > + hp->key_len, hp->hashtable) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table update > failed\n"); > + return -1; > + } > +#endif > + i =3D PARENT(i); > + } > + > + hp->elem[i] =3D nd; > + > +#ifdef MINHEAP_OPTIMIZED_HT > + if (hash_table_insert(new_key, i + 1, hp->key_len, hp->hashtable) < 0) = { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table replace insert > failed\n"); > + if (resize_hash_table(hp) < 0) { > + RTE_MEMBER_LOG(ERR, "Minheap Hash Table replace > resize failed\n"); > + return -1; > + } > + } > +#endif > + return 0; > +} > + > +/* sort the heap into a decending array */ > +static void > +rte_member_heapsort(struct minheap *hp, struct node *result_array) > +{ > + struct minheap new_hp; > + > + /* build a new heap for using the given array */ > + new_hp.size =3D hp->size; > + new_hp.key_len =3D hp->key_len; > + new_hp.elem =3D result_array; > + memcpy(result_array, hp->elem, hp->size * sizeof(struct node)); > + > + /* sort the new heap */ > + while (new_hp.size > 1) { > + rte_member_heap_swap(&(new_hp.elem[0]), > &(new_hp.elem[new_hp.size - 1])); > + new_hp.size--; > + rte_member_heapify(&new_hp, 0, false); > + } > +} > + > +static void > +rte_member_minheap_free(struct minheap *hp) > +{ > + if (hp =3D=3D NULL) > + return; > + > + rte_free(hp->elem); > + rte_free(hp->hashtable); > +} > + > +static void > +rte_member_minheap_reset(struct minheap *hp) > +{ > + if (hp =3D=3D NULL) > + return; > + > + memset(hp->elem, 0, sizeof(struct node) * hp->size); > + hp->size =3D 0; > + > +#ifdef MINHEAP_OPTIMIZED_HT > + memset((char *)hp->hashtable + sizeof(struct hash), 0, > + hp->hashtable->bkt_cnt * sizeof(struct hash_bkt)); > + hp->hashtable->num_item =3D 0; > +#endif > +} > + > +#endif /* _RTE_MEMBER_HEAP_H_ */ > diff --git a/lib/member/rte_member_sketch.c > b/lib/member/rte_member_sketch.c > new file mode 100644 > index 0000000000..0c5a0bad99 > --- /dev/null > +++ b/lib/member/rte_member_sketch.c > @@ -0,0 +1,584 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + * Copyright(c) 2020, Alan Liu > + */ > + > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "rte_member.h" > +#include "rte_member_sketch.h" > +#include "rte_member_heap.h" > + > +#ifdef CC_AVX512_SUPPORT > +#include "rte_member_sketch_avx512.h" > +#endif /* CC_AVX512_SUPPORT */ > + > +struct sketch_runtime { > + uint64_t pkt_cnt; > + uint32_t until_next; > + int converged; > + struct minheap heap; > + struct node *report_array; > + void *key_slots; > + struct rte_ring *free_key_slots; > +} __rte_cache_aligned; > + > +static uint32_t > +draw_geometric(const struct rte_member_setsum *ss) > +{ > + double rand =3D 1; > + > + if (ss->sample_rate =3D=3D 1) > + return 1; > + > + while (rand =3D=3D 1 || rand =3D=3D 0) > + rand =3D (double) rte_rand() / (UINT64_MAX); > + > + return (uint32_t)ceil(log(1 - rand) / log(1 - ss->sample_rate)); > +} [Wang, Yipeng] Log might be an expensive operation. Is there alternative, o= r if it is called frequently? > + > +static void > +isort(uint64_t *array, int n) > +{ > + for (int i =3D 1; i < n; i++) { > + uint64_t t =3D array[i]; > + int j; > + > + for (j =3D i - 1; j >=3D 0; j--) { > + if (t < array[j]) > + array[j + 1] =3D array[j]; > + else > + break; > + } > + array[j + 1] =3D t; > + } > +} [Wang, Yipeng] Is insertion sort the fastest to sort less-than-10 count of = numbers? > + > +static __rte_always_inline void > +swap(uint64_t *a, uint64_t *b) > +{ > + uint64_t tmp =3D *a; > + *a =3D *b; > + *b =3D tmp; > +} > + > +static uint64_t > +medianof5(uint64_t a, uint64_t b, uint64_t c, uint64_t d, uint64_t e) > +{ > + if (a > b) > + swap(&a, &b); > + if (c > d) > + swap(&c, &d); > + if (a > c) { > + if (d > e) > + swap(&c, &e); > + else { > + swap(&c, &d); > + swap(&d, &e); > + } > + } else { > + if (b > e) > + swap(&a, &e); > + else { > + swap(&a, &b); > + swap(&b, &e); > + } > + } > + > + if (a > c) > + return a > d ? d : a; > + else > + return b > c ? c : b; > +} > + > +int > +rte_member_create_sketch(struct rte_member_setsum *ss, > + const struct rte_member_parameters *params, > + struct rte_ring *ring) > +{ > + struct sketch_runtime *runtime; > + uint32_t num_col; > + uint32_t i; > + > + if (params->sample_rate =3D=3D 0 || params->sample_rate > 1) { > + rte_errno =3D EINVAL; > + RTE_MEMBER_LOG(ERR, > + "Membership Sketch created with invalid > parameters\n"); > + return -EINVAL; > + } > + > + if (params->extra_flag & RTE_MEMBER_SKETCH_COUNT_BYTE) > + ss->count_byte =3D 1; > + > + if (ss->count_byte =3D=3D 1 && > + rte_vect_get_max_simd_bitwidth() >=3D RTE_VECT_SIMD_512 > && > + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) =3D=3D 1 && > + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512IFMA) =3D=3D 1) { > +#ifdef CC_AVX512_SUPPORT > + ss->use_avx512 =3D true; > +#else > + ss->use_avx512 =3D false; > +#endif > + } > + > + if (ss->use_avx512 =3D=3D true) { > + ss->num_row =3D NUM_ROW_VEC; > + RTE_MEMBER_LOG(NOTICE, > + "Membership Sketch AVX512 update/lookup/delete ops > is selected\n"); > + ss->sketch_update =3D sketch_update_avx512; > + ss->sketch_lookup =3D sketch_lookup_avx512; > + ss->sketch_delete =3D sketch_delete_avx512; > + } else { > + ss->num_row =3D NUM_ROW_SCALAR; > + RTE_MEMBER_LOG(NOTICE, > + "Membership Sketch SCALAR update/lookup/delete ops > is selected\n"); > + ss->sketch_update =3D sketch_update_scalar; > + ss->sketch_lookup =3D sketch_lookup_scalar; > + ss->sketch_delete =3D sketch_delete_scalar; > + } > + > + ss->socket_id =3D params->socket_id; > + > + if (ss->count_byte =3D=3D 0) > + num_col =3D 4.0 / params->error_rate / params->sample_rate; > + else if (ss->use_avx512 =3D=3D true) > + num_col =3D rte_align32pow2(4.0 / params->error_rate); > + else > + num_col =3D 4.0 / params->error_rate; [Wang, Yipeng] Could here be div/0 fault? A pointer to the formula, or comm= ent the formula inline would help people understand the constants. Similar for other cases where algorit= hm-specific constants are used. > + > + ss->table =3D rte_zmalloc_socket(NULL, > + sizeof(uint64_t) * num_col * ss->num_row, > + RTE_CACHE_LINE_SIZE, ss->socket_id); > + if (ss->table =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Table memory allocation > failed\n"); > + return -ENOMEM; > + } > + > + ss->hash_seeds =3D rte_zmalloc_socket(NULL, sizeof(uint64_t) * ss- > >num_row, > + RTE_CACHE_LINE_SIZE, ss->socket_id); > + if (ss->hash_seeds =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Hashseeds memory allocation > failed\n"); > + return -ENOMEM; > + } > + > + ss->runtime_var =3D rte_zmalloc_socket(NULL, sizeof(struct > sketch_runtime), > + RTE_CACHE_LINE_SIZE, ss->socket_id); > + if (ss->runtime_var =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Runtime memory allocation > failed\n"); > + rte_free(ss); > + return -ENOMEM; > + } > + runtime =3D ss->runtime_var; > + > + ss->num_col =3D num_col; > + ss->sample_rate =3D params->sample_rate; > + ss->prim_hash_seed =3D params->prim_hash_seed; > + ss->sec_hash_seed =3D params->sec_hash_seed; > + ss->error_rate =3D params->error_rate; > + ss->topk =3D params->top_k; > + ss->key_len =3D params->key_len; > + runtime->heap.key_len =3D ss->key_len; > + > + runtime->key_slots =3D rte_zmalloc_socket(NULL, ss->key_len * ss->topk, > + RTE_CACHE_LINE_SIZE, ss->socket_id); > + if (runtime->key_slots =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Key Slots allocation failed\n"); > + goto error; > + } > + > + runtime->free_key_slots =3D ring; > + for (i =3D 0; i < ss->topk; i++) > + rte_ring_sp_enqueue_elem(runtime->free_key_slots, > + &i, sizeof(uint32_t)); > + > + if (rte_member_minheap_init(&(runtime->heap), params->top_k, > + ss->socket_id, params->prim_hash_seed) < 0) { > + RTE_MEMBER_LOG(ERR, "Sketch Minheap allocation failed\n"); > + goto error_runtime; > + } > + > + runtime->report_array =3D rte_zmalloc_socket(NULL, sizeof(struct node) = * > ss->topk, > + RTE_CACHE_LINE_SIZE, ss->socket_id); > + if (runtime->report_array =3D=3D NULL) { > + RTE_MEMBER_LOG(ERR, "Sketch Runtime Report Array > allocation failed\n"); > + goto error_runtime; > + } > + > + rte_srand(ss->prim_hash_seed); > + for (uint32_t i =3D 0; i < ss->num_row; i++) > + ss->hash_seeds[i] =3D rte_rand(); > + > + if (params->extra_flag & RTE_MEMBER_SKETCH_ALWAYS_BOUNDED) > + ss->always_bounded =3D 1; > + > + if (ss->always_bounded) { > + double delta =3D 1.0 / (pow(2, ss->num_row)); > + > + ss->converge_thresh =3D 10 * pow(ss->error_rate, -2.0) * > sqrt(log(1/delta)); > + } > + > + RTE_MEMBER_LOG(DEBUG, "Sketch created, " > + "the total memory required is %u Bytes\n", ss->num_col * ss- > >num_row * 8); > + > + return 0; > + > +error_runtime: > + rte_member_minheap_free(&runtime->heap); > + rte_ring_free(runtime->free_key_slots); > + rte_free(runtime->key_slots); > +error: > + rte_free(runtime); > + rte_free(ss); > + > + return -ENOMEM; > +} > + > +uint64_t > +sketch_lookup_scalar(const struct rte_member_setsum *ss, const void *key= ) > +{ > + uint64_t *count_array =3D ss->table; > + uint32_t col[ss->num_row]; > + uint64_t count_row[ss->num_row]; > + uint32_t cur_row; > + uint64_t count; > + > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) { > + col[cur_row] =3D MEMBER_HASH_FUNC(key, ss->key_len, > + ss->hash_seeds[cur_row]) % ss->num_col; > + > + rte_prefetch0(&count_array[cur_row * ss->num_col + > col[cur_row]]); > + } > + > + /* if sample rate is 1, it is a regular count-min, we report the min */ > + if (ss->sample_rate =3D=3D 1 || ss->count_byte =3D=3D 1) > + return count_min(ss, col); > + > + /* otherwise we report the median number */ > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) > + count_row[cur_row] =3D count_array[cur_row * ss->num_col + > col[cur_row]]; > + > + if (ss->num_row =3D=3D 5) > + return medianof5(count_row[0], count_row[1], > + count_row[2], count_row[3], count_row[4]); > + > + isort(count_row, ss->num_row); > + > + if (ss->num_row % 2 =3D=3D 0) { > + count =3D (count_row[ss->num_row / 2] + count_row[ss- > >num_row / 2 - 1]) / 2; > + return count; > + } > + /* ss->num_row % 2 !=3D 0 */ > + count =3D count_row[ss->num_row / 2]; > + > + return count; > +} > + > +void > +sketch_delete_scalar(const struct rte_member_setsum *ss, const void *key= ) > +{ > + uint32_t col[ss->num_row]; > + uint64_t *count_array =3D ss->table; > + uint64_t min =3D UINT64_MAX; > + uint32_t cur_row; > + > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) { > + col[cur_row] =3D MEMBER_HASH_FUNC(key, ss->key_len, > + ss->hash_seeds[cur_row]) % ss->num_col; > + > + rte_prefetch0(&count_array[cur_row * ss->num_col + > col[cur_row]]); > + } > + > + /* if sample rate is 1, it is a regular count-min, we report the min */ [Wang, Yipeng] For count-min, key could be deleted, but If it is not the co= unt-min, does it still support delete function? When delete key, do you also need to delete from the heap? > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) { > + uint64_t cnt =3D count_array[cur_row * ss->num_col + > col[cur_row]]; > + > + if (cnt < min) > + min =3D cnt; > + } > + > + /* subtract the min value from all the counters */ > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) > + count_array[cur_row * ss->num_col + col[cur_row]] -=3D min; > +} > + > +int > +rte_member_query_sketch(const struct rte_member_setsum *ss, > + const void *key, > + uint64_t *output) > +{ > + uint64_t count =3D ss->sketch_lookup(ss, key); > + *output =3D count; > + > + return 0; > +} > + > +void > +rte_member_update_heap(const struct rte_member_setsum *ss) > +{ > + uint32_t i; > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + > + for (i =3D 0; i < runtime_var->heap.size; i++) { > + uint64_t count =3D ss->sketch_lookup(ss, runtime_var- > >heap.elem[i].key); > + > + runtime_var->heap.elem[i].count =3D count; > + } > +} > + > +int > +rte_member_report_heavyhitter_sketch(const struct rte_member_setsum > *setsum, > + void **key, > + uint64_t *count) > +{ > + uint32_t i; > + struct sketch_runtime *runtime_var =3D setsum->runtime_var; > + > + rte_member_update_heap(setsum); > + rte_member_heapsort(&(runtime_var->heap), runtime_var- > >report_array); > + > + for (i =3D 0; i < runtime_var->heap.size; i++) { > + key[i] =3D runtime_var->report_array[i].key; > + count[i] =3D runtime_var->report_array[i].count; > + } > + > + return runtime_var->heap.size; > +} > + > +int > +rte_member_lookup_sketch(const struct rte_member_setsum *ss, > + const void *key, member_set_t *set_id) > +{ > + uint64_t count =3D ss->sketch_lookup(ss, key); > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + > + if (runtime_var->heap.size > 0 && count >=3D runtime_var- > >heap.elem[0].count) > + *set_id =3D 1; > + else > + *set_id =3D 0; > + > + if (count =3D=3D 0) > + return 0; > + else > + return 1; > +} > + > +static void > +should_converge(const struct rte_member_setsum *ss) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + > + /* For count min sketch - L1 norm */ [Wang, Yipeng] When sample_rate is not 1, i.e. it is not a count-min, does = it still apply? > + if (runtime_var->pkt_cnt > ss->converge_thresh) { > + runtime_var->converged =3D 1; > + RTE_MEMBER_LOG(DEBUG, "Sketch converged, begin sampling > " > + "from key count %lu\n", > + runtime_var->pkt_cnt); > + } > +} > + > +static void > +sketch_update_row(const struct rte_member_setsum *ss, const void *key, > + uint32_t count, uint32_t cur_row) > +{ > + uint64_t *count_array =3D ss->table; > + uint32_t col =3D MEMBER_HASH_FUNC(key, ss->key_len, > + ss->hash_seeds[cur_row]) % ss->num_col; > + > + /* sketch counter update */ > + count_array[cur_row * ss->num_col + col] +=3D > + ceil(count / (ss->sample_rate)); > +} > + > +void > +sketch_update_scalar(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t count) > +{ > + uint64_t *count_array =3D ss->table; > + uint32_t col; > + uint32_t cur_row; > + > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) { > + col =3D MEMBER_HASH_FUNC(key, ss->key_len, > + ss->hash_seeds[cur_row]) % ss->num_col; > + count_array[cur_row * ss->num_col + col] +=3D count; > + } > +} > + > +static void > +heap_update(const struct rte_member_setsum *ss, const void *key) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + uint64_t key_cnt =3D 0; > + int found; > + > + /* We also update the heap for this key */ > + key_cnt =3D ss->sketch_lookup(ss, key); > + if (key_cnt > runtime_var->heap.elem[0].count) { > + found =3D rte_member_minheap_find(&runtime_var->heap, key); > + /* the key is found in the top-k heap */ > + if (found >=3D 0) { > + if (runtime_var->heap.elem[found].count < key_cnt) > + rte_member_heapify(&runtime_var->heap, > found, true); > + > + runtime_var->heap.elem[found].count =3D key_cnt; > + } else if (runtime_var->heap.size < ss->topk) { > + rte_member_minheap_insert_node(&runtime_var- > >heap, key, > + key_cnt, runtime_var->key_slots, runtime_var- > >free_key_slots); > + } else { > + rte_member_minheap_replace_node(&runtime_var- > >heap, key, key_cnt); > + } > + } else if (runtime_var->heap.size < ss->topk) { > + found =3D rte_member_minheap_find(&runtime_var->heap, key); > + if (found >=3D 0) { > + if (runtime_var->heap.elem[found].count < key_cnt) > + rte_member_heapify(&runtime_var->heap, > found, true); > + > + runtime_var->heap.elem[found].count =3D key_cnt; > + } else > + rte_member_minheap_insert_node(&runtime_var- > >heap, key, > + key_cnt, runtime_var->key_slots, runtime_var- > >free_key_slots); > + } > +} > + > +/* > + * Add a single packet into the sketch. > + * Sketch value is meatured by packet numbers in this mode. > + */ > +int > +rte_member_add_sketch(const struct rte_member_setsum *ss, > + const void *key, > + __rte_unused member_set_t set_id) > +{ > + uint32_t cur_row; > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + uint32_t *until_next =3D &(runtime_var->until_next); > + > + /* > + * If sketch is mesured by byte count, > + * the rte_member_add_sketch_byte_count routine should be used. > + */ > + if (ss->count_byte =3D=3D 1) > + return -EINVAL; [Wang, Yipeng] Maybe a message to user on the reason of error. > + > + if (ss->sample_rate =3D=3D 1) { > + ss->sketch_update(ss, key, 1); > + heap_update(ss, key); > + return 0; > + } > + > + /* convergence stage if it's needed */ > + if (ss->always_bounded && !runtime_var->converged) { > + ss->sketch_update(ss, key, 1); > + > + if (!((++runtime_var->pkt_cnt) & (INTERVAL - 1))) > + should_converge(ss); > + > + heap_update(ss, key); > + return 0; > + } > + > + /* should we skip this packet */ > + if (*until_next >=3D ss->num_row) { > + *until_next -=3D ss->num_row; > + return 0; > + } > + cur_row =3D *until_next; > + do { > + sketch_update_row(ss, key, 1, cur_row); > + *until_next =3D draw_geometric(ss); > + if (cur_row + *until_next >=3D ss->num_row) > + break; > + cur_row +=3D *until_next; > + } while (1); > + > + *until_next -=3D (ss->num_row - cur_row); > + > + heap_update(ss, key); > + > + return 0; > +} > + > +/* > + * Add the byte count of the packet into the sketch. > + * Sketch value is meatured by byte count numbers in this mode. > + */ > +int > +rte_member_add_sketch_byte_count(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t byte_count) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + uint32_t *until_next =3D &(runtime_var->until_next); > + > + /* should not call this API if not in count byte mode */ > + if (ss->count_byte =3D=3D 0) > + return -EINVAL; [Wang, Yipeng] Maybe a message to user on the reason of error. > + > + /* there's specific optimization for the sketch update */ > + ss->sketch_update(ss, key, byte_count); > + > + if (*until_next !=3D 0) { > + *until_next =3D *until_next - 1; > + return 0; > + } > + > + *until_next =3D draw_geometric(ss) - 1; > + > + heap_update(ss, key); > + > + return 0; > +} > + > +int > +rte_member_delete_sketch(const struct rte_member_setsum *ss, > + const void *key) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + int found; > + > + found =3D rte_member_minheap_find(&runtime_var->heap, key); > + if (found < 0) > + return -1; > + > + ss->sketch_delete(ss, key); > + > + return rte_member_minheap_delete_node > + (&runtime_var->heap, key, runtime_var->key_slots, > runtime_var->free_key_slots); > +} > + > +void > +rte_member_free_sketch(struct rte_member_setsum *ss) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + > + rte_free(ss->table); > + rte_member_minheap_free(&runtime_var->heap); > + rte_free(runtime_var->key_slots); > + rte_ring_free(runtime_var->free_key_slots); > + rte_free(runtime_var); > +} > + > +void > +rte_member_reset_sketch(const struct rte_member_setsum *ss) > +{ > + struct sketch_runtime *runtime_var =3D ss->runtime_var; > + uint64_t *sketch =3D ss->table; > + uint32_t i; > + > + memset(sketch, 0, sizeof(uint64_t) * ss->num_col * ss->num_row); > + rte_member_minheap_reset(&runtime_var->heap); > + rte_ring_reset(runtime_var->free_key_slots); > + > + for (i =3D 0; i < ss->topk; i++) > + rte_ring_sp_enqueue_elem(runtime_var->free_key_slots, &i, > sizeof(uint32_t)); > +} > diff --git a/lib/member/rte_member_sketch.h > b/lib/member/rte_member_sketch.h > new file mode 100644 > index 0000000000..a5e633a74e > --- /dev/null > +++ b/lib/member/rte_member_sketch.h > @@ -0,0 +1,96 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2017 Intel Corporation > + */ > + > +#ifndef _RTE_MEMBER_SKETCH_H_ > +#define _RTE_MEMBER_SKETCH_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include > + > +#define NUM_ROW_SCALAR 5 > +#define INTERVAL (1 << 15) > + > +#if !RTE_IS_POWER_OF_2(INTERVAL) > +#error sketch INTERVAL macro must be a power of 2 > +#endif > + > +int > +rte_member_create_sketch(struct rte_member_setsum *ss, > + const struct rte_member_parameters *params, > + struct rte_ring *r); > + > +int > +rte_member_lookup_sketch(const struct rte_member_setsum *setsum, > + const void *key, member_set_t *set_id); > + > +int > +rte_member_add_sketch(const struct rte_member_setsum *setsum, > + const void *key, > + member_set_t set_id); > + > +int > +rte_member_add_sketch_byte_count(const struct rte_member_setsum *ss, > + const void *key, uint32_t byte_count); > + > +void > +sketch_update_scalar(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t count); > + > +uint64_t > +sketch_lookup_scalar(const struct rte_member_setsum *ss, > + const void *key); > + > +void > +sketch_delete_scalar(const struct rte_member_setsum *ss, > + const void *key); > + > +int > +rte_member_delete_sketch(const struct rte_member_setsum *setsum, > + const void *key); > + > +int > +rte_member_query_sketch(const struct rte_member_setsum *setsum, > + const void *key, uint64_t *output); > + > +void > +rte_member_free_sketch(struct rte_member_setsum *ss); > + > +void > +rte_member_reset_sketch(const struct rte_member_setsum *setsum); > + > +int > +rte_member_report_heavyhitter_sketch(const struct rte_member_setsum > *setsum, > + void **key, uint64_t *count); > + > +void > +rte_member_update_heap(const struct rte_member_setsum *ss); > + > +static __rte_always_inline uint64_t > +count_min(const struct rte_member_setsum *ss, const uint32_t *hash_resul= ts) > +{ > + uint64_t *count_array =3D ss->table; > + uint64_t count; > + uint32_t cur_row; > + uint64_t min =3D UINT64_MAX; > + > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) { > + uint64_t cnt =3D count_array[cur_row * ss->num_col + > hash_results[cur_row]]; > + > + if (cnt < min) > + min =3D cnt; > + } > + count =3D min; > + > + return count; > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_MEMBER_SKETCH_H_ */ > diff --git a/lib/member/rte_member_sketch_avx512.c > b/lib/member/rte_member_sketch_avx512.c > new file mode 100644 > index 0000000000..c83f4b6fd1 > --- /dev/null > +++ b/lib/member/rte_member_sketch_avx512.c > @@ -0,0 +1,69 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + */ > + > +#include "rte_member_sketch_avx512.h" > + > +__rte_always_inline void > +sketch_update_avx512(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t count) > +{ > + uint64_t *count_array =3D ss->table; > + uint32_t num_col =3D ss->num_col; > + uint32_t key_len =3D ss->key_len; > + __m256i v_row_base; > + __m256i v_hash_result; > + __m512i current_sketch; > + __m512i updated_sketch; > + __m512i v_count; > + > + const __m256i v_idx =3D _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0); > + const __m256i v_col =3D _mm256_set1_epi32(num_col); > + > + /* compute the hash result parallelly */ > + v_hash_result =3D rte_xxh64_sketch_avx512 > + (key, key_len, *(__m512i *)ss->hash_seeds, num_col); > + v_row_base =3D _mm256_mullo_epi32(v_idx, v_col); > + v_hash_result =3D _mm256_add_epi32(v_row_base, v_hash_result); > + > + current_sketch =3D > + _mm512_i32gather_epi64(v_hash_result, count_array, 8); > + v_count =3D _mm512_set1_epi64(count); > + updated_sketch =3D _mm512_add_epi64(current_sketch, v_count); > + _mm512_i32scatter_epi64 > + ((void *)count_array, v_hash_result, updated_sketch, 8); > +} > + > +uint64_t > +sketch_lookup_avx512(const struct rte_member_setsum *ss, const void *key= ) > +{ > + uint32_t col[ss->num_row]; > + > + /* currently only for sketch byte count mode */ > + __m256i v_hash_result =3D rte_xxh64_sketch_avx512 > + (key, ss->key_len, *(__m512i *)ss->hash_seeds, ss->num_col); > + _mm256_storeu_si256((__m256i *)col, v_hash_result); > + > + return count_min(ss, col); > +} > + > +void > +sketch_delete_avx512(const struct rte_member_setsum *ss, const void *key= ) > +{ > + uint32_t col[ss->num_row]; > + uint64_t *count_array =3D ss->table; > + uint64_t min =3D UINT64_MAX; > + uint32_t cur_row; > + > + __m256i v_hash_result =3D rte_xxh64_sketch_avx512 > + (key, ss->key_len, *(__m512i *)ss->hash_seeds, > + RTE_ALIGN_FLOOR(ss->num_col, 32)); > + _mm256_storeu_si256((__m256i *)col, v_hash_result); > + > + min =3D count_min(ss, col); > + > + /* subtract the min value from all the counters */ > + for (cur_row =3D 0; cur_row < ss->num_row; cur_row++) > + count_array[cur_row * ss->num_col + col[cur_row]] -=3D min; > +} > diff --git a/lib/member/rte_member_sketch_avx512.h > b/lib/member/rte_member_sketch_avx512.h > new file mode 100644 > index 0000000000..e7c25da643 > --- /dev/null > +++ b/lib/member/rte_member_sketch_avx512.h > @@ -0,0 +1,36 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + */ > + > +#ifndef _RTE_MEMBER_SKETCH_AVX512_H_ > +#define _RTE_MEMBER_SKETCH_AVX512_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include > +#include "rte_member.h" > +#include "rte_member_sketch.h" > +#include "rte_xxh64_avx512.h" > + > +#define NUM_ROW_VEC 8 > + > +void > +sketch_update_avx512(const struct rte_member_setsum *ss, > + const void *key, > + uint32_t count); > + > +uint64_t > +sketch_lookup_avx512(const struct rte_member_setsum *ss, > + const void *key); > + > +void > +sketch_delete_avx512(const struct rte_member_setsum *ss, > + const void *key); > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_MEMBER_SKETCH_AVX512_H_ */ > diff --git a/lib/member/rte_xxh64_avx512.h b/lib/member/rte_xxh64_avx512.= h > new file mode 100644 > index 0000000000..574748fc38 > --- /dev/null > +++ b/lib/member/rte_xxh64_avx512.h > @@ -0,0 +1,117 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2020 Intel Corporation > + */ > + > +#ifndef _RTE_XXH64_AVX512_H_ > +#define _RTE_XXH64_AVX512_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include > +#include > + > +/* > 0b100111100011011101111001101100011000010111101011110010101000011 > 1 */ > +static const uint64_t PRIME64_1 =3D 0x9E3779B185EBCA87ULL; > +/* > 0b110000101011001010101110001111010010011111010100111010110100111 > 1 */ > +static const uint64_t PRIME64_2 =3D 0xC2B2AE3D27D4EB4FULL; > +/* > 0b000101100101011001100111101100011001111000110111011110011111100 > 1 */ > +static const uint64_t PRIME64_3 =3D 0x165667B19E3779F9ULL; > +/* > 0b100001011110101111001010011101111100001010110010101011100110001 > 1 */ > +static const uint64_t PRIME64_4 =3D 0x85EBCA77C2B2AE63ULL; > +/* > 0b001001111101010011101011001011110001011001010110011001111100010 > 1 */ > +static const uint64_t PRIME64_5 =3D 0x27D4EB2F165667C5ULL; > + > +static __rte_always_inline __m512i > +xxh64_round_avx512(__m512i hash, __m512i input) > +{ > + hash =3D _mm512_madd52lo_epu64(hash, > + input, > + _mm512_set1_epi64(PRIME64_2)); > + > + hash =3D _mm512_rol_epi64(hash, 31); > + > + return hash; > +} > + > +static __rte_always_inline __m512i > +xxh64_fmix_avx512(__m512i hash) > +{ > + hash =3D _mm512_xor_si512(hash, _mm512_srli_epi64(hash, 33)); > + > + return hash; > +} > + > +static __rte_always_inline __m256i > +rte_xxh64_sketch_avx512(const void *key, uint32_t key_len, > + __m512i v_seed, uint32_t modulo) > +{ > + __m512i v_prime64_5, v_hash; > + size_t remaining =3D key_len; > + size_t offset =3D 0; > + __m512i input; > + > + v_prime64_5 =3D _mm512_set1_epi64(PRIME64_5); > + v_hash =3D _mm512_add_epi64 > + (_mm512_add_epi64(v_seed, v_prime64_5), > + _mm512_set1_epi64(key_len)); > + > + while (remaining >=3D 8) { > + input =3D _mm512_set1_epi64(*(uint64_t *)RTE_PTR_ADD(key, > offset)); > + v_hash =3D _mm512_xor_epi64(v_hash, > + xxh64_round_avx512(_mm512_setzero_si512(), > input)); > + v_hash =3D > _mm512_madd52lo_epu64(_mm512_set1_epi64(PRIME64_4), > + _mm512_rol_epi64(v_hash, 27), > + _mm512_set1_epi64(PRIME64_1)); > + > + remaining -=3D 8; > + offset +=3D 8; > + } > + > + if (remaining >=3D 4) { > + input =3D _mm512_set1_epi64 > + (*(uint32_t *)RTE_PTR_ADD(key, offset)); > + v_hash =3D _mm512_xor_epi64(v_hash, > + _mm512_mullo_epi64(input, > + _mm512_set1_epi64(PRIME64_1))); > + v_hash =3D _mm512_madd52lo_epu64 > + (_mm512_set1_epi64(PRIME64_3), > + _mm512_rol_epi64(v_hash, 23), > + _mm512_set1_epi64(PRIME64_2)); > + > + offset +=3D 4; > + remaining -=3D 4; > + } > + > + while (remaining !=3D 0) { > + input =3D _mm512_set1_epi64 > + (*(uint8_t *)RTE_PTR_ADD(key, offset)); > + v_hash =3D _mm512_xor_epi64(v_hash, > + _mm512_mullo_epi64(input, > + _mm512_set1_epi64(PRIME64_5))); > + v_hash =3D _mm512_mullo_epi64 > + (_mm512_rol_epi64(v_hash, 11), > + _mm512_set1_epi64(PRIME64_1)); > + offset++; > + remaining--; > + } > + > + v_hash =3D xxh64_fmix_avx512(v_hash); > + > + /* > + * theoritically, such modular operations can be replaced by > + * _mm512_rem_epi64(), but seems it depends on the complier's > + * implementation. so here is the limitation that the modulo > + * value should be power of 2. > + */ > + __m512i v_hash_remainder =3D _mm512_set1_epi64((modulo - 1)); > + > + return _mm512_cvtepi64_epi32(_mm512_and_si512(v_hash, > v_hash_remainder)); > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_XXH64_AVX512_H_ */ > -- > 2.25.1 [Wang, Yipeng]=20 Leyi, thanks for drafting the code and posting the RFC. As I may understand= sketch algorithms better than some of the other audients here, it would be helpful if you could add some use cases explaining the contribu= tion (e.g. some real usages, details on why this algorithm is needed). So that others could evaluate and comment on the patch set better. Algorithm-wise, my major question is: 1. Is deletion supported for non-count-min case? 2. Byte-count estimation mode was not originally from the paper (as I remem= ber), any caveat to use this mode? Alan may have better idea. Code generally looks OK, but you may want to add comments for people to rev= iew more easily. Without knowledge of the algorithm, it is hard to understa= nd many of the function usages. Several modes are also involved (e.g. byte-count mode, always-bounded mode,= etc.) May be beneficial to know which is most useful and set it to default= . Also explain them well to users and reviewers.