From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 770B91B452 for ; Thu, 27 Sep 2018 14:33:06 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 27 Sep 2018 05:33:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,310,1534834800"; d="scan'208";a="87098421" Received: from irsmsx105.ger.corp.intel.com ([163.33.3.28]) by orsmga003.jf.intel.com with ESMTP; 27 Sep 2018 05:33:01 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.45]) by irsmsx105.ger.corp.intel.com ([169.254.7.129]) with mapi id 14.03.0319.002; Thu, 27 Sep 2018 13:33:01 +0100 From: "Ananyev, Konstantin" To: "Richardson, Bruce" CC: Honnappa Nagarahalli , "Wang, Yipeng1" , "dev@dpdk.org" , "michel@digirati.com.br" Thread-Topic: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature Thread-Index: AQHUVhn5ayN3h56Hlkqq7R/uBoG23aUD6bMAgAAT8cCAAABuAIAAEgEw Date: Thu, 27 Sep 2018 12:33:00 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580102FDE901@IRSMSX106.ger.corp.intel.com> References: <1536253745-133104-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com> <20180927111501.GC19604@bricha3-MOBL.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580102FDE80F@IRSMSX106.ger.corp.intel.com> <20180927122756.GA776@bricha3-MOBL.ger.corp.intel.com> In-Reply-To: <20180927122756.GA776@bricha3-MOBL.ger.corp.intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZjc4MTFjNjItMzg4Yi00MWUzLThmZGItODk3ZTZkM2I0NzAwIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiTFA1dGpJdmR6ekVcL0o0S2g0WmFKb3ZcL3BZQVwvSlBzbVVEY0h5NU9kSHRyVjFlQXBIMVR1SHJyNzIzc2NoYmQyQiJ9 x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.400.15 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Sep 2018 12:33:07 -0000 > -----Original Message----- > From: Richardson, Bruce > Sent: Thursday, September 27, 2018 1:28 PM > To: Ananyev, Konstantin > Cc: Honnappa Nagarahalli ; Wang, Yipeng1 ; dev@dpdk.org; > michel@digirati.com.br > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket featur= e >=20 > On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote: > > > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > > > Sent: Thursday, September 27, 2018 12:15 PM > > > To: Honnappa Nagarahalli > > > Cc: Wang, Yipeng1 ; dev@dpdk.org; michel@digi= rati.com.br > > > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket fe= ature > > > > > > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yipeng Wang > > > > > Sent: Friday, September 21, 2018 12:18 PM > > > > > To: bruce.richardson@intel.com > > > > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br; > > > > > Honnappa Nagarahalli > > > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature > > > > > > > > > > In use cases that hash table capacity needs to be guaranteed, the= extendable > > > > > bucket feature can be used to contain extra keys in linked lists = when conflict > > > > > happens. This is similar concept to the extendable bucket hash ta= ble in packet > > > > > framework. > > > > > > > > > > This commit adds the extendable bucket feature. User can turn it = on or off > > > > > through the extra flag field during table creation time. > > > > > > > > > > Extendable bucket table composes of buckets that can be linked li= st to current > > > > > main table. When extendable bucket is enabled, the table utilizat= ion can > > > > > always acheive 100%. > > > > IMO, referring to this as 'table utilization' indicates an efficien= cy about memory utilization. Please consider changing this to indicate > that > > > all of the configured number of entries will be accommodated? > > > > > > > > > Although keys ending up in the ext buckets may have longer look u= p time, they > > > > > should be rare due to the cuckoo algorithm. > > > > > > > > > > Signed-off-by: Yipeng Wang > > > > > --- > > > > > lib/librte_hash/rte_cuckoo_hash.c | 326 > > > > > +++++++++++++++++++++++++++++++++----- > > > > > lib/librte_hash/rte_cuckoo_hash.h | 5 + > > > > > lib/librte_hash/rte_hash.h | 3 + > > > > > 3 files changed, 292 insertions(+), 42 deletions(-) > > > > > > > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c > > > > > b/lib/librte_hash/rte_cuckoo_hash.c > > > > > index f7b86c8..616900b 100644 > > > > > --- a/lib/librte_hash/rte_cuckoo_hash.c > > > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c > > > > > @@ -31,6 +31,10 @@ > > > > > #include "rte_hash.h" > > > > > #include "rte_cuckoo_hash.h" > > > > > > > > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET) > > > > > \ > > > > > +for (CURRENT_BKT =3D START_BUCKET; = \ > > > > > +CURRENT_BKT !=3D NULL; = \ > > > > > +CURRENT_BKT =3D CURRENT_BKT->next) > > > > > > > > > > TAILQ_HEAD(rte_hash_list, rte_tailq_entry); > > > > > > > > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name) > > > > > return h; > > > > > } > > > > > > > > > > +static inline struct rte_hash_bucket * > > > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) { > > > > > +while (lst_bkt->next !=3D NULL) > > > > > +lst_bkt =3D lst_bkt->next; > > > > > +return lst_bkt; > > > > > +} > > > > > + > > > > > void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t= func) { > > > > > h->cmp_jump_table_idx =3D KEY_CUSTOM; > > > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_paramet= ers > > > > > *params) > > > > > struct rte_tailq_entry *te =3D NULL; > > > > > struct rte_hash_list *hash_list; > > > > > struct rte_ring *r =3D NULL; > > > > > +struct rte_ring *r_ext =3D NULL; > > > > > char hash_name[RTE_HASH_NAMESIZE]; > > > > > void *k =3D NULL; > > > > > void *buckets =3D NULL; > > > > > +void *buckets_ext =3D NULL; > > > > > char ring_name[RTE_RING_NAMESIZE]; > > > > > +char ext_ring_name[RTE_RING_NAMESIZE]; > > > > > unsigned num_key_slots; > > > > > unsigned i; > > > > > unsigned int hw_trans_mem_support =3D 0, multi_writer_support = =3D 0; > > > > > +unsigned int ext_table_support =3D 0; > > > > > unsigned int readwrite_concur_support =3D 0; > > > > > > > > > > rte_hash_function default_hash_func =3D (rte_hash_function)rte_j= hash; > > > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_paramet= ers > > > > > *params) > > > > > multi_writer_support =3D 1; > > > > > } > > > > > > > > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE) > > > > > +ext_table_support =3D 1; > > > > > + > > > > > /* Store all keys and leave the first entry as a dummy entry for > > > > > lookup_bulk */ > > > > > if (multi_writer_support) > > > > > /* > > > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parame= ters > > > > > *params) > > > > > goto err; > > > > > } > > > > > > > > > > +const uint32_t num_buckets =3D rte_align32pow2(params->entries) = / > > > > > +RTE_HASH_BUCKET_ENTRIES; > > > > > + > > > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s", > > > > > +params- > > > > > >name); > > > > Can be inside the if statement below. > > > > > > > > > +/* Create ring for extendable buckets. */ > > > > > +if (ext_table_support) { > > > > > +r_ext =3D rte_ring_create(ext_ring_name, > > > > > +rte_align32pow2(num_buckets + 1), > > > > > +params->socket_id, 0); > > > > > + > > > > > +if (r_ext =3D=3D NULL) { > > > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation > > > > > " > > > > > +"failed\n"); > > > > > +goto err; > > > > > +} > > > > > +} > > > > > + > > > > > snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name); > > > > > > > > > > rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); > > > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_param= eters > > > > > *params) > > > > > goto err_unlock; > > > > > } > > > > > > > > > > -const uint32_t num_buckets =3D rte_align32pow2(params->entries) > > > > > -/ RTE_HASH_BUCKET_ENTRIES; > > > > > - > > > > > buckets =3D rte_zmalloc_socket(NULL, > > > > > num_buckets * sizeof(struct rte_hash_bucket), > > > > > RTE_CACHE_LINE_SIZE, params->socket_id); > > > > > > > > > > if (buckets =3D=3D NULL) { > > > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n"); > > > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n"); > > > > > goto err_unlock; > > > > > } > > > > > > > > > > +/* Allocate same number of extendable buckets */ > > > > IMO, we are allocating too much memory to support this feature. Esp= ecially, when we claim that keys ending up in the extendable > table is > > > a rare occurrence. By doubling the memory we are effectively saying t= hat the main table might have 50% utilization. It will also > significantly > > > increase the cycles required to iterate the complete hash table (in r= te_hash_iterate API) even when we expect that the extendable table > > > contains very few entries. > > > > > > > > I am wondering if we can provide options to control the amount of e= xtra memory that gets allocated and make the memory allocation > > > dynamic (or on demand basis). I think this also goes well with the ge= neral direction DPDK is taking - allocate resources as needed rather > than > > > allocating all the resources during initialization. > > > > > > > > > > Given that adding new entries should not normally be a fast-path func= tion, > > > > Umm, I don't think I agree with that. > > There are plenty of cases where add/delete speed is important. > > Konstantin >=20 > True, I suppose. > Perhaps then the best approach is to give a couple of options to the > developer. Allow specifying an initial amount of memory, and then an opti= on > to allow the memory to grow or not. If the cost of memory allocation is a > problem for a specific app, then they can provide a large default and not > allow growing, while other apps, for whom speed is not that critical, can > provide a small default and allow growing. >=20 > Does that seem reasonable? Yes, I think it does. Konstantin