From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id E29621B3B3 for ; Thu, 27 Sep 2018 13:27:27 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 27 Sep 2018 04:27:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,310,1534834800"; d="scan'208";a="260730958" Received: from irsmsx105.ger.corp.intel.com ([163.33.3.28]) by orsmga005.jf.intel.com with ESMTP; 27 Sep 2018 04:27:22 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.45]) by irsmsx105.ger.corp.intel.com ([169.254.7.129]) with mapi id 14.03.0319.002; Thu, 27 Sep 2018 12:27:21 +0100 From: "Ananyev, Konstantin" To: "Richardson, Bruce" , Honnappa Nagarahalli CC: "Wang, Yipeng1" , "dev@dpdk.org" , "michel@digirati.com.br" Thread-Topic: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature Thread-Index: AQHUVhn5ayN3h56Hlkqq7R/uBoG23aUD6bMAgAAT8cA= Date: Thu, 27 Sep 2018 11:27:21 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580102FDE80F@IRSMSX106.ger.corp.intel.com> References: <1536253745-133104-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com> <20180927111501.GC19604@bricha3-MOBL.ger.corp.intel.com> In-Reply-To: <20180927111501.GC19604@bricha3-MOBL.ger.corp.intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMjlmMGYyNjUtZjZmMi00N2VkLTg1Y2EtZDk2ODMwNDEyZjkyIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiVFdZZDBHWkhaMTcrK1ZkeGN6VTZSOGFKaE1TRGZNMzQyam1hTG9za0t3MTBTeUhMNU9RenBGKzh1RFY0SDNobiJ9 x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.400.15 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Sep 2018 11:27:28 -0000 > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > Sent: Thursday, September 27, 2018 12:15 PM > To: Honnappa Nagarahalli > Cc: Wang, Yipeng1 ; dev@dpdk.org; michel@digirati= .com.br > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket featur= e >=20 > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote: > > > > > > > -----Original Message----- > > > From: Yipeng Wang > > > Sent: Friday, September 21, 2018 12:18 PM > > > To: bruce.richardson@intel.com > > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br; > > > Honnappa Nagarahalli > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature > > > > > > In use cases that hash table capacity needs to be guaranteed, the ext= endable > > > bucket feature can be used to contain extra keys in linked lists when= conflict > > > happens. This is similar concept to the extendable bucket hash table = in packet > > > framework. > > > > > > This commit adds the extendable bucket feature. User can turn it on o= r off > > > through the extra flag field during table creation time. > > > > > > Extendable bucket table composes of buckets that can be linked list t= o current > > > main table. When extendable bucket is enabled, the table utilization = can > > > always acheive 100%. > > IMO, referring to this as 'table utilization' indicates an efficiency a= bout memory utilization. Please consider changing this to indicate that > all of the configured number of entries will be accommodated? > > > > > Although keys ending up in the ext buckets may have longer look up ti= me, they > > > should be rare due to the cuckoo algorithm. > > > > > > Signed-off-by: Yipeng Wang > > > --- > > > lib/librte_hash/rte_cuckoo_hash.c | 326 > > > +++++++++++++++++++++++++++++++++----- > > > lib/librte_hash/rte_cuckoo_hash.h | 5 + > > > lib/librte_hash/rte_hash.h | 3 + > > > 3 files changed, 292 insertions(+), 42 deletions(-) > > > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c > > > b/lib/librte_hash/rte_cuckoo_hash.c > > > index f7b86c8..616900b 100644 > > > --- a/lib/librte_hash/rte_cuckoo_hash.c > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c > > > @@ -31,6 +31,10 @@ > > > #include "rte_hash.h" > > > #include "rte_cuckoo_hash.h" > > > > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET) > > > \ > > > +for (CURRENT_BKT =3D START_BUCKET; = \ > > > +CURRENT_BKT !=3D NULL; \ > > > +CURRENT_BKT =3D CURRENT_BKT->next) > > > > > > TAILQ_HEAD(rte_hash_list, rte_tailq_entry); > > > > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name) > > > return h; > > > } > > > > > > +static inline struct rte_hash_bucket * > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) { > > > +while (lst_bkt->next !=3D NULL) > > > +lst_bkt =3D lst_bkt->next; > > > +return lst_bkt; > > > +} > > > + > > > void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t fun= c) { > > > h->cmp_jump_table_idx =3D KEY_CUSTOM; > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters > > > *params) > > > struct rte_tailq_entry *te =3D NULL; > > > struct rte_hash_list *hash_list; > > > struct rte_ring *r =3D NULL; > > > +struct rte_ring *r_ext =3D NULL; > > > char hash_name[RTE_HASH_NAMESIZE]; > > > void *k =3D NULL; > > > void *buckets =3D NULL; > > > +void *buckets_ext =3D NULL; > > > char ring_name[RTE_RING_NAMESIZE]; > > > +char ext_ring_name[RTE_RING_NAMESIZE]; > > > unsigned num_key_slots; > > > unsigned i; > > > unsigned int hw_trans_mem_support =3D 0, multi_writer_support =3D 0; > > > +unsigned int ext_table_support =3D 0; > > > unsigned int readwrite_concur_support =3D 0; > > > > > > rte_hash_function default_hash_func =3D (rte_hash_function)rte_jhash= ; > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters > > > *params) > > > multi_writer_support =3D 1; > > > } > > > > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE) > > > +ext_table_support =3D 1; > > > + > > > /* Store all keys and leave the first entry as a dummy entry for > > > lookup_bulk */ > > > if (multi_writer_support) > > > /* > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters > > > *params) > > > goto err; > > > } > > > > > > +const uint32_t num_buckets =3D rte_align32pow2(params->entries) / > > > +RTE_HASH_BUCKET_ENTRIES; > > > + > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s", > > > +params- > > > >name); > > Can be inside the if statement below. > > > > > +/* Create ring for extendable buckets. */ > > > +if (ext_table_support) { > > > +r_ext =3D rte_ring_create(ext_ring_name, > > > +rte_align32pow2(num_buckets + 1), > > > +params->socket_id, 0); > > > + > > > +if (r_ext =3D=3D NULL) { > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation > > > " > > > +"failed\n"); > > > +goto err; > > > +} > > > +} > > > + > > > snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name); > > > > > > rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameter= s > > > *params) > > > goto err_unlock; > > > } > > > > > > -const uint32_t num_buckets =3D rte_align32pow2(params->entries) > > > -/ RTE_HASH_BUCKET_ENTRIES; > > > - > > > buckets =3D rte_zmalloc_socket(NULL, > > > num_buckets * sizeof(struct rte_hash_bucket), > > > RTE_CACHE_LINE_SIZE, params->socket_id); > > > > > > if (buckets =3D=3D NULL) { > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n"); > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n"); > > > goto err_unlock; > > > } > > > > > > +/* Allocate same number of extendable buckets */ > > IMO, we are allocating too much memory to support this feature. Especia= lly, when we claim that keys ending up in the extendable table is > a rare occurrence. By doubling the memory we are effectively saying that = the main table might have 50% utilization. It will also significantly > increase the cycles required to iterate the complete hash table (in rte_h= ash_iterate API) even when we expect that the extendable table > contains very few entries. > > > > I am wondering if we can provide options to control the amount of extra= memory that gets allocated and make the memory allocation > dynamic (or on demand basis). I think this also goes well with the genera= l direction DPDK is taking - allocate resources as needed rather than > allocating all the resources during initialization. > > >=20 > Given that adding new entries should not normally be a fast-path function= , Umm, I don't think I agree with that. There are plenty of cases where add/delete speed is important. Konstantin > how about allowing memory allocation in add itself. Why not initialize wi= th > a fairly small number of extra bucket entries, and then each time they ar= e > all used, double the number of entries. That will give efficient resource > scaling, I think. >=20 > /Bruce