From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 3C9561B450 for ; Thu, 27 Sep 2018 14:28:01 +0200 (CEST) X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 27 Sep 2018 05:28:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,310,1534834800"; d="scan'208";a="267225960" Received: from bricha3-mobl.ger.corp.intel.com ([10.237.221.107]) by fmsmga006.fm.intel.com with SMTP; 27 Sep 2018 05:27:57 -0700 Received: by (sSMTP sendmail emulation); Thu, 27 Sep 2018 13:27:57 +0100 Date: Thu, 27 Sep 2018 13:27:56 +0100 From: Bruce Richardson To: "Ananyev, Konstantin" Cc: Honnappa Nagarahalli , "Wang, Yipeng1" , "dev@dpdk.org" , "michel@digirati.com.br" Message-ID: <20180927122756.GA776@bricha3-MOBL.ger.corp.intel.com> References: <1536253745-133104-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com> <20180927111501.GC19604@bricha3-MOBL.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772580102FDE80F@IRSMSX106.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772580102FDE80F@IRSMSX106.ger.corp.intel.com> Organization: Intel Research and Development Ireland Ltd. User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Sep 2018 12:28:02 -0000 On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > > Sent: Thursday, September 27, 2018 12:15 PM > > To: Honnappa Nagarahalli > > Cc: Wang, Yipeng1 ; dev@dpdk.org; michel@digirati.com.br > > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature > > > > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote: > > > > > > > > > > -----Original Message----- > > > > From: Yipeng Wang > > > > Sent: Friday, September 21, 2018 12:18 PM > > > > To: bruce.richardson@intel.com > > > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br; > > > > Honnappa Nagarahalli > > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature > > > > > > > > In use cases that hash table capacity needs to be guaranteed, the extendable > > > > bucket feature can be used to contain extra keys in linked lists when conflict > > > > happens. This is similar concept to the extendable bucket hash table in packet > > > > framework. > > > > > > > > This commit adds the extendable bucket feature. User can turn it on or off > > > > through the extra flag field during table creation time. > > > > > > > > Extendable bucket table composes of buckets that can be linked list to current > > > > main table. When extendable bucket is enabled, the table utilization can > > > > always acheive 100%. > > > IMO, referring to this as 'table utilization' indicates an efficiency about memory utilization. Please consider changing this to indicate that > > all of the configured number of entries will be accommodated? > > > > > > > Although keys ending up in the ext buckets may have longer look up time, they > > > > should be rare due to the cuckoo algorithm. > > > > > > > > Signed-off-by: Yipeng Wang > > > > --- > > > > lib/librte_hash/rte_cuckoo_hash.c | 326 > > > > +++++++++++++++++++++++++++++++++----- > > > > lib/librte_hash/rte_cuckoo_hash.h | 5 + > > > > lib/librte_hash/rte_hash.h | 3 + > > > > 3 files changed, 292 insertions(+), 42 deletions(-) > > > > > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c > > > > b/lib/librte_hash/rte_cuckoo_hash.c > > > > index f7b86c8..616900b 100644 > > > > --- a/lib/librte_hash/rte_cuckoo_hash.c > > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c > > > > @@ -31,6 +31,10 @@ > > > > #include "rte_hash.h" > > > > #include "rte_cuckoo_hash.h" > > > > > > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET) > > > > \ > > > > +for (CURRENT_BKT = START_BUCKET; \ > > > > +CURRENT_BKT != NULL; \ > > > > +CURRENT_BKT = CURRENT_BKT->next) > > > > > > > > TAILQ_HEAD(rte_hash_list, rte_tailq_entry); > > > > > > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name) > > > > return h; > > > > } > > > > > > > > +static inline struct rte_hash_bucket * > > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) { > > > > +while (lst_bkt->next != NULL) > > > > +lst_bkt = lst_bkt->next; > > > > +return lst_bkt; > > > > +} > > > > + > > > > void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func) { > > > > h->cmp_jump_table_idx = KEY_CUSTOM; > > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters > > > > *params) > > > > struct rte_tailq_entry *te = NULL; > > > > struct rte_hash_list *hash_list; > > > > struct rte_ring *r = NULL; > > > > +struct rte_ring *r_ext = NULL; > > > > char hash_name[RTE_HASH_NAMESIZE]; > > > > void *k = NULL; > > > > void *buckets = NULL; > > > > +void *buckets_ext = NULL; > > > > char ring_name[RTE_RING_NAMESIZE]; > > > > +char ext_ring_name[RTE_RING_NAMESIZE]; > > > > unsigned num_key_slots; > > > > unsigned i; > > > > unsigned int hw_trans_mem_support = 0, multi_writer_support = 0; > > > > +unsigned int ext_table_support = 0; > > > > unsigned int readwrite_concur_support = 0; > > > > > > > > rte_hash_function default_hash_func = (rte_hash_function)rte_jhash; > > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters > > > > *params) > > > > multi_writer_support = 1; > > > > } > > > > > > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE) > > > > +ext_table_support = 1; > > > > + > > > > /* Store all keys and leave the first entry as a dummy entry for > > > > lookup_bulk */ > > > > if (multi_writer_support) > > > > /* > > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters > > > > *params) > > > > goto err; > > > > } > > > > > > > > +const uint32_t num_buckets = rte_align32pow2(params->entries) / > > > > +RTE_HASH_BUCKET_ENTRIES; > > > > + > > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s", > > > > +params- > > > > >name); > > > Can be inside the if statement below. > > > > > > > +/* Create ring for extendable buckets. */ > > > > +if (ext_table_support) { > > > > +r_ext = rte_ring_create(ext_ring_name, > > > > +rte_align32pow2(num_buckets + 1), > > > > +params->socket_id, 0); > > > > + > > > > +if (r_ext == NULL) { > > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation > > > > " > > > > +"failed\n"); > > > > +goto err; > > > > +} > > > > +} > > > > + > > > > snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name); > > > > > > > > rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); > > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters > > > > *params) > > > > goto err_unlock; > > > > } > > > > > > > > -const uint32_t num_buckets = rte_align32pow2(params->entries) > > > > -/ RTE_HASH_BUCKET_ENTRIES; > > > > - > > > > buckets = rte_zmalloc_socket(NULL, > > > > num_buckets * sizeof(struct rte_hash_bucket), > > > > RTE_CACHE_LINE_SIZE, params->socket_id); > > > > > > > > if (buckets == NULL) { > > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n"); > > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n"); > > > > goto err_unlock; > > > > } > > > > > > > > +/* Allocate same number of extendable buckets */ > > > IMO, we are allocating too much memory to support this feature. Especially, when we claim that keys ending up in the extendable table is > > a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also significantly > > increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table > > contains very few entries. > > > > > > I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation > > dynamic (or on demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than > > allocating all the resources during initialization. > > > > > > > Given that adding new entries should not normally be a fast-path function, > > Umm, I don't think I agree with that. > There are plenty of cases where add/delete speed is important. > Konstantin True, I suppose. Perhaps then the best approach is to give a couple of options to the developer. Allow specifying an initial amount of memory, and then an option to allow the memory to grow or not. If the cost of memory allocation is a problem for a specific app, then they can provide a large default and not allow growing, while other apps, for whom speed is not that critical, can provide a small default and allow growing. Does that seem reasonable? /Bruce