Date: Thu, 27 Sep 2018 12:15:02 +0100
From: Bruce Richardson
To: Honnappa Nagarahalli
Cc: Yipeng Wang, "dev@dpdk.org", "michel@digirati.com.br"
Message-ID: <20180927111501.GC19604@bricha3-MOBL.ger.corp.intel.com>
References: <1536253745-133104-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-1-git-send-email-yipeng1.wang@intel.com> <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com>
Organization: Intel Research and Development Ireland Ltd.
Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature

On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> 
> 
> > -----Original Message-----
> > From: Yipeng Wang
> > Sent: Friday, September 21, 2018 12:18 PM
> > To: bruce.richardson@intel.com
> > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br; Honnappa Nagarahalli
> > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > 
> > In use cases where the hash table capacity needs to be guaranteed, the
> > extendable bucket feature can be used to store extra keys in linked lists
> > when a conflict happens. This is a similar concept to the extendable bucket
> > hash table in the packet framework.
> > 
> > This commit adds the extendable bucket feature. The user can turn it on or
> > off through the extra flag field at table creation time.
> > 
> > The extendable bucket table is composed of buckets that can be linked, as
> > lists, to the buckets of the main table. When the extendable bucket feature
> > is enabled, the table utilization can always achieve 100%.
> IMO, referring to this as 'table utilization' suggests memory-utilization efficiency. Please consider changing this to indicate that all of the configured number of entries will be accommodated.
> 
> > Although keys ending up in the ext buckets may have a longer lookup time,
> > they should be rare due to the cuckoo algorithm.
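
As a quick illustration of how the new flag would be used (this is not part of
the patch; the table name and parameter values below are just placeholders),
creating a table with the feature enabled would look roughly like:

#include <stdint.h>
#include <rte_hash.h>
#include <rte_jhash.h>

/* Illustrative only: enable the extendable bucket feature through the
 * extra_flag field of the creation parameters. */
static struct rte_hash *
create_ext_table(void)
{
	struct rte_hash_parameters params = {
		.name = "ext_bucket_example",	/* placeholder name */
		.entries = 1024,		/* capacity to be guaranteed */
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	/* With the flag set, keys that cuckoo displacement cannot place in
	 * the main table spill into linked ext buckets instead of making
	 * the add fail, so all 1024 entries can be accommodated. */
	return rte_hash_create(&params);
}
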
> > 
> > Signed-off-by: Yipeng Wang
> > ---
> >  lib/librte_hash/rte_cuckoo_hash.c | 326 +++++++++++++++++++++++++++++++++-----
> >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> >  lib/librte_hash/rte_hash.h        |   3 +
> >  3 files changed, 292 insertions(+), 42 deletions(-)
> > 
> > diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> > index f7b86c8..616900b 100644
> > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > @@ -31,6 +31,10 @@
> >  #include "rte_hash.h"
> >  #include "rte_cuckoo_hash.h"
> > 
> > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET) \
> > +	for (CURRENT_BKT = START_BUCKET; \
> > +		CURRENT_BKT != NULL; \
> > +		CURRENT_BKT = CURRENT_BKT->next)
> > 
> >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > 
> > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> >  	return h;
> >  }
> > 
> > +static inline struct rte_hash_bucket *
> > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > +	while (lst_bkt->next != NULL)
> > +		lst_bkt = lst_bkt->next;
> > +	return lst_bkt;
> > +}
> > +
> >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func) {
> >  	h->cmp_jump_table_idx = KEY_CUSTOM;
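
For illustration only (this is not the patch's actual code): with the macro and
helper above, a lookup inside the library could walk a bucket and each of its
chained ext buckets roughly as below; search_one_bucket() is an assumed
stand-in for the existing per-bucket signature/key comparison, and the types
come from the internal rte_cuckoo_hash.h header.

/* Sketch: probe a bucket and every ext bucket chained off it until the
 * key is found. search_one_bucket() is a placeholder returning the key
 * position on a match or -1 otherwise. */
static inline int32_t
search_bucket_chain(const struct rte_hash *h, const void *key,
		hash_sig_t sig, void **data, struct rte_hash_bucket *bkt)
{
	struct rte_hash_bucket *cur_bkt;
	int32_t ret;

	FOR_EACH_BUCKET(cur_bkt, bkt) {
		ret = search_one_bucket(h, key, sig, data, cur_bkt);
		if (ret != -1)
			return ret;	/* position of the matching key */
	}
	return -1;	/* key not found in this bucket chain */
}
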
> > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  	struct rte_tailq_entry *te = NULL;
> >  	struct rte_hash_list *hash_list;
> >  	struct rte_ring *r = NULL;
> > +	struct rte_ring *r_ext = NULL;
> >  	char hash_name[RTE_HASH_NAMESIZE];
> >  	void *k = NULL;
> >  	void *buckets = NULL;
> > +	void *buckets_ext = NULL;
> >  	char ring_name[RTE_RING_NAMESIZE];
> > +	char ext_ring_name[RTE_RING_NAMESIZE];
> >  	unsigned num_key_slots;
> >  	unsigned i;
> >  	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > +	unsigned int ext_table_support = 0;
> >  	unsigned int readwrite_concur_support = 0;
> > 
> >  	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		multi_writer_support = 1;
> >  	}
> > 
> > +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > +		ext_table_support = 1;
> > +
> >  	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
> >  	if (multi_writer_support)
> >  		/*
> > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		goto err;
> >  	}
> > 
> > +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > +						RTE_HASH_BUCKET_ENTRIES;
> > +
> > +	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > +							params->name);
> This can be inside the if statement below.
> 
> > +	/* Create ring for extendable buckets. */
> > +	if (ext_table_support) {
> > +		r_ext = rte_ring_create(ext_ring_name,
> > +				rte_align32pow2(num_buckets + 1),
> > +				params->socket_id, 0);
> > +
> > +		if (r_ext == NULL) {
> > +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> > +								"failed\n");
> > +			goto err;
> > +		}
> > +	}
> > +
> >  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > 
> >  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		goto err_unlock;
> >  	}
> > 
> > -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> > -					/ RTE_HASH_BUCKET_ENTRIES;
> > -
> >  	buckets = rte_zmalloc_socket(NULL,
> >  				num_buckets * sizeof(struct rte_hash_bucket),
> >  				RTE_CACHE_LINE_SIZE, params->socket_id);
> > 
> >  	if (buckets == NULL) {
> > -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> >  		goto err_unlock;
> >  	}
> > 
> > +	/* Allocate same number of extendable buckets */
> IMO, we are allocating too much memory to support this feature. Especially when we claim that keys ending up in the extendable table are a rare occurrence. By doubling the memory, we are effectively saying that the main table might have only 50% utilization. It will also significantly increase the cycles required to iterate the complete hash table (in the rte_hash_iterate API), even though we expect the extendable table to contain very few entries.
> 
> I am wondering if we can provide options to control the amount of extra memory that gets allocated, and make the memory allocation dynamic (on an on-demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than allocating all the resources during initialization.
> 
Given that adding new entries should not normally be a fast-path function,
how about allowing memory allocation in add itself? Why not initialize with
a fairly small number of extra bucket entries, and then each time they are
all used, double the number of entries? That will give efficient resource
scaling, I think.

/Bruce
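
To make the suggestion concrete, a rough sketch of the grow-on-demand idea
(purely illustrative: grow_ext_buckets(), the free-bucket ring argument and the
bucket counter are hypothetical names, and making room in the fixed-size free
ring for the extra buckets is glossed over here):

#include <errno.h>
#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_ring.h>
#include "rte_cuckoo_hash.h"	/* struct rte_hash_bucket */

/* Hypothetical sketch of "start small, double when exhausted", called
 * from the add path when the free ext-bucket ring runs empty. */
static int
grow_ext_buckets(struct rte_ring *free_ext_bkts, uint32_t *num_ext_buckets,
		int socket_id)
{
	uint32_t i, cur = *num_ext_buckets;
	struct rte_hash_bucket *blk;

	/* Allocate as many new buckets as currently exist, doubling the
	 * total amount of ext-bucket memory. */
	blk = rte_zmalloc_socket(NULL, cur * sizeof(*blk),
					RTE_CACHE_LINE_SIZE, socket_id);
	if (blk == NULL)
		return -ENOMEM;

	/* Hand the new buckets to the free ring so that subsequent adds
	 * can pull them as needed. */
	for (i = 0; i < cur; i++)
		rte_ring_sp_enqueue(free_ext_bkts, &blk[i]);

	*num_ext_buckets = cur * 2;
	return 0;
}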