From mboxrd@z Thu Jan  1 00:00:00 1970
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: Yipeng Wang <yipeng1.wang@intel.com>, "bruce.richardson@intel.com"
CC: "dev@dpdk.org", "michel@digirati.com.br"
Thread-Topic: [PATCH v2 5/7] hash: add extendable bucket feature
Date: Thu, 27 Sep 2018 04:23:48 +0000
References: <1536253745-133104-1-git-send-email-yipeng1.wang@intel.com>
 <1537550255-252066-1-git-send-email-yipeng1.wang@intel.com>
 <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com>
In-Reply-To: <1537550255-252066-6-git-send-email-yipeng1.wang@intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
List-Id: DPDK patches and discussions
> -----Original Message-----
> From: Yipeng Wang
> Sent: Friday, September 21, 2018 12:18 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli
> Subject: [PATCH v2 5/7] hash: add extendable bucket feature
>
> In use cases where hash table capacity needs to be guaranteed, the
> extendable bucket feature can be used to store extra keys in linked lists
> when a conflict happens. This is a similar concept to the extendable
> bucket hash table in the packet framework.
>
> This commit adds the extendable bucket feature. The user can turn it on
> or off through the extra flag field at table creation time.
>
> The extendable bucket table is composed of buckets that can be linked to
> the current main table. When the extendable bucket feature is enabled,
> the table utilization can always achieve 100%.

IMO, referring to this as 'table utilization' suggests an efficiency of
memory utilization. Please consider changing this to indicate that all of
the configured number of entries will be accommodated?

> Although keys ending up in the ext buckets may have a longer lookup time,
> they should be rare due to the cuckoo algorithm.
>
> Signed-off-by: Yipeng Wang
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 326 +++++++++++++++++++++++++++++++++-----
>  lib/librte_hash/rte_cuckoo_hash.h |   5 +
>  lib/librte_hash/rte_hash.h        |   3 +
>  3 files changed, 292 insertions(+), 42 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..616900b 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -31,6 +31,10 @@
>  #include "rte_hash.h"
>  #include "rte_cuckoo_hash.h"
>
> +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET) \
> +	for (CURRENT_BKT = START_BUCKET; \
> +		CURRENT_BKT != NULL; \
> +		CURRENT_BKT = CURRENT_BKT->next)
>
>  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
>
> @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
>  	return h;
>  }
>
> +static inline struct rte_hash_bucket *
> +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
> +{
> +	while (lst_bkt->next != NULL)
> +		lst_bkt = lst_bkt->next;
> +	return lst_bkt;
> +}
> +
>  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
>  {
>  	h->cmp_jump_table_idx = KEY_CUSTOM;
> @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	struct rte_tailq_entry *te = NULL;
>  	struct rte_hash_list *hash_list;
>  	struct rte_ring *r = NULL;
> +	struct rte_ring *r_ext = NULL;
>  	char hash_name[RTE_HASH_NAMESIZE];
>  	void *k = NULL;
>  	void *buckets = NULL;
> +	void *buckets_ext = NULL;
>  	char ring_name[RTE_RING_NAMESIZE];
> +	char ext_ring_name[RTE_RING_NAMESIZE];
>  	unsigned num_key_slots;
>  	unsigned i;
>  	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> +	unsigned int ext_table_support = 0;
>  	unsigned int readwrite_concur_support = 0;
>
>  	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		multi_writer_support = 1;
>  	}
>
> +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> +		ext_table_support = 1;
> +
>  	/* Store all keys and leave the first entry as a dummy entry for
>  	 * lookup_bulk */
>  	if (multi_writer_support)
>  		/*
> @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		goto err;
>  	}
>
> +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +
> +	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> +						params->name);

Can be inside the if statement below.

> +	/* Create ring for extendable buckets. */
> +	if (ext_table_support) {
> +		r_ext = rte_ring_create(ext_ring_name,
> +				rte_align32pow2(num_buckets + 1),
> +				params->socket_id, 0);
> +
> +		if (r_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> +								"failed\n");
> +			goto err;
> +		}
> +	}
> +
>  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
>
>  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		goto err_unlock;
>  	}
>
> -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> -					/ RTE_HASH_BUCKET_ENTRIES;
> -
>  	buckets = rte_zmalloc_socket(NULL,
>  				num_buckets * sizeof(struct rte_hash_bucket),
>  				RTE_CACHE_LINE_SIZE, params->socket_id);
>
>  	if (buckets == NULL) {
> -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
>  		goto err_unlock;
>  	}
>
> +	/* Allocate same number of extendable buckets */

IMO, we are allocating too much memory to support this feature. Especially
when we claim that keys ending up in the extendable table are a rare
occurrence. By doubling the memory we are effectively saying that the main
table might have 50% utilization. It will also significantly increase the
cycles required to iterate the complete hash table (in the rte_hash_iterate
API) even when we expect that the extendable table contains very few
entries.

I am wondering if we can provide options to control the amount of extra
memory that gets allocated and make the memory allocation dynamic (or on an
on-demand basis). I think this also goes well with the general direction
DPDK is taking - allocate resources as needed rather than allocating all
the resources during initialization.

> +	if (ext_table_support) {
> +		buckets_ext = rte_zmalloc_socket(NULL,
> +				num_buckets * sizeof(struct rte_hash_bucket),
> +				RTE_CACHE_LINE_SIZE, params->socket_id);
> +		if (buckets_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> +							"failed\n");
> +			goto err_unlock;
> +		}
> +		/* Populate ext bkt ring. We reserve 0 similar to the
> +		 * key-data slot, just in case in future we want to
> +		 * use bucket index for the linked list and 0 means NULL
> +		 * for next bucket
> +		 */
> +		for (i = 1; i <= num_buckets; i++)

Since the bucket index 0 is reserved, this should be 'i < num_buckets'.

> +			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
> +	}
> +
>  	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
>  	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
>
> @@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->num_buckets = num_buckets;
>  	h->bucket_bitmask = h->num_buckets - 1;
>  	h->buckets = buckets;
> +	h->buckets_ext = buckets_ext;
> +	h->free_ext_bkts = r_ext;
>  	h->hash_func = (params->hash_func == NULL) ?
> 			default_hash_func : params->hash_func;
>  	h->key_store = k;
> @@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->hw_trans_mem_support = hw_trans_mem_support;
>  	h->multi_writer_support = multi_writer_support;
>  	h->readwrite_concur_support = readwrite_concur_support;
> +	h->ext_table_support = ext_table_support;
>
>  #if defined(RTE_ARCH_X86)
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> @@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> err:
>  	rte_ring_free(r);
> +	rte_ring_free(r_ext);
>  	rte_free(te);
>  	rte_free(h);
>  	rte_free(buckets);
> +	rte_free(buckets_ext);
>  	rte_free(k);
>  	return NULL;
> }
> @@ -344,6 +402,7 @@ rte_hash_free(struct rte_hash *h)
>  		rte_free(h->readwrite_lock);
>  	}
>  	rte_ring_free(h->free_slots);
> +	rte_ring_free(h->free_ext_bkts);
>  	rte_free(h->key_store);
>  	rte_free(h->buckets);

Add rte_free(h->buckets_ext);

>  	rte_free(h);
> @@ -403,7 +462,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
>  	rte_rwlock_write_lock(h->readwrite_lock);
> }
>
> -
>  static inline void
>  __hash_rw_reader_lock(const struct rte_hash *h)
> {
> @@ -448,6 +506,14 @@ rte_hash_reset(struct rte_hash *h)
>  	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
>  		rte_pause();
>
> +	/* clear free extendable bucket ring and memory */
> +	if (h->ext_table_support) {
> +		memset(h->buckets_ext, 0, h->num_buckets *
> +						sizeof(struct rte_hash_bucket));
> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> +			rte_pause();
> +	}
> +
>  	/* Repopulate the free slots ring. Entry zero is reserved for key misses
>  	 */
>  	if (h->multi_writer_support)
>  		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
> @@ -458,6 +524,12 @@ rte_hash_reset(struct rte_hash *h)
>  	for (i = 1; i < tot_ring_cnt + 1; i++)
>  		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
>
> +	/* Repopulate the free ext bkt ring.
> +	 */
> +	if (h->ext_table_support)
> +		for (i = 1; i < h->num_buckets + 1; i++)

Index 0 is reserved as per the comments. The condition should be
'i < h->num_buckets'.

> +			rte_ring_sp_enqueue(h->free_ext_bkts,
> +						(void *)((uintptr_t) i));
> +
>  	if (h->multi_writer_support) {
>  		/* Reset local caches per lcore */
>  		for (i = 0; i < RTE_MAX_LCORE; i++)
> @@ -524,24 +596,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  			int32_t *ret_val)
> {
>  	unsigned int i;
> -	struct rte_hash_bucket *cur_bkt = prim_bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	int32_t ret;
>
>  	__hash_rw_writer_lock(h);
>  	/* Check if key was inserted after last check but before this
>  	 * protected region in case of inserting duplicated keys.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
>
>  	/* Insert new entry if there is room in the primary
> @@ -580,7 +655,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  			int32_t *ret_val)
> {
>  	uint32_t prev_alt_bkt_idx;
> -	struct rte_hash_bucket *cur_bkt = bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	struct queue_node *prev_node, *curr_node = leaf;
>  	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
>  	uint32_t prev_slot, curr_slot = leaf_slot;
> @@ -597,18 +672,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  	/* Check if key was inserted after last check but before this
>  	 * protected region.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
>
> -	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
>
>  	while (likely(curr_node->prev != NULL)) {
> @@ -711,15 +788,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> {
>  	hash_sig_t alt_hash;
>  	uint32_t prim_bucket_idx, sec_bucket_idx;
> -	struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  	struct rte_hash_key *new_k, *keys = h->key_store;
>  	void *slot_id = NULL;
> -	uint32_t new_idx;
> +	void *ext_bkt_id = NULL;
> +	uint32_t new_idx, bkt_id;
>  	int ret;
>  	unsigned n_slots;
>  	unsigned lcore_id;
> +	unsigned int i;
>  	struct lcore_cache *cached_free_slots = NULL;
>  	int32_t ret_val;
> +	struct rte_hash_bucket *last;
>
>  	prim_bucket_idx = sig & h->bucket_bitmask;
>  	prim_bkt = &h->buckets[prim_bucket_idx];
> @@ -739,10 +819,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  	}
>
>  	/* Check if key is already inserted in secondary location */
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_writer_unlock(h);
>
> @@ -808,10 +890,71 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  	else if (ret == 1) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret_val;
> -	} else {
> +	}
> +
> +	/* if ext table not enabled, we failed the insertion */
> +	if (!h->ext_table_support) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret;
>  	}
> +
> +	/* Now we need to go through the extendable bucket. Protection is
> +	 * needed to protect all extendable bucket processes.
> +	 */
> +	__hash_rw_writer_lock(h);
> +	/* We check for duplicates again since could be inserted before the lock */
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	if (ret != -1) {
> +		enqueue_slot_back(h, cached_free_slots, slot_id);
> +		goto failure;
> +	}
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			enqueue_slot_back(h, cached_free_slots, slot_id);
> +			goto failure;
> +		}
> +	}
> +
> +	/* Search extendable buckets to find an empty entry to insert. */
> +	struct rte_hash_bucket *next_bkt = sec_bkt->next;
> +	FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			/* Check if slot is available */
> +			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> +				cur_bkt->sig_current[i] = alt_hash;
> +				cur_bkt->sig_alt[i] = sig;
> +				cur_bkt->key_idx[i] = new_idx;
> +				__hash_rw_writer_unlock(h);
> +				return new_idx - 1;
> +			}
> +		}
> +	}
> +
> +	/* Failed to get an empty entry from extendable buckets. Link a new
> +	 * extendable bucket. We first get a free bucket from ring.
> +	 */
> +	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
> +		ret = -ENOSPC;
> +		goto failure;
> +	}
> +
> +	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;

If index 0 is reserved, the -1 is not required.
> +	/* Use the first location of the new bucket */
> +	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> +	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> +	/* Link the new bucket to sec bucket linked list */
> +	last = rte_hash_get_last_bkt(sec_bkt);
> +	last->next = &h->buckets_ext[bkt_id];
> +	__hash_rw_writer_unlock(h);
> +	return new_idx - 1;
> +
> +failure:
> +	__hash_rw_writer_unlock(h);
> +	return ret;
> +
> }
>
> int32_t
> @@ -890,7 +1033,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> +	struct rte_hash_bucket *bkt, *cur_bkt;
>  	int ret;
>
>  	bucket_idx = sig & h->bucket_bitmask;
> @@ -910,10 +1053,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>  	bkt = &h->buckets[bucket_idx];
>
>  	/* Check if key is in secondary location */
> -	ret = search_one_bucket(h, key, alt_hash, data, bkt);
> -	if (ret != -1) {
> -		__hash_rw_reader_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, bkt) {
> +		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +		if (ret != -1) {
> +			__hash_rw_reader_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_reader_unlock(h);
>  	return -ENOENT;
> @@ -1015,15 +1160,17 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> -	int32_t ret;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +	struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
> +	int32_t ret, i;
> +	struct rte_hash_bucket *tobe_removed_bkt = NULL;
>
>  	bucket_idx = sig & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	prim_bkt = &h->buckets[bucket_idx];
>
>  	__hash_rw_writer_lock(h);
>  	/* look for key in primary bucket */
> -	ret = search_and_remove(h, key, bkt, sig);
> +	ret = search_and_remove(h, key, prim_bkt, sig);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		return ret;
> @@ -1032,17 +1179,51 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>  	/* Calculate secondary hash */
>  	alt_hash = rte_hash_secondary_hash(sig);
>  	bucket_idx = alt_hash & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	sec_bkt = &h->buckets[bucket_idx];
>
>  	/* look for key in secondary bucket */
> -	ret = search_and_remove(h, key, bkt, alt_hash);
> +	ret = search_and_remove(h, key, sec_bkt, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		return ret;
>  	}
>
> +	/* Not in main table, we need to search ext buckets */
> +	if (h->ext_table_support) {
> +		next_bkt = sec_bkt->next;
> +		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +			ret = search_and_remove(h, key, cur_bkt, alt_hash);
> +			if (ret != -1)
> +				goto return_bkt;
> +		}
> +	}
> +
>  	__hash_rw_writer_unlock(h);
>  	return -ENOENT;
> +
> +/* Search extendable buckets to see if any empty bucket need to be
> + * recycled */
> +return_bkt:
> +	for (cur_bkt = sec_bkt->next, prev_bkt = sec_bkt; cur_bkt != NULL;
> +			prev_bkt = cur_bkt, cur_bkt = cur_bkt->next) {
> +		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			if (cur_bkt->key_idx[i] != EMPTY_SLOT)
> +				break;
> +		}
> +		if (i == RTE_HASH_BUCKET_ENTRIES) {
> +			prev_bkt->next = cur_bkt->next;
> +			cur_bkt->next = NULL;
> +			tobe_removed_bkt = cur_bkt;
> +			break;
> +		}
> +	}
> +
> +	__hash_rw_writer_unlock(h);
> +
> +	if (tobe_removed_bkt) {
> +		uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;

No need to increase the index by 1 if entry 0 is reserved.
> +		rte_ring_mp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
> +	}
> +	return ret;
> }
>
> int32_t
> @@ -1143,12 +1324,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> {
>  	uint64_t hits = 0;
>  	int32_t i;
> +	int32_t ret;
>  	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
>  	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
> +	struct rte_hash_bucket *cur_bkt, *next_bkt;
>
>  	/* Prefetch first keys */
>  	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
> @@ -1266,6 +1449,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>  		continue;
>  	}
>
> +	/* all found, do not need to go through ext bkt */
> +	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
> +		if (hit_mask != NULL)
> +			*hit_mask = hits;
> +		__hash_rw_reader_unlock(h);
> +		return;
> +	}
> +
> +	/* need to check ext buckets for match */
> +	for (i = 0; i < num_keys; i++) {
> +		if ((hits & (1ULL << i)) != 0)
> +			continue;
> +		next_bkt = secondary_bkt[i]->next;
> +		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +			if (data != NULL)
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], &data[i], cur_bkt);
> +			else
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], NULL, cur_bkt);
> +			if (ret != -1) {
> +				positions[i] = ret;
> +				hits |= 1ULL << i;
> +				break;
> +			}
> +		}
> +	}
> +
>  	__hash_rw_reader_unlock(h);
>
>  	if (hit_mask != NULL)
> @@ -1308,10 +1519,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>
>  	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
>
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +	const uint32_t total_entries_main = h->num_buckets *
> +							RTE_HASH_BUCKET_ENTRIES;
> +	const uint32_t total_entries = total_entries_main << 1;
> +
>  	/* Out of bounds */

Minor: update the comment to reflect the new code.

> -	if (*next >= total_entries)
> -		return -ENOENT;
> +	if (*next >= total_entries_main)
> +		goto extend_table;
>
>  	/* Calculate bucket and index of current iterator */
>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> @@ -1321,8 +1535,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  		(*next)++;
>  		/* End of table */
> -		if (*next == total_entries)
> -			return -ENOENT;
> +		if (*next == total_entries_main)
> +			goto extend_table;
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> @@ -1341,4 +1555,32 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>  	(*next)++;
>
>  	return position - 1;
> +
> +extend_table:
> +	/* Out of bounds */
> +	if (*next >= total_entries || !h->ext_table_support)
> +		return -ENOENT;
> +
> +	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
> +	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> +
> +	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		(*next)++;
> +		if (*next == total_entries)
> +			return -ENOENT;
> +		bucket_idx = (*next - total_entries_main) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> +	}
> +	/* Get position of entry in key table */
> +	position = h->buckets_ext[bucket_idx].key_idx[idx];

There is a possibility that 'position' is not the same value that was read
in the while loop. That presents a problem if 'position' becomes
EMPTY_SLOT. 'position' should be read as part of the while loop. Since it
is a 32b value, the read should be atomic on most platforms. This issue
applies to the existing code as well.
__hash_rw_reader_lock(h) required.

> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> +				position * h->key_entry_size);
> +	/* Return key and data */
> +	*key = next_key->key;
> +	*data = next_key->pdata;
> +

__hash_rw_reader_unlock(h) required.

> +	/* Increment iterator */
> +	(*next)++;
> +	return position - 1;
> }
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
> index fc0e5c2..e601520 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -142,6 +142,8 @@ struct rte_hash_bucket {
>  	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
>
>  	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
> +
> +	void *next;
> } __rte_cache_aligned;
>
> /** A hash table structure. */
> @@ -166,6 +168,7 @@ struct rte_hash {
>  	/**< If multi-writer support is enabled. */
>  	uint8_t readwrite_concur_support;
>  	/**< If read-write concurrency support is enabled */
> +	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	rte_hash_function hash_func;    /**< Function used to calculate hash. */
>  	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
>  	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
> @@ -184,6 +187,8 @@ struct rte_hash {
>  	 * to the key table.
>  	 */
>  	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
> +	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
> +	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
> } __rte_cache_aligned;
>
> struct queue_node {
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 9e7d931..11d8e28 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -37,6 +37,9 @@ extern "C" {
> /** Flag to support reader writer concurrency */
> #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
>
> +/** Flag to indicate the extendable bucket table feature should be used */
> +#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
> +
> /** Signature of key that is stored internally. */
> typedef uint32_t hash_sig_t;
>
> --
> 2.7.4

IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy the
information in any medium. Thank you.