From: Honnappa Nagarahalli
To: Yipeng Wang, "bruce.richardson@intel.com"
CC: "konstantin.ananyev@intel.com", "dev@dpdk.org", "sameh.gobriel@intel.com", Honnappa Nagarahalli, nd
Date: Tue, 2 Oct 2018 03:58:44 +0000
Thread-Topic: [PATCH v4 2/4] hash: add extendable bucket feature
References: <1537993618-92630-1-git-send-email-yipeng1.wang@intel.com> <1538155426-145177-1-git-send-email-yipeng1.wang@intel.com> <1538155426-145177-3-git-send-email-yipeng1.wang@intel.com>
In-Reply-To: <1538155426-145177-3-git-send-email-yipeng1.wang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 2/4] hash: add extendable bucket feature
List-Id: DPDK patches and discussions

>
> In use cases where the hash table capacity needs to be guaranteed, the
> extendable bucket feature can be used to store extra keys in linked lists
> when conflicts happen. This is a similar concept to the extendable bucket
> hash table in the packet framework.
>
> This commit adds the extendable bucket feature. Users can turn it on or
> off through the extra flag field at table creation time.
>
> The extendable bucket table is composed of buckets that can be linked to
> the current main table. When extendable buckets are enabled, the hash
> table load can always achieve 100%. In other words, the table can always
> accommodate the same number of keys as the specified table size, which
> provides a 100% table capacity guarantee. Although keys that end up in
> the ext buckets may have a longer lookup time, they should be rare thanks
> to the cuckoo algorithm.
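For context, here is a minimal sketch of how an application would opt in to this feature at table creation time. The flag name comes from this patch; the table name, sizes and hash function below are illustrative only.

#include <stdint.h>
#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_guaranteed_capacity_table(void)
{
	/* Illustrative parameters; only extra_flag is specific to this patch. */
	struct rte_hash_parameters params = {
		.name = "ext_tbl_example",      /* hypothetical table name */
		.entries = 1024,                /* capacity that must be guaranteed */
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,
		/* Opt in to the extendable bucket table added by this patch. */
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	/* With the flag set, keys that cannot be placed by cuckoo displacement
	 * spill into linked ext buckets, so up to 'entries' keys always fit.
	 */
	return rte_hash_create(&params);
}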
>
> Signed-off-by: Yipeng Wang
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 376 ++++++++++++++++++++++++++++++++------
>  lib/librte_hash/rte_cuckoo_hash.h |   5 +
>  lib/librte_hash/rte_hash.h        |   3 +
>  3 files changed, 331 insertions(+), 53 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> index eba13e9..02650b9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -31,6 +31,10 @@
>  #include "rte_hash.h"
>  #include "rte_cuckoo_hash.h"
>
> +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)			\
> +	for (CURRENT_BKT = START_BUCKET;				\
> +		CURRENT_BKT != NULL;					\
> +		CURRENT_BKT = CURRENT_BKT->next)
>
>  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
>
> @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
>  	return h;
>  }
>
> +static inline struct rte_hash_bucket *
> +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
> +{
> +	while (lst_bkt->next != NULL)
> +		lst_bkt = lst_bkt->next;
> +	return lst_bkt;
> +}
> +
>  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
>  {
>  	h->cmp_jump_table_idx = KEY_CUSTOM;
> @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	struct rte_tailq_entry *te = NULL;
>  	struct rte_hash_list *hash_list;
>  	struct rte_ring *r = NULL;
> +	struct rte_ring *r_ext = NULL;
>  	char hash_name[RTE_HASH_NAMESIZE];
>  	void *k = NULL;
>  	void *buckets = NULL;
> +	void *buckets_ext = NULL;
>  	char ring_name[RTE_RING_NAMESIZE];
> +	char ext_ring_name[RTE_RING_NAMESIZE];
>  	unsigned num_key_slots;
>  	unsigned i;
>  	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> +	unsigned int ext_table_support = 0;
>  	unsigned int readwrite_concur_support = 0;
>
>  	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		multi_writer_support = 1;
>  	}
>
> +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> +		ext_table_support = 1;
> +
>  	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
>  	if (multi_writer_support)
>  		/*
> @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		goto err;
>  	}
>
> +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +
> +	/* Create ring for extendable buckets. */
> +	if (ext_table_support) {
> +		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> +							params->name);
> +		r_ext = rte_ring_create(ext_ring_name,
> +				rte_align32pow2(num_buckets + 1),
> +				params->socket_id, 0);
> +
> +		if (r_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> +								"failed\n");
> +			goto err;
> +		}
> +	}
> +
>  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
>
>  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		goto err_unlock;
>  	}
>
> -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> -					/ RTE_HASH_BUCKET_ENTRIES;
> -
>  	buckets = rte_zmalloc_socket(NULL,
>  				num_buckets * sizeof(struct rte_hash_bucket),
>  				RTE_CACHE_LINE_SIZE, params->socket_id);
>
>  	if (buckets == NULL) {
> -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
>  		goto err_unlock;
>  	}
>
> +	/* Allocate same number of extendable buckets */
> +	if (ext_table_support) {
> +		buckets_ext = rte_zmalloc_socket(NULL,
> +				num_buckets * sizeof(struct rte_hash_bucket),
> +				RTE_CACHE_LINE_SIZE, params->socket_id);
> +		if (buckets_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> +							"failed\n");
> +			goto err_unlock;
> +		}
> +		/* Populate ext bkt ring. We reserve 0 similar to the
> +		 * key-data slot, just in case in future we want to
> +		 * use bucket index for the linked list and 0 means NULL
> +		 * for next bucket
> +		 */
> +		for (i = 1; i <= num_buckets; i++)
> +			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
> +	}
> +
>  	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
>  	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
>
> @@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->num_buckets = num_buckets;
>  	h->bucket_bitmask = h->num_buckets - 1;
>  	h->buckets = buckets;
> +	h->buckets_ext = buckets_ext;
> +	h->free_ext_bkts = r_ext;
>  	h->hash_func = (params->hash_func == NULL) ?
>  		default_hash_func : params->hash_func;
>  	h->key_store = k;
> @@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->hw_trans_mem_support = hw_trans_mem_support;
>  	h->multi_writer_support = multi_writer_support;
>  	h->readwrite_concur_support = readwrite_concur_support;
> +	h->ext_table_support = ext_table_support;
>
>  #if defined(RTE_ARCH_X86)
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> @@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>  err:
>  	rte_ring_free(r);
> +	rte_ring_free(r_ext);
>  	rte_free(te);
>  	rte_free(h);
>  	rte_free(buckets);
> +	rte_free(buckets_ext);
>  	rte_free(k);
>  	return NULL;
>  }
> @@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
>  		rte_free(h->readwrite_lock);
>  	}
>  	rte_ring_free(h->free_slots);
> +	rte_ring_free(h->free_ext_bkts);
>  	rte_free(h->key_store);
>  	rte_free(h->buckets);
> +	rte_free(h->buckets_ext);
>  	rte_free(h);
>  	rte_free(te);
>  }
> @@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
>  	rte_rwlock_write_lock(h->readwrite_lock);
>  }
>
> -
>  static inline void
>  __hash_rw_reader_lock(const struct rte_hash *h)
>  {
> @@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
>  	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
>  		rte_pause();
>
> +	/* clear free extendable bucket ring and memory */
> +	if (h->ext_table_support) {
> +		memset(h->buckets_ext, 0, h->num_buckets *
> +						sizeof(struct rte_hash_bucket));
> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> +			rte_pause();
> +	}
> +
>  	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
>  	if (h->multi_writer_support)
>  		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
> @@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
>  	for (i = 1; i < tot_ring_cnt + 1; i++)
>  		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
>
> +	/* Repopulate the free ext bkt ring. */
> +	if (h->ext_table_support) {
> +		for (i = 1; i < h->num_buckets + 1; i++)
> +			rte_ring_sp_enqueue(h->free_ext_bkts,
> +						(void *)((uintptr_t) i));
> +	}
> +
>  	if (h->multi_writer_support) {
>  		/* Reset local caches per lcore */
>  		for (i = 0; i < RTE_MAX_LCORE; i++)
> @@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  		int32_t *ret_val)
>  {
>  	unsigned int i;
> -	struct rte_hash_bucket *cur_bkt = prim_bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	int32_t ret;
>
>  	__hash_rw_writer_lock(h);
>  	/* Check if key was inserted after last check but before this
>  	 * protected region in case of inserting duplicated keys.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
>
>  	/* Insert new entry if there is room in the primary
> @@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  		int32_t *ret_val)
>  {
>  	uint32_t prev_alt_bkt_idx;
> -	struct rte_hash_bucket *cur_bkt = bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	struct queue_node *prev_node, *curr_node = leaf;
>  	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
>  	uint32_t prev_slot, curr_slot = leaf_slot;
> @@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  	/* Check if key was inserted after last check but before this
>  	 * protected region.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
>
> -	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
>
>  	while (likely(curr_node->prev != NULL)) {
> @@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  {
>  	hash_sig_t alt_hash;
>  	uint32_t prim_bucket_idx, sec_bucket_idx;
> -	struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  	struct rte_hash_key *new_k, *keys = h->key_store;
>  	void *slot_id = NULL;
> -	uint32_t new_idx;
> +	void *ext_bkt_id = NULL;
> +	uint32_t new_idx, bkt_id;
>  	int ret;
>  	unsigned n_slots;
>  	unsigned lcore_id;
> +	unsigned int i;
>  	struct lcore_cache *cached_free_slots = NULL;
>  	int32_t ret_val;
> +	struct rte_hash_bucket *last;
>
>  	prim_bucket_idx = sig & h->bucket_bitmask;
>  	prim_bkt = &h->buckets[prim_bucket_idx];
> @@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  	}
>
>  	/* Check if key is already inserted in secondary location */
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_writer_unlock(h);
>
> @@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  	else if (ret == 1) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret_val;
> -	} else {
> +	}
> +
> +	/* if ext table not enabled, we failed the insertion */
> +	if (!h->ext_table_support) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret;
>  	}
> +
> +	/* Now we need to go through the extendable bucket. Protection is needed
> +	 * to protect all extendable bucket processes.
> +	 */
> +	__hash_rw_writer_lock(h);
> +	/* We check for duplicates again since could be inserted before the lock */
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	if (ret != -1) {
> +		enqueue_slot_back(h, cached_free_slots, slot_id);
> +		goto failure;
> +	}
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			enqueue_slot_back(h, cached_free_slots, slot_id);
> +			goto failure;
> +		}
> +	}
> +
> +	/* Search sec and ext buckets to find an empty entry to insert. */
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			/* Check if slot is available */
> +			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> +				cur_bkt->sig_current[i] = alt_hash;
> +				cur_bkt->sig_alt[i] = sig;
> +				cur_bkt->key_idx[i] = new_idx;
> +				__hash_rw_writer_unlock(h);
> +				return new_idx - 1;
> +			}
> +		}
> +	}
> +
> +	/* Failed to get an empty entry from extendable buckets. Link a new
> +	 * extendable bucket. We first get a free bucket from ring.
> +	 */
> +	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
> +		ret = -ENOSPC;
> +		goto failure;
> +	}
> +
> +	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> +	/* Use the first location of the new bucket */
> +	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> +	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> +	/* Link the new bucket to sec bucket linked list */
> +	last = rte_hash_get_last_bkt(sec_bkt);
> +	last->next = &h->buckets_ext[bkt_id];
> +	__hash_rw_writer_unlock(h);
> +	return new_idx - 1;
> +
> +failure:
> +	__hash_rw_writer_unlock(h);
> +	return ret;
> +
>  }
>
>  int32_t
> @@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>  {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> +	struct rte_hash_bucket *bkt, *cur_bkt;
>  	int ret;
>
>  	bucket_idx = sig & h->bucket_bitmask;
> @@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>  	bkt = &h->buckets[bucket_idx];
>
>  	/* Check if key is in secondary location */
> -	ret = search_one_bucket(h, key, alt_hash, data, bkt);
> -	if (ret != -1) {
> -		__hash_rw_reader_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, bkt) {
> +		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +		if (ret != -1) {
> +			__hash_rw_reader_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_reader_unlock(h);
>  	return -ENOENT;
> @@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
>  	}
>  }
>
> +/* Compact the linked list by moving key from last entry in linked list to the
> + * empty slot.
> + */
> +static inline void
> +__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos)
> +{
> +	int i;
> +	struct rte_hash_bucket *last_bkt;
> +
> +	if (!cur_bkt->next)
> +		return;
> +
> +	last_bkt = rte_hash_get_last_bkt(cur_bkt);
> +
> +	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
> +		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
> +			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> +			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> +			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
> +			last_bkt->sig_current[i] = NULL_SIGNATURE;
> +			last_bkt->sig_alt[i] = NULL_SIGNATURE;
> +			last_bkt->key_idx[i] = EMPTY_SLOT;
> +			return;

In the lock-free algorithm, this will require the global counter increment.

> +		}
> +	}
> +}
> +
>  /* Search one bucket and remove the matched key */
>  static inline int32_t
>  search_and_remove(const struct rte_hash *h, const void *key,
> -			struct rte_hash_bucket *bkt, hash_sig_t sig)
> +			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
>  {
>  	struct rte_hash_key *k, *keys = h->key_store;
>  	unsigned int i;
>  	int32_t ret;
>
> -	/* Check if key is in primary location */
> +	/* Check if key is in bucket */
>  	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  		if (bkt->sig_current[i] == sig &&
>  				bkt->key_idx[i] != EMPTY_SLOT) {
> @@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
>  			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>  				remove_entry(h, bkt, i);
>
> -				/*
> -				 * Return index where key is stored,
> +				/* Return index where key is stored,
>  				 * subtracting the first dummy index
>  				 */
>  				ret = bkt->key_idx[i] - 1;
>  				bkt->key_idx[i] = EMPTY_SLOT;
> +				*pos = i;
>  				return ret;
>  			}
>  		}
> @@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>  {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> -	int32_t ret;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
> +	struct rte_hash_bucket *cur_bkt;
> +	int pos;
> +	int32_t ret, i;
>
>  	bucket_idx = sig & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	prim_bkt = &h->buckets[bucket_idx];
>
>  	__hash_rw_writer_lock(h);
>  	/* look for key in primary bucket */
> -	ret = search_and_remove(h, key, bkt, sig);
> +	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
>  	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		return ret;
> +		__rte_hash_compact_ll(prim_bkt, pos);
> +		last_bkt = prim_bkt->next;
> +		prev_bkt = prim_bkt;
> +		goto return_bkt;
>  	}
>
>  	/* Calculate secondary hash */
>  	alt_hash = rte_hash_secondary_hash(sig);
>  	bucket_idx = alt_hash & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	sec_bkt = &h->buckets[bucket_idx];
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
> +		if (ret != -1) {
> +			__rte_hash_compact_ll(cur_bkt, pos);
> +			last_bkt = sec_bkt->next;
> +			prev_bkt = sec_bkt;
> +			goto return_bkt;
> +		}
> +	}
>
> -	/* look for key in secondary bucket */
> -	ret = search_and_remove(h, key, bkt, alt_hash);
> -	if (ret != -1) {
> +	__hash_rw_writer_unlock(h);
> +	return -ENOENT;
> +
> +/* Search last bucket to see if empty to be recycled */
> +return_bkt:
> +	if (!last_bkt) {
>  		__hash_rw_writer_unlock(h);
>  		return ret;
>  	}
> +	while (last_bkt->next) {
> +		prev_bkt = last_bkt;
> +		last_bkt = last_bkt->next;
> +	}

Minor: We are trying to find the last bucket here, along with its previous bucket. Maybe we can modify 'rte_hash_get_last_bkt' instead?
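For illustration, one possible shape of such a helper; the name and signature below are hypothetical, not something this patch provides:

/* Sketch: walk to the end of the bucket chain and also report the bucket
 * that precedes the last one (NULL if the passed bucket is already last).
 */
static inline struct rte_hash_bucket *
rte_hash_get_last_bkt_with_prev(struct rte_hash_bucket *bkt,
				struct rte_hash_bucket **prev)
{
	*prev = NULL;
	while (bkt->next != NULL) {
		*prev = bkt;
		bkt = bkt->next;
	}
	return bkt;
}

The delete path above could then obtain last_bkt and prev_bkt in one call instead of open-coding the walk.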
> +
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (last_bkt->key_idx[i] != EMPTY_SLOT)
> +			break;
> +	}
> +	/* found empty bucket and recycle */
> +	if (i == RTE_HASH_BUCKET_ENTRIES) {
> +		prev_bkt->next = last_bkt->next = NULL;
> +		uint32_t index = last_bkt - h->buckets_ext + 1;
> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);

In the lock-less algorithm, the bucket cannot be freed immediately. I looked at a couple of solutions. The bucket needs to be stored internally and should be associated with the key-store index (or position). I am thinking that I will add a field to 'struct rte_hash_key' to store the bucket pointer or index.

From the code, my understanding is that we will free only the last bucket; we will never free a middle bucket. Please correct me if I am wrong. This will keep it simple for the lock-free algorithm.

I could work through these issues, so I do not see any issues for the lock-free algorithm (as of now :) ).
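To make the idea above concrete, a rough sketch of the kind of change being considered; the existing members are paraphrased from rte_cuckoo_hash.h (alignment attribute omitted) and the new field name is hypothetical:

/* Sketch only: remember which ext bucket (if any) a key's entry lived in,
 * so that a lock-free delete can defer returning the bucket to
 * h->free_ext_bkts until no reader can still be traversing it.
 */
struct rte_hash_key {
	union {
		uintptr_t idata;
		void *pdata;
	};
	uint32_t ext_bkt_idx; /* hypothetical: 1-based index into buckets_ext, 0 = none */
	/* Variable key size */
	char key[0];
};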
> +	}
>
>  	__hash_rw_writer_unlock(h);
> -	return -ENOENT;
> +	return ret;
>  }
>
>  int32_t
> @@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>  {
>  	uint64_t hits = 0;
>  	int32_t i;
> +	int32_t ret;
>  	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
>  	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
> +	struct rte_hash_bucket *cur_bkt, *next_bkt;
>
>  	/* Prefetch first keys */
>  	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
> @@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>  		continue;
>  	}
>
> +	/* all found, do not need to go through ext bkt */
> +	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
> +		if (hit_mask != NULL)
> +			*hit_mask = hits;
> +		__hash_rw_reader_unlock(h);
> +		return;
> +	}
> +
> +	/* need to check ext buckets for match */
> +	for (i = 0; i < num_keys; i++) {
> +		if ((hits & (1ULL << i)) != 0)
> +			continue;
> +		next_bkt = secondary_bkt[i]->next;
> +		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +			if (data != NULL)
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], &data[i], cur_bkt);
> +			else
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], NULL, cur_bkt);
> +			if (ret != -1) {
> +				positions[i] = ret;
> +				hits |= 1ULL << i;
> +				break;
> +			}
> +		}
> +	}
> +
>  	__hash_rw_reader_unlock(h);
>
>  	if (hit_mask != NULL)
> @@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>
>  	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
>
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> -	/* Out of bounds */
> -	if (*next >= total_entries)
> -		return -ENOENT;
> +	const uint32_t total_entries_main = h->num_buckets *
> +						RTE_HASH_BUCKET_ENTRIES;
> +	const uint32_t total_entries = total_entries_main << 1;
> +
> +	/* Out of bounds of all buckets (both main table and ext table */

Typo: missing ')'

> +	if (*next >= total_entries_main)
> +		goto extend_table;
>
>  	/* Calculate bucket and index of current iterator */
>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> @@ -1322,14 +1559,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  		(*next)++;
>  		/* End of table */
> -		if (*next == total_entries) {
> +		if (*next == total_entries_main) {
>  			__hash_rw_reader_unlock(h);
> -			return -ENOENT;
> +			goto extend_table;
>  		}
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> -
>  	/* Get position of entry in key table */
>  	position = h->buckets[bucket_idx].key_idx[idx];
>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> @@ -1344,4 +1580,38 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
>  	(*next)++;
>
>  	return position - 1;
> +
> +/* Begin to iterate extendable buckets */
> +extend_table:
> +	/* Out of total bound or if ext bucket feature is not enabled */
> +	if (*next >= total_entries || !h->ext_table_support)
> +		return -ENOENT;
> +
> +	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
> +	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> +
> +	__hash_rw_reader_lock(h);
> +	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		(*next)++;
> +		if (*next == total_entries) {
> +			__hash_rw_reader_unlock(h);
> +			return -ENOENT;
> +		}
> +		bucket_idx = (*next - total_entries_main) /
> +					RTE_HASH_BUCKET_ENTRIES;
> +		idx = (*next - total_entries_main) %
> +					RTE_HASH_BUCKET_ENTRIES;
> +	}
> +	/* Get position of entry in key table */
> +	position = h->buckets_ext[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> +				position * h->key_entry_size);
> +	/* Return key and data */
> +	*key = next_key->key;
> +	*data = next_key->pdata;
> +
> +	__hash_rw_reader_unlock(h);
> +
> +	/* Increment iterator */
> +	(*next)++;
> +	return position - 1;
>  }
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
> index fc0e5c2..e601520 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -142,6 +142,8 @@ struct rte_hash_bucket {
>  	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
>
>  	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
> +
> +	void *next;
>  } __rte_cache_aligned;
>
>  /** A hash table structure. */
> @@ -166,6 +168,7 @@ struct rte_hash {
>  	/**< If multi-writer support is enabled. */
>  	uint8_t readwrite_concur_support;
>  	/**< If read-write concurrency support is enabled */
> +	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	rte_hash_function hash_func;    /**< Function used to calculate hash. */
>  	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
>  	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
> @@ -184,6 +187,8 @@ struct rte_hash {
>  	 * to the key table.
>  	 */
>  	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
> +	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
> +	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
>  } __rte_cache_aligned;
>
>  struct queue_node {
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 9e7d931..11d8e28 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -37,6 +37,9 @@ extern "C" {
>  /** Flag to support reader writer concurrency */
>  #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
>
> +/** Flag to indicate the extendable bucket table feature should be used */
> +#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
> +
>  /** Signature of key that is stored internally. */
>  typedef uint32_t hash_sig_t;
>
> --
> 2.7.4