From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bing Zhao
To: yipeng1.wang@intel.com, sameh.gobriel@intel.com, bruce.richardson@intel.com, pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org, bingz@mellanox.com
Date: Wed, 28 Aug 2019 14:51:48 +0800
Message-Id: <1566975109-318949-1-git-send-email-bingz@mellanox.com>
Subject: [dpdk-dev] [RFC] hash: introduce resizable hash list

In the current hash library there are already two hash tables and several hash calculation functions. The FBK hash table is very lightweight, but the key length is limited to 4 bytes and the table size is fixed at startup. The cuckoo hash table is a great idea and a nice implementation; it is both efficient and flexible. However, after studying the code and some material on the internet, I find that table extension is a pain point (correct me if anything is wrong): the table must be allocated in advance with a defined maximum size, so there is a chance that unused memory is wasted. There is an extendable-buckets table, but its size also appears to be fixed and the same as the primary table's.

Take flow information as an example: we may need to support as many as several million flows.
In some cases only a few hundred flows will be created, but in other cases millions are needed, so the hash table must be created at startup with capacity for millions of elements. If one flow key consumes 64B, then 1M flows will occupy more than one hundred MB of memory (table fields included). Worse, if there are only 2M + a few elements, a table of 4M (or more, depending on the hash collision rate) must be created, which wastes memory.

To handle this, a resizable hash list is introduced. The table can start with a small number of elements and grow dynamically at runtime. Meanwhile, an on-demand list-splitting mechanism is used to avoid a significant performance degradation: there is no need to re-hash and relocate all the existing elements when the table is extended.

The brief design is as below:
1. The table consists of a LIST header array and the LIST entries. Each entry stores a pre-calculated hash signature, which decides which header the entry is linked to: the signature is ANDed with a mask to select its LSBs.
2. The header array can start with a very small size, and a specified maximum length per list is used to decide when a table extension is needed.
3. When the maximum length of any list is reached, the header array size is doubled. The entries linked to that list are then split into two lists using the new mask (one more 1 bit in the mask, e.g. 0xfff -> 0x1fff). A global shift number, plus a local shift number per list, is kept for later checking.
4. When some other list is accessed, its local shift number is compared with the global one to check whether that list still needs to be split. If so, there are two conditions:
a.
If the local shift number is non-zero and only 1 less than the global one, then this list itself is the one that needs to be split. b. If the gap is more than 1, the table has been extended more than once since this list was last split. And if the local shift is zero, a mechanism is used to find the last unsplit list. That list is then split into 2/4/8... lists, depending on the gap, and all of its entries are re-linked to the proper headers.

So each time the hlist APIs are called, only one list is traversed, not all of them. And since a parameter sets the maximum number of entries per list, the traversal time is predictable, so this will not cause a significant performance degradation.

BR. Bing

Bing Zhao (1):
  rte_hash: introduce hash list into hash lib

 lib/librte_hash/Makefile             |   2 +
 lib/librte_hash/rte_hash_version.map |  15 +
 lib/librte_hash/rte_hlist.c          | 687 +++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_hlist.h          | 563 ++++++++++++++++++++++++++++
 4 files changed, 1267 insertions(+)
 create mode 100644 lib/librte_hash/rte_hlist.c
 create mode 100644 lib/librte_hash/rte_hlist.h

-- 
1.8.3.1