From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B1E6242C50; Mon, 12 Jun 2023 18:44:24 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8E95F410DD; Mon, 12 Jun 2023 18:44:24 +0200 (CEST) Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) by mails.dpdk.org (Postfix) with ESMTP id 2D0C240698 for ; Mon, 12 Jun 2023 18:44:23 +0200 (CEST) Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-662b85f4640so3763583b3a.0 for ; Mon, 12 Jun 2023 09:44:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=networkplumber-org.20221208.gappssmtp.com; s=20221208; t=1686588262; x=1689180262; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=L1m1DCV8BXmmNGwMu/QlNYl/mPPe4t/Vf2LIn4Hv5bw=; b=Z7yyiJGl32xgxV4wt91Xwn7orSx9UhMBssfvZDbJ0UU65a0jZ9WCFZRsdY8y94PunR nh7b9bPMq8h5hF36acR2NTE1udQd8JQSlF4DO/9HIV3Ob66a9Gpv+rdRXpOA56BD3ThK etVuBtBHzKhUkk4atRN6GUvvcHjjgBCh0xuMnkKRGGvmBwiTr4dkIHkTrRBIuR08oTQo 2eTegiq0bHVs8tsNDuXrcIeZWAvfEESEC55nhESdcyh63nhscfWnx23//JNY//NU2NUD N3apDis7Xo0h04YyJjp1B/mZLdvD7rDNCuGuRd7x8UGXFt/1Fm4iZWFxm/UvPLCi0TV+ GY/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686588262; x=1689180262; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=L1m1DCV8BXmmNGwMu/QlNYl/mPPe4t/Vf2LIn4Hv5bw=; b=RGHplXlFGe6sTX25+vxumJdOeODFCq3KsrIAMH1SyQcgBbf+IxqrNjwYoTHebW8qrL 4Af48KBLGWs3AWEViwi+sLFD8tSRxIVYRQoRe8CBlgsg5UuaqixwhM8C6elc8oBBmXlj qNx3xhCjgGc7ZyvEyb5Zf4gcsxLqyxz01sqloZGtQC65yqi9s6oUj3k64oU0tcwAMU+p c4aCOUKzLMFqVEhDMA9WHy0jta2ooEPygeZoLG8GUcdwJ3vcXvBOxf8l8NzIz/EqNMWC 0xQhqzektCwTWq/Vg5Gyd8Mo3MU0NY9rE4OGUaKmcudjbvAJpQehcHrsoc0rHSuBtQBe u+Cg== X-Gm-Message-State: AC+VfDwkbqjJ+YEubRduoUG/i3drc4A47CFuOppXSr8x2/mtccShxaFl Bv1fpWy401Ux10RXpb7S6QFZyQ== X-Google-Smtp-Source: ACHHUZ54Z3iPOpD/AbbIvvX7HBME8XOgI0e+cH2IrgM4z8XWW3cZr7g9IIq82+nzdl9t2RHBLiSzEg== X-Received: by 2002:a05:6a00:124c:b0:64d:2ea4:9377 with SMTP id u12-20020a056a00124c00b0064d2ea49377mr10899107pfi.14.1686588262225; Mon, 12 Jun 2023 09:44:22 -0700 (PDT) Received: from hermes.local (204-195-120-218.wavecable.com. [204.195.120.218]) by smtp.gmail.com with ESMTPSA id f8-20020aa782c8000000b006635846f3a8sm7208628pfn.91.2023.06.12.09.44.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Jun 2023 09:44:22 -0700 (PDT) Date: Mon, 12 Jun 2023 09:44:20 -0700 From: Stephen Hemminger To: Bing Zhao Cc: yipeng1.wang@intel.com, sameh.gobriel@intel.com, bruce.richardson@intel.com, pablo.de.lara.guarch@intel.com, dev@dpdk.org Subject: Re: [dpdk-dev] [RFC] hash: introduce resizable hash list Message-ID: <20230612094420.3a3971c0@hermes.local> In-Reply-To: <1566975109-318949-1-git-send-email-bingz@mellanox.com> References: <1566975109-318949-1-git-send-email-bingz@mellanox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Wed, 28 Aug 2019 14:51:48 +0800 Bing Zhao wrote: > In the current hash library, there are already two hash tables and > several hash calculation functions. > > FBK hash table is very lightweight, but the key length is limited to > 4 bytes and the size of the table is fixed during startup. > > Cuckoo hash table is a quite great idea and nice implementation in > the library, it is efficient and flexible. After some study of the > code and information from internet, I find that the table extension > is some pain point (correct me if anything wrong). It means that we > need to allocate the tables in advance by defining a maximum size. > So there is a chance that we waste some unused memory. Right now > there is an extendable buckets table, but it seems that the number > is also fixed and the same as the primary table. > Take the flows information for example, we may support as many as > several millions of flows. In some case, only several hundreds of > flows will be created but in the other case, millions of flows are > needed. So this means that we need to create the hash table to > support millions of elements in advance during the startup time. If > one key of flow information will consume 64B and 1M flows, so it > will occupy more than one hundred MB memory (the table fields > included). What is worse if there is only 2M + several elements, it > needs to create a table of 4M (or more: depends on the hash > collision rate) and it is some wasting of the memory. > > In order to handle this, an resizable hash list is introduced. > The table could start with a small number of the elements and be > allocated dynamically during the runtime. In the meanwhile, an > on-demand list splitting mechanism is used in order not to make a > significant performance degradation. Then there is no need to > re-hash and relocate all the existing elements in the table when the > table is extended. > > The brief design is as below: > 1. The table is consisted of LIST header array and the LIST entries. > In each entry, a pre-calculated hash signature is stored and is > used to decide which header should it be linked to, by using > "AND" with the mask to select the LSBs of the signature. > 2. The header array size could start with a very small number, and a > specified max number of each list is used to check if a table > extension is needed. > 3. When the max number on any of list is reached, the header array > size will be doubled. Then each entries linked to this list will > be split into two lists with the new mask (one more bit 1 in > the mask, e.g. 0xfff -> 0x1fff). And a global shift number and > local shift number of each list is used for the further checking. > 4. When some other list is being accessed, a comparison for the shift > numbers is used to check if the splitting of this list is needed. > If so, then there will be two conditions: > a. If the local shift number is only 1 less than global or > non-zero, then this list is the one that needs to be split. > b. If more than 1, then it means that the table is extended more > than once. And If the local shift is zero, a mechanism is used > to find the last unsplit list. > And then the list will be split into 2/4/8... lists depends on > the gap. All the entries then will linked to the proper header. > So, each time when the hlist APIs are called, only one list will be > traversed but not all the lists. And since there is parameter to set > a max number of entries in a list. The traversal time is predictable > and these will not cause a significant performance degradation. > > BR. Bing > > > Bing Zhao (1): > rte_hash: introduce hash list into hash lib > > lib/librte_hash/Makefile | 2 + > lib/librte_hash/rte_hash_version.map | 15 + > lib/librte_hash/rte_hlist.c | 687 +++++++++++++++++++++++++++++++++++ > lib/librte_hash/rte_hlist.h | 563 ++++++++++++++++++++++++++++ > 4 files changed, 1267 insertions(+) > create mode 100644 lib/librte_hash/rte_hlist.c > create mode 100644 lib/librte_hash/rte_hlist.h > A resizeable hash table (with RCU?) would be good to have. See old article about this https://lwn.net/Articles/573431/ But this patch got review feedback and no follow up. Marking it as Changes Requested, maybe someone can resuscitate it later.