DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bing Zhao <bingz@mellanox.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: "yipeng1.wang@intel.com" <yipeng1.wang@intel.com>,
	"sameh.gobriel@intel.com" <sameh.gobriel@intel.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	"pablo.de.lara.guarch@intel.com" <pablo.de.lara.guarch@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>
Subject: RE: [dpdk-dev] [RFC] hash: introduce resizable hash list
Date: Tue, 13 Jun 2023 06:34:17 +0000	[thread overview]
Message-ID: <DM4PR12MB5184436AE4BF9989DA6EA892D055A@DM4PR12MB5184.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20230612094420.3a3971c0@hermes.local>

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, June 13, 2023 12:44 AM
> To: Bing Zhao <bingz@mellanox.com>
> Cc: yipeng1.wang@intel.com; sameh.gobriel@intel.com;
> bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] hash: introduce resizable hash list
> 
> On Wed, 28 Aug 2019 14:51:48 +0800
> Bing Zhao <bingz@mellanox.com> wrote:
> 
> > In the current hash library, there are already two hash tables and
> > several hash calculation functions.
> >
> > FBK hash table is very lightweight, but the key length is limited to
> > 4 bytes and the size of the table is fixed during startup.
> >
> > Cuckoo hash table is a quite great idea and nice implementation in the
> > library, it is efficient and flexible. After some study of the code
> > and information from internet, I find that the table extension is some
> > pain point (correct me if anything wrong). It means that we need to
> > allocate the tables in advance by defining a maximum size.
> > So there is a chance that we waste some unused memory. Right now there
> > is an extendable buckets table, but it seems that the number is also
> > fixed and the same as the primary table.
> > Take the flows information for example, we may support as many as
> > several millions of flows. In some case, only several hundreds of
> > flows will be created but in the other case, millions of flows are
> > needed. So this means that we need to create the hash table to support
> > millions of elements in advance during the startup time. If one key of
> > flow information will consume 64B and 1M flows, so it will occupy more
> > than one hundred MB memory (the table fields included). What is worse
> > if there is only 2M + several elements, it needs to create a table of
> > 4M (or more: depends on the hash collision rate) and it is some
> > wasting of the memory.
> >
> > In order to handle this, an resizable hash list is introduced.
> > The table could start with a small number of the elements and be
> > allocated dynamically during the runtime. In the meanwhile, an
> > on-demand list splitting mechanism is used in order not to make a
> > significant performance degradation. Then there is no need to re-hash
> > and relocate all the existing elements in the table when the table is
> > extended.
> >
> > The brief design is as below:
> > 1. The table is consisted of LIST header array and the LIST entries.
> >    In each entry, a pre-calculated hash signature is stored and is
> >    used to decide which header should it be linked to, by using
> >    "AND" with the mask to select the LSBs of the signature.
> > 2. The header array size could start with a very small number, and a
> >    specified max number of each list is used to check if a table
> >    extension is needed.
> > 3. When the max number on any of list is reached, the header array
> >    size will be doubled. Then each entries linked to this list will
> >    be split into two lists with the new mask (one more bit 1 in
> >    the mask, e.g. 0xfff -> 0x1fff). And a global shift number and
> >    local shift number of each list is used for the further checking.
> > 4. When some other list is being accessed, a comparison for the shift
> >    numbers is used to check if the splitting of this list is needed.
> >    If so, then there will be two conditions:
> >    a. If the local shift number is only 1 less than global or
> >       non-zero, then this list is the one that needs to be split.
> >    b. If more than 1, then it means that the table is extended more
> >       than once. And If the local shift is zero, a mechanism is used
> >       to find the last unsplit list.
> >    And then the list will be split into 2/4/8... lists depends on
> >    the gap. All the entries then will linked to the proper header.
> > So, each time when the hlist APIs are called, only one list will be
> > traversed but not all the lists. And since there is parameter to set a
> > max number of entries in a list. The traversal time is predictable and
> > these will not cause a significant performance degradation.
> >
> > BR. Bing
> >
> >
> > Bing Zhao (1):
> >   rte_hash: introduce hash list into hash lib
> >
> >  lib/librte_hash/Makefile             |   2 +
> >  lib/librte_hash/rte_hash_version.map |  15 +
> >  lib/librte_hash/rte_hlist.c          | 687
> +++++++++++++++++++++++++++++++++++
> >  lib/librte_hash/rte_hlist.h          | 563 ++++++++++++++++++++++++++++
> >  4 files changed, 1267 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_hlist.c  create mode 100644
> > lib/librte_hash/rte_hlist.h
> >
> 
> A resizeable hash table (with RCU?) would be good to have.
> See old article about this https://lwn.net/Articles/573431/ But this patch got
> review feedback and no follow up.
> Marking it as Changes Requested, maybe someone can resuscitate it later.

To my understanding, resizable hash may be needed by the application in the real life. Sometimes the maximal entries may not be known during startup, and it is some waste to allocate a huge amount of memory. (Probably there would be some failure to insert, even)
The extendable hash list would be the simplest way to go. While I am not a researcher, maybe there is some other advanced way to go in the papers and we need to translate it into code. Any ideas?

BR. Bing


      reply	other threads:[~2023-06-14  9:04 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-28  6:51 Bing Zhao
2019-08-28  6:51 ` [dpdk-dev] [RFC] rte_hash: introduce hash list into hash lib Bing Zhao
2019-08-28 11:50   ` Stephen Hemminger
2019-08-28 11:51   ` Stephen Hemminger
2019-08-28 11:53   ` Stephen Hemminger
2019-09-05 19:25 ` [dpdk-dev] [RFC] hash: introduce resizable hash list Wang, Yipeng1
2019-09-12  5:41   ` Bing Zhao
2019-09-21  1:16     ` Wang, Yipeng1
2023-06-12 16:44 ` Stephen Hemminger
2023-06-13  6:34   ` Bing Zhao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM4PR12MB5184436AE4BF9989DA6EA892D055A@DM4PR12MB5184.namprd12.prod.outlook.com \
    --to=bingz@mellanox.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=pablo.de.lara.guarch@intel.com \
    --cc=sameh.gobriel@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    --cc=yipeng1.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).