DPDK patches and discussions
 help / color / mirror / Atom feed
From: Pablo de Lara <pablo.de.lara.guarch@intel.com>
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v6 7/7] doc: update hash documentation
Date: Sat, 11 Jul 2015 00:30:20 +0100	[thread overview]
Message-ID: <1436571020-16252-8-git-send-email-pablo.de.lara.guarch@intel.com> (raw)
In-Reply-To: <1436571020-16252-1-git-send-email-pablo.de.lara.guarch@intel.com>

Updates hash library documentation, reflecting
the new implementation changes.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 doc/guides/prog_guide/hash_lib.rst | 138 +++++++++++++++++++++++++++++++++----
 doc/guides/prog_guide/index.rst    |   4 ++
 2 files changed, 129 insertions(+), 13 deletions(-)

diff --git a/doc/guides/prog_guide/hash_lib.rst b/doc/guides/prog_guide/hash_lib.rst
index 9b83835..193dd53 100644
--- a/doc/guides/prog_guide/hash_lib.rst
+++ b/doc/guides/prog_guide/hash_lib.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
     All rights reserved.
 
     Redistribution and use in source and binary forms, with or without
@@ -50,8 +50,6 @@ The hash also allows the configuration of some low-level implementation related
 
 *   Hash function to translate the key into a bucket index
 
-*   Number of entries per bucket
-
 The main methods exported by the hash are:
 
 *   Add entry with key: The key is provided as input. If a new entry is successfully added to the hash for the specified key,
@@ -65,10 +63,26 @@ The main methods exported by the hash are:
 *   Lookup for entry with key: The key is provided as input. If an entry with the specified key is found in the hash (lookup hit),
     then the position of the entry is returned, otherwise (lookup miss) a negative value is returned.
 
-The current hash implementation handles the key management only.
-The actual data associated with each key has to be managed by the user using a separate table that
+Apart from these method explained above, the API allows the user three more options:
+
+*   Add / lookup / delete with key and precomputed hash: Both the key and its precomputed hash are provided as input. This allows
+    the user to perform these operations faster, as hash is already computed.
+
+*   Add / lookup with key and data: A pair of key-value is provided as input. This allows the user to store
+    not only the key, but also data which may be either a 8-byte integer or a pointer to external data (if data size is more than 8 bytes).
+
+*   Combination of the two options above: User can provide key, precomputed hash and data.
+
+Also, the API contains a method to allow the user to look up entries in bursts, achieving higher performance
+than looking up individual entries, as the function prefetches next entries at the time it is operating
+with the first ones, which reduces significantly the impact of the necessary memory accesses.
+Notice that this method uses a pipeline of 8 entries (4 stages of 2 entries), so it is highly recommended
+to use at least 8 entries per burst.
+
+The actual data associated with each key can be either managed by the user using a separate table that
 mirrors the hash in terms of number of entries and position of each entry,
-as shown in the Flow Classification use case describes in the following sections.
+as shown in the Flow Classification use case describes in the following sections,
+or stored in the hash table itself.
 
 The example hash tables in the L2/L3 Forwarding sample applications defines which port to forward a packet to based on a packet flow identified by the five-tuple lookup.
 However, this table could also be used for more sophisticated features and provide many other functions and actions that could be performed on the packets and flows.
@@ -76,17 +90,26 @@ However, this table could also be used for more sophisticated features and provi
 Implementation Details
 ----------------------
 
-The hash table is implemented as an array of entries which is further divided into buckets,
-with the same number of consecutive array entries in each bucket.
-For any input key, there is always a single bucket where that key can be stored in the hash,
-therefore only the entries within that bucket need to be examined when the key is looked up.
+The hash table has two main tables:
+
+* First table is an array of entries which is further divided into buckets,
+  with the same number of consecutive array entries in each bucket. Each entry contains the computed primary
+  and secondary hashes of a given key (explained below), and an index to the second table.
+
+* The second table is an array of all the keys stored in the hash table and its data associated to each key.
+
+The hash library uses the cuckoo hash method to resolve collisions.
+For any input key, there are two possible buckets (primary and secondary/alternative location)
+where that key can be stored in the hash, therefore only the entries within those bucket need to be examined
+when the key is looked up.
 The lookup speed is achieved by reducing the number of entries to be scanned from the total
-number of hash entries down to the number of entries in a hash bucket,
+number of hash entries down to the number of entries in the two hash buckets,
 as opposed to the basic method of linearly scanning all the entries in the array.
 The hash uses a hash function (configurable) to translate the input key into a 4-byte key signature.
 The bucket index is the key signature modulo the number of hash buckets.
-Once the bucket is identified, the scope of the hash add,
-delete and lookup operations is reduced to the entries in that bucket.
+
+Once the buckets are identified, the scope of the hash add,
+delete and lookup operations is reduced to the entries in those buckets (it is very likely that entries are in the primary bucket).
 
 To speed up the search logic within the bucket, each hash entry stores the 4-byte key signature together with the full key for each hash entry.
 For large key sizes, comparing the input key against a key from the bucket can take significantly more time than
@@ -95,6 +118,95 @@ Therefore, the signature comparison is done first and the full key comparison do
 The full key comparison is still necessary, as two input keys from the same bucket can still potentially have the same 4-byte hash signature,
 although this event is relatively rare for hash functions providing good uniform distributions for the set of input keys.
 
+Example of lookup:
+
+First of all, the primary bucket is identified and entry is likely to be stored there.
+If signature was stored there, we compare its key against the one provided and return the position
+where it was stored and/or the data associated to that key if there is a match.
+If signature is not in the primary bucket, the secondary bucket is looked up, where same procedure
+is carried out. If there is no match there either, key is considered not to be in the table.
+
+Example of addition:
+
+Like lookup, the primary and secondary buckets are indentified. If there is an empty slot in
+the primary bucket, primary and secondary signatures are stored in that slot, key and data (if any) are added to
+the second table and an index to the position in the second table is stored in the slot of the first table.
+If there is no space in the primary bucket, one of the entries on that bucket is pushed to its alternative location,
+and the key to be added is inserted in its position.
+To know where the alternative bucket of the evicted entry is, the secondary signature is looked up and alternative bucket index
+is calculated from doing the modulo, as seen above. If there is room in the alternative bucket, the evicted entry
+is stored in it. If not, same process is repeated (one of the entries gets pushed) until a non full bucket is found.
+Notice that despite all the entry movement in the first table, the second table is not touched, which would impact
+greatly in performance.
+
+In the very unlikely event that table enters in a loop where same entries are being evicted indefinitely,
+key is considered not able to be stored.
+With random keys, this method allows the user to get around 90% of the table utilization, without
+having to drop any stored entry (LRU) or allocate more memory (extended buckets).
+
+Entry distribution in hash table
+--------------------------------
+
+As mentioned above, Cuckoo hash implementation pushes elements out of their bucket,
+if there is a new entry to be added which primary location coincides with their current bucket,
+being pushed to their alternative location.
+Therefore, as user adds more entries to the hash table, distribution of the hash values
+in the buckets will change, being most of them in their primary location and a few in
+their secondary location, which the later will increase, as table gets busier.
+This information is quite useful, as performance will be lower as more entries
+are evicted to their secondary location.
+
+See the tables below showing entry distribution as table utilization increases.
+
+.. _table_hash_lib_1:
+
+.. table:: Entry distribution with 1024 entries
+
+   +--------------+-----------------------+-------------------------+
+   | % Table used | % In Primary location | % In Secondary location |
+   +==============+=======================+=========================+
+   |      25      |         100           |           0             |
+   +--------------+-----------------------+-------------------------+
+   |      50      |         96.1          |           3.9           |
+   +--------------+-----------------------+-------------------------+
+   |      75      |         88.2          |           11.8          |
+   +--------------+-----------------------+-------------------------+
+   |      80      |         86.3          |           13.7          |
+   +--------------+-----------------------+-------------------------+
+   |      85      |         83.1          |           16.9          |
+   +--------------+-----------------------+-------------------------+
+   |      90      |         77.3          |           22.7          |
+   +--------------+-----------------------+-------------------------+
+   |      95.8    |         64.5          |           35.5          |
+   +--------------+-----------------------+-------------------------+
+
+|
+
+.. _table_hash_lib_2:
+
+.. table:: Entry distribution with 1 million entries
+
+   +--------------+-----------------------+-------------------------+
+   | % Table used | % In Primary location | % In Secondary location |
+   +==============+=======================+=========================+
+   |      50      |         96            |           4             |
+   +--------------+-----------------------+-------------------------+
+   |      75      |         86.9          |           13.1          |
+   +--------------+-----------------------+-------------------------+
+   |      80      |         83.9          |           16.1          |
+   +--------------+-----------------------+-------------------------+
+   |      85      |         80.1          |           19.9          |
+   +--------------+-----------------------+-------------------------+
+   |      90      |         74.8          |           25.2          |
+   +--------------+-----------------------+-------------------------+
+   |      94.5    |         67.4          |           32.6          |
+   +--------------+-----------------------+-------------------------+
+
+.. note::
+
+   Last values on the tables above are the average maximum table
+   utilization with random keys and using Jenkins hash function.
+
 Use Case: Flow Classification
 -----------------------------
 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 3295661..036640c 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -241,3 +241,7 @@ Programmer's Guide
 :numref:`table_qos_33` :ref:`table_qos_33`
 
 :numref:`table_qos_34` :ref:`table_qos_34`
+
+:numref:`table_hash_lib_1` :ref:`table_hash_lib_1`
+
+:numref:`table_hash_lib_2` :ref:`table_hash_lib_2`
-- 
2.4.3

  parent reply	other threads:[~2015-07-10 23:30 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-05 14:33 [dpdk-dev] [PATCH 0/6] Cuckoo hash Pablo de Lara
2015-06-05 14:33 ` [dpdk-dev] [PATCH 1/6] eal: add const in prefetch functions Pablo de Lara
2015-06-16 15:10   ` Bruce Richardson
2015-06-05 14:33 ` [dpdk-dev] [PATCH 2/6] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-06-17 15:31   ` Bruce Richardson
2015-06-18  9:50   ` Bruce Richardson
2015-06-05 14:33 ` [dpdk-dev] [PATCH 3/6] hash: add new lookup_bulk_with_hash function Pablo de Lara
2015-06-05 14:33 ` [dpdk-dev] [PATCH 4/6] hash: add new functions rte_hash_rehash and rte_hash_reset Pablo de Lara
2015-06-05 14:33 ` [dpdk-dev] [PATCH 5/6] hash: add new functionality to store data in hash table Pablo de Lara
2015-06-05 14:33 ` [dpdk-dev] [PATCH 6/6] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-06-16 13:44 ` [dpdk-dev] [PATCH 0/6] Cuckoo hash Thomas Monjalon
2015-06-16 21:44   ` De Lara Guarch, Pablo
2015-06-25 22:05 ` [dpdk-dev] [PATCH v2 00/11] " Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 01/11] eal: add const in prefetch functions Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 02/11] hash: move rte_hash structure to C file and make it internal Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 03/11] test/hash: enhance hash unit tests Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 04/11] test/hash: rename new hash perf unit test back to original name Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 05/11] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 06/11] hash: add new lookup_bulk_with_hash function Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 07/11] hash: add new function rte_hash_reset Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 08/11] hash: add new functionality to store data in hash table Pablo de Lara
2015-06-26 16:49     ` Stephen Hemminger
2015-06-28 22:23       ` De Lara Guarch, Pablo
2015-06-30  7:36         ` Stephen Hemminger
2015-07-10 10:39         ` Bruce Richardson
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 09/11] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 10/11] doc: announce ABI change of librte_hash Pablo de Lara
2015-06-25 22:05   ` [dpdk-dev] [PATCH v2 11/11] doc: update hash documentation Pablo de Lara
2015-06-28 22:25   ` [dpdk-dev] [PATCH v3 00/11] Cuckoo hash Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 01/11] eal: add const in prefetch functions Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 02/11] hash: move rte_hash structure to C file and make it internal Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 03/11] test/hash: enhance hash unit tests Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 04/11] test/hash: rename new hash perf unit test back to original name Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 05/11] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 06/11] hash: add new lookup_bulk_with_hash function Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 07/11] hash: add new function rte_hash_reset Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 08/11] hash: add new functionality to store data in hash table Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 09/11] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 10/11] doc: announce ABI change of librte_hash Pablo de Lara
2015-06-28 22:25     ` [dpdk-dev] [PATCH v3 11/11] doc: update hash documentation Pablo de Lara
2015-07-08 23:23     ` [dpdk-dev] [PATCH v3 00/11] Cuckoo hash Thomas Monjalon
2015-07-09  8:02       ` Bruce Richardson
2015-07-10 17:24     ` [dpdk-dev] [PATCH v4 0/7] Cuckoo hash - part 3 of " Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 1/7] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 2/7] hash: add new function rte_hash_reset Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 3/7] hash: add new functionality to store data in hash table Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 4/7] hash: add iterate function Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 5/7] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 6/7] doc: announce ABI change of librte_hash Pablo de Lara
2015-07-10 17:24       ` [dpdk-dev] [PATCH v4 7/7] doc: update hash documentation Pablo de Lara
2015-07-10 20:52       ` [dpdk-dev] [PATCH v4 0/7] Cuckoo hash - part 3 of Cuckoo hash Bruce Richardson
2015-07-10 21:57       ` [dpdk-dev] [PATCH v5 " Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 1/7] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 2/7] hash: add new function rte_hash_reset Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 3/7] hash: add new functionality to store data in hash table Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 4/7] hash: add iterate function Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 5/7] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 6/7] doc: announce ABI change of librte_hash Pablo de Lara
2015-07-10 21:57         ` [dpdk-dev] [PATCH v5 7/7] doc: update hash documentation Pablo de Lara
2015-07-10 22:52         ` [dpdk-dev] [PATCH v5 0/7] Cuckoo hash - part 3 of Cuckoo hash Bruce Richardson
2015-07-10 23:30         ` [dpdk-dev] [PATCH v6 " Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 1/7] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 2/7] hash: add new function rte_hash_reset Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 3/7] hash: add new functionality to store data in hash table Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 4/7] hash: add iterate function Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 5/7] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-07-10 23:30           ` [dpdk-dev] [PATCH v6 6/7] doc: announce ABI change of librte_hash Pablo de Lara
2015-07-10 23:30           ` Pablo de Lara [this message]
2015-07-11  0:18           ` [dpdk-dev] [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash Pablo de Lara
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 1/7] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-07-12 22:29               ` Thomas Monjalon
2015-07-13 16:11                 ` Bruce Richardson
2015-07-13 16:14                   ` Bruce Richardson
2015-07-13 16:20                     ` Thomas Monjalon
2015-07-13 16:26                       ` Bruce Richardson
2015-07-16  9:39               ` Tony Lu
2015-07-16 20:42                 ` De Lara Guarch, Pablo
2015-07-17  3:34                   ` Tony Lu
2015-07-17  7:34                     ` De Lara Guarch, Pablo
2015-07-17  7:58                       ` Tony Lu
2015-07-17  9:06                         ` De Lara Guarch, Pablo
2015-07-17 14:39                           ` Tony Lu
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 2/7] hash: add new function rte_hash_reset Pablo de Lara
2015-07-12 22:16               ` Thomas Monjalon
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 3/7] hash: add new functionality to store data in hash table Pablo de Lara
2015-07-12 22:00               ` Thomas Monjalon
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 4/7] hash: add iterate function Pablo de Lara
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 5/7] MAINTAINERS: claim responsability for hash library Pablo de Lara
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 6/7] doc: announce ABI change of librte_hash Pablo de Lara
2015-07-12 22:38               ` Thomas Monjalon
2015-07-11  0:18             ` [dpdk-dev] [PATCH v7 7/7] doc: update hash documentation Pablo de Lara
2015-07-12 22:46             ` [dpdk-dev] [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1436571020-16252-8-git-send-email-pablo.de.lara.guarch@intel.com \
    --to=pablo.de.lara.guarch@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).