From: Vladimir Medvedkin
To: dev@dpdk.org
Cc: konstantin.ananyev@intel.com, andrey.chilikin@intel.com,
 ray.kinsella@intel.com, yipeng1.wang@intel.com, sameh.gobriel@intel.com,
 bruce.richardson@intel.com, john.mcnamara@intel.com
Date: Mon, 19 Apr 2021 16:59:54 +0100
Message-Id: <1618847995-91229-5-git-send-email-vladimir.medvedkin@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1618847995-91229-1-git-send-email-vladimir.medvedkin@intel.com>
References: <1618847995-91229-1-git-send-email-vladimir.medvedkin@intel.com>
MIME-Version: 1.0
In-Reply-To: <1618319973-391016-1-git-send-email-vladimir.medvedkin@intel.com>
References: <1618319973-391016-1-git-send-email-vladimir.medvedkin@intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [dpdk-dev] [PATCH v5 4/5] doc: add thash documentation

Adds documentation for the Toeplitz hash library

Signed-off-by: Vladimir Medvedkin
Reviewed-by: Konstantin Ananyev
Reviewed-by: John McNamara
---
 doc/guides/prog_guide/img/predictable_snat_1.svg | 1444 +++++++++++++++++++++
 doc/guides/prog_guide/img/predictable_snat_2.svg | 1444 +++++++++++++++++++++
 doc/guides/prog_guide/img/rss_queue_assign.svg   | 1454 ++++++++++++++++++++++
 doc/guides/prog_guide/index.rst                  |    1 +
 doc/guides/prog_guide/toeplitz_hash_lib.rst      |  289 +++++
 doc/guides/rel_notes/release_21_05.rst           |    6 +
 6 files changed, 4638 insertions(+)
 create mode 100644 doc/guides/prog_guide/img/predictable_snat_1.svg
 create mode 100644 doc/guides/prog_guide/img/predictable_snat_2.svg
 create mode 100644 doc/guides/prog_guide/img/rss_queue_assign.svg
 create mode 100644 doc/guides/prog_guide/toeplitz_hash_lib.rst

diff --git a/doc/guides/prog_guide/img/predictable_snat_1.svg b/doc/guides/prog_guide/img/predictable_snat_1.svg
new file mode 100644
index 0000000..5f97ccb
--- /dev/null
+++ b/doc/guides/prog_guide/img/predictable_snat_1.svg
@@ -0,0 +1,1444 @@
[SVG markup not reproduced. Figure: SNAT router with the tuples 10.10.10.10:10000 <-> 192.0.2.100:443 and 172.16.0.20:12345 <-> 192.0.2.100:443; labels "RSS hash value 0xdeadbeef, packet assigned to queue 15" and "RSS hash value 0xbadcab1e, packet assigned to queue 14".]
\ No newline at end of file
diff --git a/doc/guides/prog_guide/img/predictable_snat_2.svg b/doc/guides/prog_guide/img/predictable_snat_2.svg
new file mode 100644
index 0000000..8525459
--- /dev/null
+++ b/doc/guides/prog_guide/img/predictable_snat_2.svg
@@ -0,0 +1,1444 @@
[SVG markup not reproduced. Figure: SNAT router with the tuples 10.10.10.10:10000 <-> 192.0.2.100:443 and 172.16.0.20:23456 <-> 192.0.2.100:443; labels "RSS hash value 0xdeadbeef, packet assigned to queue 15" and "RSS hash value 0xf00d1eaf, packet assigned to queue 15".]
\ No newline at end of file
diff --git a/doc/guides/prog_guide/img/rss_queue_assign.svg b/doc/guides/prog_guide/img/rss_queue_assign.svg
new file mode 100644
index 0000000..d0eef8c
--- /dev/null
+++ b/doc/guides/prog_guide/img/rss_queue_assign.svg
@@ -0,0 +1,1454 @@
[SVG markup not reproduced. Figure: Received Packet Data -> parser extracts the required fields into a tuple (Src_ip, Dst_ip, Src/Dst ports) -> Toeplitz hash function -> 32-bit hash value; the hash LSBs are used as an index in the RSS Redirection Table (Q_idx_0 ... Q_idx_n), whose entries map to CPU 0 ... CPU 5.]
\ No newline at end of file
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 45c7dec..2dce507 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -32,6 +32,7 @@ Programmer's Guide
     link_bonding_poll_mode_drv_lib
     timer_lib
     hash_lib
+    toeplitz_hash_lib
     efd_lib
     member_lib
     lpm_lib
diff --git a/doc/guides/prog_guide/toeplitz_hash_lib.rst b/doc/guides/prog_guide/toeplitz_hash_lib.rst
new file mode 100644
index 0000000..fcaab6b
--- /dev/null
+++ b/doc/guides/prog_guide/toeplitz_hash_lib.rst
@@ -0,0 +1,289 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2021 Intel Corporation.
+
+.. _Thash_Library:
+
+Toeplitz Hash Library
+=====================
+
+DPDK provides a Toeplitz Hash Library to calculate the Toeplitz hash function
+and to use its properties. The Toeplitz hash function is commonly used in a
+wide range of NICs to calculate the RSS hash sum to spread the traffic among
+the queues.
+
+.. _figure_rss_queue_assign:
+
+.. figure:: img/rss_queue_assign.*
+
+   RSS queue assignment example
+
+
+Toeplitz hash function API
+--------------------------
+
+Two functions calculate the Toeplitz hash sum:
+
+* ``rte_softrss()``
+
+* ``rte_softrss_be()``
+
+Both functions take the following parameters:
+
+* A pointer to the tuple, containing fields extracted from the packet.
+
+* The length of this tuple, counted in 4-byte words.
+
+* A pointer to the RSS hash key corresponding to the one installed on the NIC.
+
+Both functions expect the tuple to be in "host" byte order and a multiple of 4
+bytes in length. The ``rte_softrss()`` function expects the ``rss_key`` to be
+exactly the same as the one installed on the NIC. The ``rte_softrss_be()``
+function is a faster implementation, but it expects ``rss_key`` to be
+converted to the host byte order (for example with ``rte_convert_rss_key()``).
+
+Predictable RSS
+---------------
+
+In some use cases it is useful to be able to find partial collisions of the
+Toeplitz hash function. As shown in :numref:`figure_rss_queue_assign`, only a
+few of the least significant bits (LSBs) of the hash value are used to select
+an entry in the RSS Redirection Table (ReTa), and thus the index of the queue.
+In this case it is useful to find another tuple whose hash has the same LSBs
+as the hash of the original tuple.
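The hashing and ReTa lookup described above can be sketched in plain C. This is a minimal, self-contained sketch and not the DPDK implementation: ``toeplitz_hash()`` is a hypothetical bit-by-bit equivalent of ``rte_softrss()`` (a tuple of 32-bit words in host byte order plus the key exactly as installed on the NIC), and ``reta_fill_round_robin()``/``reta_queue()`` are hypothetical helpers modelling a 128-entry ReTa indexed by the 7 hash LSBs. The 40-byte key and the test tuple come from the widely published RSS verification suite; filling the table round-robin over 16 queues is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key from the widely used RSS verification suite. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Key bytes 4*j .. 4*j+3 read as a big-endian 32-bit word. */
static uint32_t key_word(const uint8_t *key, uint32_t j)
{
    return (uint32_t)key[4 * j] << 24 | (uint32_t)key[4 * j + 1] << 16 |
           (uint32_t)key[4 * j + 2] << 8 | (uint32_t)key[4 * j + 3];
}

/* Hypothetical bit-by-bit Toeplitz hash over len 32-bit host-order words. */
static uint32_t toeplitz_hash(const uint32_t *tuple, uint32_t len,
                              const uint8_t *key, size_t key_len)
{
    uint32_t hash = 0;

    /* Every input bit needs a 32-bit key window starting at that bit. */
    assert(key_len >= 4 * ((size_t)len + 1));

    for (uint32_t j = 0; j < len; j++) {
        uint32_t hi = key_word(key, j);
        uint32_t lo = key_word(key, j + 1);

        /* Walk the word from its most significant bit down; for each set
         * bit, XOR in the key window that starts at that bit position. */
        for (int i = 31; i >= 0; i--) {
            if (tuple[j] & (UINT32_C(1) << i))
                hash ^= (hi << (31 - i)) |
                        (uint32_t)((uint64_t)lo >> (i + 1));
        }
    }
    return hash;
}

#define RETA_LOG2 7                  /* 2^7 = 128 ReTa entries */
#define RETA_SIZE (1u << RETA_LOG2)

/* Assumed layout: ReTa entry i points to queue i % nb_queues. */
static void reta_fill_round_robin(uint16_t *reta, uint16_t nb_queues)
{
    for (unsigned int i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint16_t)(i % nb_queues);
}

/* Only the RETA_LOG2 least significant bits of the hash select the entry. */
static uint16_t reta_queue(const uint16_t *reta, uint32_t hash)
{
    return reta[hash & (RETA_SIZE - 1)];
}
```

With the verification tuple 66.9.149.187:2794 -> 161.142.100.80:1766, laid out as ``{0x420995bb, 0xa18e6450, 2794 << 16 | 1766}`` (source address, destination address, then source and destination ports), the sketch should reproduce the published hash 0x51ccc178. Its 7 LSBs select ReTa entry 120, which the assumed table maps to queue 8. Under the same assumption, the hash values shown in the SNAT figures (0xdeadbeef, 0xbadcab1e, 0xf00d1eaf) select queues 15, 14 and 15, consistent with the figure labels.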
+
+For example:
+
+- In the case of SNAT (Source Network Address Translation), it is possible to
+  choose a special source port number on translation so that the hash of the
+  returning packets of the given connection has the desired LSBs.
+- In the case of MPLS (Multiprotocol Label Switching), if the MPLS tag is used
+  in the hash calculation, the Label Switching Router can allocate a special
+  MPLS tag to bind an LSP (Label Switched Path) to a given queue. The same
+  method can be used when allocating an IPsec SPI, a VXLAN VNI, etc., to bind
+  the tunnel to the desired queue.
+- In the case of a TCP stack, a special source port could be chosen for
+  outgoing connections, such that the response packets will be assigned to the
+  desired queue.
+
+This functionality is provided by the API described below, which consists of
+three parts:
+
+* Create the thash context.
+
+* Create the thash helper, associated with a context.
+
+* Use the helper at run time to calculate the adjustable bits of the tuple
+  that ensure a collision.
+
+Thash context
+~~~~~~~~~~~~~
+
+The function ``rte_thash_init_ctx()`` initializes the context struct
+associated with a particular NIC or a set of NICs.
+
+It expects:
+
+* A name for the context.
+
+* The log2 value of the size of the RSS redirection table for the
+  corresponding NIC. It reflects the number of least significant bits of the
+  hash value to produce a collision for.
+
+* A predefined RSS hash key. This is optional; if ``NULL``, a random key is
+  generated.
+
+* The length of the RSS hash key. This value is usually hardware/driver
+  specific and can be found in the NIC datasheet.
+
+* Optional flags, as shown below.
+
+Supported flags:
+
+* ``RTE_THASH_IGNORE_PERIOD_OVERFLOW`` - By default, and for security reasons,
+  the library prohibits generating a repeatable sequence in the hash key. This
+  flag disables such checking.
+  The flag is mainly used for testing in the lab to generate an RSS hash key
+  with a uniform hash distribution, if the input traffic also has a uniform
+  distribution.
+
+* ``RTE_THASH_MINIMAL_SEQ`` - By default, the library generates a special bit
+  sequence in the hash key for all the bits of the subtuple. However, the
+  collision generation task requires only the ``log2(RETA_SZ)`` bits in the
+  subtuple. This flag forces the minimum bit sequence in the hash key to be
+  generated for the required ``log2(RETA_SZ)`` least significant bits of the
+  subtuple. The flag is useful when there is a relatively large number of
+  helpers whose bit sequences in the RSS hash key may overlap.
+
+
+Thash helper
+~~~~~~~~~~~~
+
+The function ``rte_thash_add_helper()`` initializes the helper struct
+associated with a given context and with the part of a target tuple that can
+be altered to produce a hash collision. On success, it writes a specially
+calculated bit sequence into the RSS hash key stored in the context and
+calculates a table of values to be XORed with the subtuple.
+
+It expects:
+
+* A pointer to the Thash context to be associated with.
+
+* A name for the helper.
+
+* The length of the subtuple to be modified, counted in bits.
+
+* The offset of the subtuple to be modified from the beginning of the tuple,
+  also counted in bits.
+
+.. note::
+
+   Adding a helper changes the key stored in the corresponding context, so the
+   updated RSS hash key must be uploaded to the NIC after all the required
+   helpers have been created.
+
+
+Calculation of the complementary bits to adjust the subtuple
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``rte_thash_get_complement()`` function returns a special bit sequence of
+length ``N`` (the log2 ReTa size passed to ``rte_thash_init_ctx()``) to be
+XORed with the ``N`` least significant bits of the subtuple.
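The complement works because the Toeplitz hash is linear over XOR: for a fixed key, ``hash(a ^ b) == hash(a) ^ hash(b)``. Flipping bits of a subtuple therefore changes the hash by an amount that depends only on the flipped bits, never on the rest of the tuple. The hypothetical sketch below illustrates that property with a brute-force stand-in for ``rte_thash_get_complement()``: instead of the library's specially crafted key and precomputed table, ``sport_complement()`` simply tries every 16-bit value that could be XORed into a source port. The bit-by-bit hash and the 40-byte verification key are repeated so the block compiles on its own; all names are illustrative, not DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key from the widely used RSS verification suite. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Key bytes 4*j .. 4*j+3 read as a big-endian 32-bit word. */
static uint32_t key_word(const uint8_t *key, uint32_t j)
{
    return (uint32_t)key[4 * j] << 24 | (uint32_t)key[4 * j + 1] << 16 |
           (uint32_t)key[4 * j + 2] << 8 | (uint32_t)key[4 * j + 3];
}

/* Bit-by-bit Toeplitz hash over len 32-bit host-order words. */
static uint32_t toeplitz_hash(const uint32_t *tuple, uint32_t len,
                              const uint8_t *key, size_t key_len)
{
    uint32_t hash = 0;

    assert(key_len >= 4 * ((size_t)len + 1));
    for (uint32_t j = 0; j < len; j++) {
        uint32_t hi = key_word(key, j);
        uint32_t lo = key_word(key, j + 1);

        for (int i = 31; i >= 0; i--) {
            if (tuple[j] & (UINT32_C(1) << i))
                hash ^= (hi << (31 - i)) |
                        (uint32_t)((uint64_t)lo >> (i + 1));
        }
    }
    return hash;
}

/* Hash of a tuple that is zero except for adj in the source-port half of
 * word 2 (layout: src_addr, dst_addr, sport << 16 | dport). By linearity,
 * this is exactly how much XORing adj into any tuple's sport changes the
 * hash, whatever the addresses are. */
static uint32_t sport_delta_hash(uint16_t adj)
{
    const uint32_t delta[3] = { 0, 0, (uint32_t)adj << 16 };

    return toeplitz_hash(delta, 3, rss_key, sizeof(rss_key));
}

/* Brute-force stand-in for a complement lookup: the smallest 16-bit value
 * that, XORed into the sport, makes the n LSBs of hash equal to those of
 * desired. Returns -1 if no 16-bit value does. */
static int32_t sport_complement(uint32_t hash, uint32_t desired, unsigned int n)
{
    assert(n > 0 && n < 32);

    uint32_t mask = (UINT32_C(1) << n) - 1;
    uint32_t need = (hash ^ desired) & mask;

    for (uint32_t adj = 0; adj <= 0xffff; adj++) {
        if ((sport_delta_hash((uint16_t)adj) & mask) == need)
            return (int32_t)adj;
    }
    return -1;
}
```

Because ``sport_complement()`` needs only the current hash and the desired one, it never looks at the addresses; this independence is what allows a helper to compute its table once and reuse it for any tuple. The brute-force loop, by contrast, may cost up to 65536 hash evaluations per call, so it is only useful for illustration.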
+
+``rte_thash_get_complement()`` expects:
+
+* A corresponding helper created for a given subtuple of the tuple.
+
+* The hash value of the tuple to be altered.
+
+* The desired LSBs of the hash value.
+
+After the returned bit sequence has been XORed with the subtuple, the LSBs of
+the new hash value, calculated from the altered tuple, will be the same as in
+``desired_hash``.
+
+
+Adjust tuple API
+~~~~~~~~~~~~~~~~
+
+The ``rte_thash_adjust_tuple()`` function is a user-friendly wrapper around
+a number of other functions. It alters the passed tuple so that its hash has
+the desired LSBs, as described above.
+
+It expects:
+
+* A Thash context and helper.
+
+* A pointer to the tuple to be changed.
+
+* The length of the tuple.
+
+* The desired LSBs of the hash value.
+
+* A callback function and its userdata to check the tuple after it has been
+  changed.
+
+* The number of attempts to change the tuple. This only makes sense together
+  with a callback: if the callback rejects the changed tuple, the function
+  tries again, up to this number of attempts.
+
+
+Use case example
+----------------
+
+There could be a number of different use cases, such as NAT, a TCP stack, MPLS
+tag allocation, etc. The following example considers an SNAT application.
+
+Packets belonging to the two directions of a single bidirectional flow can
+end up assigned to different queues and thus be processed by different
+lcores, as shown in :numref:`figure_predictable_snat_1`:
+
+.. _figure_predictable_snat_1:
+
+.. figure:: img/predictable_snat_1.*
+
+   Bidirectional flow packets distribution in general
+
+As a result, the same flow is shared between two cores. This is not ideal
+from a performance perspective and requires extra synchronization, which can
+lead to various performance penalties, for example:
+
+* The connections table is global, so locking/RCU on flow insertion/removal
+  is required.
+
+* Connection metadata must be protected to avoid race conditions.
+
+* More cache pressure if the metadata of a single connection is kept in the
+  L1/L2 caches of different CPU cores.
+
+* Cache pressure and reduced cache locality when packets are handed over
+  between cores.
+
+All these penalties can be avoided if packets belonging to one bidirectional
+flow are guaranteed to be assigned to the same queue, as shown in
+:numref:`figure_predictable_snat_2`:
+
+.. _figure_predictable_snat_2:
+
+.. figure:: img/predictable_snat_2.*
+
+   Bidirectional flow packets distribution with predictable RSS
+
+
+To achieve this in an SNAT scenario, the source port can be chosen not
+randomly, but using the predictable RSS library to produce a partial hash
+collision. This is shown in the code below.
+
+.. code-block:: c
+
+    /* The default Niantic RSS key length is 40 bytes. */
+    uint8_t initial_key[40] = {0}; /* Default empty key. */
+    int key_len = sizeof(initial_key);
+
+    /** The default Niantic RSS ReTa size is 2^7 entries: the 7 LSBs of the
+     *  hash value are used as indexes into the RSS ReTa. */
+    int reta_sz = 7;
+    int ret;
+    struct rte_thash_ctx *ctx;
+
+    /* Create and initialize a new thash context. */
+    ctx = rte_thash_init_ctx("SNAT", key_len, reta_sz, initial_key, 0);
+    if (ctx == NULL)
+        return -rte_errno;
+
+    /** Add a helper and specify the variable tuple part and its length. In the
+     * SNAT case we want to choose a new source port on SNAT translation in a
+     * way that the reverse tuple will have the same LSBs as the original
+     * direction tuple so that the selected source port will be the
+     * destination port on reply.
+     */
+    ret = rte_thash_add_helper(ctx, "snat", sizeof(uint16_t) * 8,
+                               offsetof(union rte_thash_tuple, v4.dport) * 8);
+
+    if (ret != 0)
+        return ret;
+
+    /* Get a handle to the required helper. */
+    struct rte_thash_subtuple_helper *h = rte_thash_get_helper(ctx, "snat");
+
+    /** After calling rte_thash_add_helper() the key stored in the context
+     * (initialized from initial_key) has changed, so get the updated one.
+     */
+    const uint8_t *new_key = rte_thash_get_key(ctx);
+
+    union rte_thash_tuple tuple, rev_tuple;
+
+    /* Fill in the tuple from the packet (application-specific helper). */
+    complete_tuple(mbuf, &tuple);
+
+    /* Calculate the RSS hash or get it from mbuf->hash.rss. */
+    uint32_t orig_hash = rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L4_LEN, new_key);
+
+    /** Complete the reverse tuple by translating the SRC address and swapping
+     * src and dst addresses and ports (application-specific helper; new_ip is
+     * the translated source address).
+     */
+    get_rev_tuple(&rev_tuple, &tuple, new_ip);
+
+    /* Calculate the expected RSS hash for the reverse tuple. */
+    uint32_t rev_hash = rte_softrss((uint32_t *)&rev_tuple, RTE_THASH_V4_L4_LEN, new_key);
+
+    /* Get the adjustment bits for the src port to get a new port. */
+    uint32_t adj = rte_thash_get_complement(h, rev_hash, orig_hash);
+
+    /* Adjust the source port bits. */
+    uint16_t new_sport = tuple.v4.sport ^ adj;
+
+    /* Make the actual packet translation (application-specific helper). */
+    do_snat(mbuf, new_ip, new_sport);
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 1c2e093..3b14822 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -203,6 +203,12 @@ New Features
     the events across multiple stages.
   * This also reduced the scheduling overhead on a event device.
 
+* **Added Predictable RSS functionality to the Toeplitz hash library.**
+
+  This feature provides functionality for finding collisions of the Toeplitz
+  hash function, the hash function used in NICs to spread the traffic among
+  the queues. It can be used to get a predictable mapping of flows to queues.
+
 * **Updated testpmd.**
 
   * Added a command line option to configure forced speed for Ethernet port.
-- 
2.7.4