From mboxrd@z Thu Jan 1 00:00:00 1970 From: Adrien Mazarguil To: dev@dpdk.org Date: Fri, 16 Dec 2016 17:24:59 +0100 Message-Id: <049b57d5216d8703bc5f2cdd29eabe40c2f09138.1481903839.git.adrien.mazarguil@6wind.com> Subject: [dpdk-dev] [PATCH v2 02/25] doc: add rte_flow prog guide
This documentation is based on the latest RFC submission, subsequently updated according to feedback from the community. Signed-off-by: Adrien Mazarguil --- doc/guides/prog_guide/index.rst | 1 + doc/guides/prog_guide/rte_flow.rst | 1853 +++++++++++++++++++++++++++++++ 2 files changed, 1854 insertions(+) diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst index e5a50a8..ed7f770 100644 --- a/doc/guides/prog_guide/index.rst +++ b/doc/guides/prog_guide/index.rst @@ -42,6 +42,7 @@ Programmer's Guide mempool_lib mbuf_lib poll_mode_drv + rte_flow cryptodev_lib link_bonding_poll_mode_drv_lib timer_lib diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst new file mode 100644 index 0000000..63413d1 --- /dev/null +++ b/doc/guides/prog_guide/rte_flow.rst @@ -0,0 +1,1853 @@ +.. BSD LICENSE + Copyright 2016 6WIND S.A. + Copyright 2016 Mellanox. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +.. _Generic_flow_API: + +Generic flow API (rte_flow) +=========================== + +Overview +-------- + +This API provides a generic means to configure hardware to match specific +ingress or egress traffic, alter its fate and query related counters +according to any number of user-defined rules. + +It is named *rte_flow* after the prefix used for all its symbols, and is +defined in ``rte_flow.h``. + +- Matching can be performed on packet data (protocol headers, payload) and + properties (e.g. associated physical port, virtual device function ID). + +- Possible operations include dropping traffic, diverting it to specific + queues, to virtual/physical device functions or ports, performing tunnel + offloads, adding marks and so on. 
+ +It is slightly higher-level than the legacy filtering framework which it +encompasses and supersedes (including all functions and filter types) in +order to expose a single interface with an unambiguous behavior that is +common to all poll-mode drivers (PMDs). + +Several methods to migrate existing applications are described in `API +migration`_. + +Flow rule +--------- + +Description +~~~~~~~~~~~ + +A flow rule is the combination of attributes with a matching pattern and a +list of actions. Flow rules form the basis of this API. + +Flow rules can have several distinct actions (such as counting, +encapsulating, decapsulating before redirecting packets to a particular +queue, etc.), instead of relying on several rules to achieve this and having +applications deal with hardware implementation details regarding their +order. + +Support for different priority levels on a rule basis is provided, for +example in order to force a more specific rule to come before a more generic +one for packets matched by both. However hardware support for more than a +single priority level cannot be guaranteed. When supported, the number of +available priority levels is usually low, which is why they can also be +implemented in software by PMDs (e.g. missing priority levels may be +emulated by reordering rules). + +In order to remain as hardware-agnostic as possible, by default all rules +are considered to have the same priority, which means that the order between +overlapping rules (when a packet is matched by several filters) is +undefined. + +PMDs may refuse to create overlapping rules at a given priority level when +they can be detected (e.g. if a pattern matches an existing filter). + +Thus predictable results for a given priority level can only be achieved +with non-overlapping rules, using perfect matching on all protocol layers. + +Flow rules can also be grouped, the flow rule priority is specific to the +group they belong to. 
All flow rules in a given group are thus processed +either before or after another group. + +Support for multiple actions per rule may be implemented internally on top +of non-default hardware priorities; as a result, both features may not be +simultaneously available to applications. + +Considering that allowed pattern/actions combinations cannot be known in +advance and would result in an impractically large number of capabilities to +expose, a method is provided to validate a given rule against the current +device configuration state. + +This enables applications to check whether the rule types they need are +supported at initialization time, before starting their data path. This +method can be used at any time, its only requirement being that the resources +needed by a rule should exist (e.g. a target RX queue should be configured +first). + +Each defined rule is associated with an opaque handle managed by the PMD; +applications are responsible for keeping it. Handles can be used for queries +and rule management, such as retrieving counters or other data and +destroying rules. + +To avoid resource leaks on the PMD side, handles must be explicitly +destroyed by the application before releasing associated resources such as +queues and ports. + +The following sections cover: + +- **Attributes** (represented by ``struct rte_flow_attr``): properties of a + flow rule such as its direction (ingress or egress) and priority. + +- **Pattern item** (represented by ``struct rte_flow_item``): part of a + matching pattern that either matches specific packet data or traffic + properties. It can also describe properties of the pattern itself, such as + inverted matching. + +- **Matching pattern**: traffic properties to look for, a combination of any + number of items. + +- **Actions** (represented by ``struct rte_flow_action``): operations to + perform whenever a packet is matched by a pattern. 
+ +Attributes +~~~~~~~~~~ + +Group +^^^^^ + +Flow rules can be grouped by assigning them a common group number. Lower +values have higher priority. Group 0 has the highest priority. + +Although optional, applications are encouraged to group similar rules as +much as possible to fully take advantage of hardware capabilities +(e.g. optimized matching) and work around limitations (e.g. a single pattern +type possibly allowed in a given group). + +Note that support for more than a single group is not guaranteed. + +Priority +^^^^^^^^ + +A priority level can be assigned to a flow rule. Like groups, lower values +denote higher priority, with 0 as the maximum. + +A rule with priority 0 in group 8 is always matched after a rule with +priority 8 in group 0. + +Group and priority levels are arbitrary and up to the application, they do +not need to be contiguous nor start from 0, however the maximum number +varies between devices and may be affected by existing flow rules. + +If a packet is matched by several rules of a given group for a given +priority level, the outcome is undefined. It can take any path, may be +duplicated or even cause unrecoverable errors. + +Note that support for more than a single priority level is not guaranteed. + +Traffic direction +^^^^^^^^^^^^^^^^^ + +Flow rules can apply to inbound and/or outbound traffic (ingress/egress). + +Several pattern items and actions are valid and can be used in both +directions. At least one direction must be specified. + +Specifying both directions at once for a given rule is not recommended but +may be valid in a few cases (e.g. shared counters). + +Pattern item +~~~~~~~~~~~~ + +Pattern items fall in two categories: + +- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4, + IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a + specification structure. 
+ +- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF, + VF, PORT and so on), often without a specification structure. + +Item specification structures are used to match specific values among +protocol fields (or item properties). Documentation describes for each item +whether they are associated with one and their type name if so. + +Up to three structures of the same type can be set for a given item: + +- ``spec``: values to match (e.g. a given IPv4 address). + +- ``last``: upper bound for an inclusive range with corresponding fields in + ``spec``. + +- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is + to distinguish the values to take into account and/or partially mask them + out (e.g. in order to match an IPv4 address prefix). + +Usage restrictions and expected behavior: + +- Setting either ``mask`` or ``last`` without ``spec`` is an error. + +- Field values in ``last`` which are either 0 or equal to the corresponding + values in ``spec`` are ignored; they do not generate a range. Nonzero + values lower than those in ``spec`` are not supported. + +- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD + to only take the fields it can recognize into account. There is no error + checking for unsupported fields. + +- Not setting any of them (assuming item type allows it) uses default + parameters that depend on the item type. Most of the time, particularly + for protocol header items, it is equivalent to providing an empty (zeroed) + ``mask``. + +- ``mask`` is a simple bit-mask applied before interpreting the contents of + ``spec`` and ``last``, which may yield unexpected results if not used + carefully. For example, if for an IPv4 address field, ``spec`` provides + *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides + *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*. 
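The interaction between ``spec``, ``last`` and ``mask`` described above can be illustrated with a small sketch. It is not taken from ``rte_flow.h``: ``ipv4()`` and ``field_matches()`` are hypothetical helpers written for this illustration, operating on a single 32-bit field in host byte order and assuming a prefix-style mask. It only shows how masking both bounds turns *10.1.2.3* to *10.3.4.5* under *255.255.0.0* into *10.1.0.0* to *10.3.255.255*.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustration only). */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Hypothetical helper, not part of rte_flow: tell whether "value" is
 * matched by a spec/last/mask triplet applied to one 32-bit field.
 * The mask is applied to spec, last and the candidate value before
 * any comparison takes place.
 */
static bool field_matches(uint32_t value, uint32_t spec, uint32_t last,
			  uint32_t mask)
{
	uint32_t lo = spec & mask;
	/* A "last" of 0 or equal to "spec" does not generate a range. */
	uint32_t hi = (last == 0 || last == spec) ? lo : (last & mask);

	/* "last" values lower than "spec" are documented as unsupported. */
	assert(hi >= lo);
	value &= mask;
	return value >= lo && value <= hi;
}
```

With ``spec`` *10.1.2.3*, ``last`` *10.3.4.5* and ``mask`` *255.255.0.0*, this helper accepts both *10.1.0.0* and *10.3.255.255* and rejects *10.0.255.255* and *10.4.0.0*, since the low 16 bits are masked out of both bounds and of the candidate value.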
+ +Example of an item specification matching an Ethernet header: + ++------------------------------------------+ +| Ethernet | ++==========+==========+====================+ +| ``spec`` | ``src`` | ``00:01:02:03:04`` | +| +----------+--------------------+ +| | ``dst`` | ``00:2a:66:00:01`` | +| +----------+--------------------+ +| | ``type`` | ``0x22aa`` | ++----------+----------+--------------------+ +| ``last`` | unspecified | ++----------+----------+--------------------+ +| ``mask`` | ``src`` | ``00:ff:ff:ff:00`` | +| +----------+--------------------+ +| | ``dst`` | ``00:00:00:00:ff`` | +| +----------+--------------------+ +| | ``type`` | ``0x0000`` | ++----------+----------+--------------------+ + +Non-masked bits stand for any value (shown as ``?`` below), Ethernet headers +with the following properties are thus matched: + +- ``src``: ``??:01:02:03:??`` +- ``dst``: ``??:??:??:??:01`` +- ``type``: ``0x????`` + +Matching pattern +~~~~~~~~~~~~~~~~ + +A pattern is formed by stacking items starting from the lowest protocol +layer to match. This stacking restriction does not apply to meta items which +can be placed anywhere in the stack without affecting the meaning of the +resulting pattern. + +Patterns are terminated by END items. 
+ +Examples: + ++--------------+ +| TCPv4 as L4 | ++===+==========+ +| 0 | Ethernet | ++---+----------+ +| 1 | IPv4 | ++---+----------+ +| 2 | TCP | ++---+----------+ +| 3 | END | ++---+----------+ + +| + ++----------------+ +| TCPv6 in VXLAN | ++===+============+ +| 0 | Ethernet | ++---+------------+ +| 1 | IPv4 | ++---+------------+ +| 2 | UDP | ++---+------------+ +| 3 | VXLAN | ++---+------------+ +| 4 | Ethernet | ++---+------------+ +| 5 | IPv6 | ++---+------------+ +| 6 | TCP | ++---+------------+ +| 7 | END | ++---+------------+ + +| + ++-----------------------------+ +| TCPv4 as L4 with meta items | ++===+=========================+ +| 0 | VOID | ++---+-------------------------+ +| 1 | Ethernet | ++---+-------------------------+ +| 2 | VOID | ++---+-------------------------+ +| 3 | IPv4 | ++---+-------------------------+ +| 4 | TCP | ++---+-------------------------+ +| 5 | VOID | ++---+-------------------------+ +| 6 | VOID | ++---+-------------------------+ +| 7 | END | ++---+-------------------------+ + +The above example shows how meta items do not affect packet data matching +items, as long as those remain stacked properly. The resulting matching +pattern is identical to "TCPv4 as L4". + ++----------------+ +| UDPv6 anywhere | ++===+============+ +| 0 | IPv6 | ++---+------------+ +| 1 | UDP | ++---+------------+ +| 2 | END | ++---+------------+ + +If supported by the PMD, omitting one or several protocol layers at the +bottom of the stack as in the above example (missing an Ethernet +specification) enables looking up anywhere in packets. + +It is unspecified whether the payload of supported encapsulations +(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner, +outer or both packets. 
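The stacking examples above can be sketched in C as END-terminated arrays. The ``enum item_type`` values and both helpers are hypothetical stand-ins for this illustration, not the definitions from ``rte_flow.h``. Only item types are compared; a real pattern also carries ``spec``, ``last`` and ``mask`` pointers. The sketch shows why "TCPv4 as L4 with meta items" describes the same pattern as "TCPv4 as L4": VOID items are skipped and END stops processing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pattern item types; END is 0 as in the text. */
enum item_type {
	ITEM_END = 0,
	ITEM_VOID,
	ITEM_ETH,
	ITEM_IPV4,
	ITEM_TCP,
};

/* Return the next item that is not VOID (possibly the END marker). */
static const enum item_type *skip_void(const enum item_type *item)
{
	while (*item == ITEM_VOID)
		++item;
	return item;
}

/* Two END-terminated patterns are equivalent if they differ only by VOIDs. */
static bool patterns_equivalent(const enum item_type *a,
				const enum item_type *b)
{
	for (;;) {
		a = skip_void(a);
		b = skip_void(b);
		if (*a != *b)
			return false;
		if (*a == ITEM_END)
			return true;
		++a;
		++b;
	}
}

/* "TCPv4 as L4" */
static const enum item_type tcpv4[] = {
	ITEM_ETH, ITEM_IPV4, ITEM_TCP, ITEM_END,
};

/* "TCPv4 as L4 with meta items" */
static const enum item_type tcpv4_meta[] = {
	ITEM_VOID, ITEM_ETH, ITEM_VOID, ITEM_IPV4, ITEM_TCP,
	ITEM_VOID, ITEM_VOID, ITEM_END,
};

/* A shorter stack, stopping at L3, for comparison. */
static const enum item_type eth_ipv4[] = {
	ITEM_ETH, ITEM_IPV4, ITEM_END,
};
```

Here ``patterns_equivalent(tcpv4, tcpv4_meta)`` holds, while ``patterns_equivalent(tcpv4, eth_ipv4)`` does not.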
+ ++---------------------+ +| Invalid, missing L3 | ++===+=================+ +| 0 | Ethernet | ++---+-----------------+ +| 1 | UDP | ++---+-----------------+ +| 2 | END | ++---+-----------------+ + +The above pattern is invalid due to a missing L3 specification between L2 +(Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the +top of the stack. + +Meta item types +~~~~~~~~~~~~~~~ + +They match meta-data or affect pattern processing instead of matching packet +data directly, most of them do not need a specification structure. This +particularity allows them to be specified anywhere in the stack without +causing any side effect. + +``END`` +^^^^^^^ + +End marker for item lists. Prevents further processing of items, thereby +ending the pattern. + +- Its numeric value is 0 for convenience. +- PMD support is mandatory. +- ``spec``, ``last`` and ``mask`` are ignored. + ++--------------------+ +| END | ++==========+=========+ +| ``spec`` | ignored | ++----------+---------+ +| ``last`` | ignored | ++----------+---------+ +| ``mask`` | ignored | ++----------+---------+ + +``VOID`` +^^^^^^^^ + +Used as a placeholder for convenience. It is ignored and simply discarded by +PMDs. + +- PMD support is mandatory. +- ``spec``, ``last`` and ``mask`` are ignored. 
+ ++--------------------+ +| VOID | ++==========+=========+ +| ``spec`` | ignored | ++----------+---------+ +| ``last`` | ignored | ++----------+---------+ +| ``mask`` | ignored | ++----------+---------+ + +One usage example for this type is generating rules that share a common +prefix quickly without reallocating memory, only by updating item types: + ++------------------------+ +| TCP, UDP or ICMP as L4 | ++===+====================+ +| 0 | Ethernet | ++---+--------------------+ +| 1 | IPv4 | ++---+------+------+------+ +| 2 | UDP | VOID | VOID | ++---+------+------+------+ +| 3 | VOID | TCP | VOID | ++---+------+------+------+ +| 4 | VOID | VOID | ICMP | ++---+------+------+------+ +| 5 | END | ++---+--------------------+ + +``INVERT`` +^^^^^^^^^^ + +Inverted matching, i.e. process packets that do not match the pattern. + +- ``spec``, ``last`` and ``mask`` are ignored. + ++--------------------+ +| INVERT | ++==========+=========+ +| ``spec`` | ignored | ++----------+---------+ +| ``last`` | ignored | ++----------+---------+ +| ``mask`` | ignored | ++----------+---------+ + +Usage example, matching non-TCPv4 packets only: + ++--------------------+ +| Anything but TCPv4 | ++===+================+ +| 0 | INVERT | ++---+----------------+ +| 1 | Ethernet | ++---+----------------+ +| 2 | IPv4 | ++---+----------------+ +| 3 | TCP | ++---+----------------+ +| 4 | END | ++---+----------------+ + +``PF`` +^^^^^^ + +Matches packets addressed to the physical function of the device. + +If the underlying device function differs from the one that would normally +receive the matched traffic, specifying this item prevents it from reaching +that device unless the flow rule contains a `PF (action)`_. Packets are not +duplicated between device instances by default. + +- Likely to return an error or never match any traffic if applied to a VF + device. +- Can be combined with any number of `VF`_ items to match both PF and VF + traffic. 
+- ``spec``, ``last`` and ``mask`` must not be set. + ++------------------+ +| PF | ++==========+=======+ +| ``spec`` | unset | ++----------+-------+ +| ``last`` | unset | ++----------+-------+ +| ``mask`` | unset | ++----------+-------+ + +``VF`` +^^^^^^ + +Matches packets addressed to a virtual function ID of the device. + +If the underlying device function differs from the one that would normally +receive the matched traffic, specifying this item prevents it from reaching +that device unless the flow rule contains a `VF (action)`_. Packets are not +duplicated between device instances by default. + +- Likely to return an error or never match any traffic if this causes a VF + device to match traffic addressed to a different VF. +- Can be specified multiple times to match traffic addressed to several VF + IDs. +- Can be combined with a PF item to match both PF and VF traffic. + ++------------------------------------------------+ +| VF | ++==========+=========+===========================+ +| ``spec`` | ``id`` | destination VF ID | ++----------+---------+---------------------------+ +| ``last`` | ``id`` | upper range value | ++----------+---------+---------------------------+ +| ``mask`` | ``id`` | zeroed to match any VF ID | ++----------+---------+---------------------------+ + +``PORT`` +^^^^^^^^ + +Matches packets coming from the specified physical port of the underlying +device. + +The first PORT item overrides the physical port normally associated with the +specified DPDK input port (port_id). This item can be provided several times +to match additional physical ports. + +Note that physical ports are not necessarily tied to DPDK input ports +(port_id) when those are not under DPDK control. Possible values are +specific to each device, they are not necessarily indexed from zero and may +not be contiguous. + +As a device property, the list of allowed values as well as the value +associated with a port_id should be retrieved by other means. 
+ ++-------------------------------------------------------+ +| PORT | ++==========+===========+================================+ +| ``spec`` | ``index`` | physical port index | ++----------+-----------+--------------------------------+ +| ``last`` | ``index`` | upper range value | ++----------+-----------+--------------------------------+ +| ``mask`` | ``index`` | zeroed to match any port index | ++----------+-----------+--------------------------------+ + +Data matching item types +~~~~~~~~~~~~~~~~~~~~~~~~ + +Most of these are basically protocol header definitions with associated +bit-masks. They must be specified (stacked) from lowest to highest protocol +layer to form a matching pattern. + +The following list is not exhaustive, new protocols will be added in the +future. + +``ANY`` +^^^^^^^ + +Matches any protocol in place of the current layer, a single ANY may also +stand for several protocol layers. + +This is usually specified as the first pattern item when looking for a +protocol anywhere in a packet. 
+ ++-----------------------------------------------------------+ +| ANY | ++==========+=========+======================================+ +| ``spec`` | ``num`` | number of layers covered | ++----------+---------+--------------------------------------+ +| ``last`` | ``num`` | upper range value | ++----------+---------+--------------------------------------+ +| ``mask`` | ``num`` | zeroed to cover any number of layers | ++----------+---------+--------------------------------------+ + +Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6) +and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4 +or IPv6) matched by the second ANY specification: + ++----------------------------------+ +| TCP in VXLAN with wildcards | ++===+==============================+ +| 0 | Ethernet | ++---+-----+----------+---------+---+ +| 1 | ANY | ``spec`` | ``num`` | 2 | ++---+-----+----------+---------+---+ +| 2 | VXLAN | ++---+------------------------------+ +| 3 | Ethernet | ++---+-----+----------+---------+---+ +| 4 | ANY | ``spec`` | ``num`` | 1 | ++---+-----+----------+---------+---+ +| 5 | TCP | ++---+------------------------------+ +| 6 | END | ++---+------------------------------+ + +``RAW`` +^^^^^^^ + +Matches a byte string of a given length at a given offset. + +Offset is either absolute (using the start of the packet) or relative to the +end of the previous matched item in the stack, in which case negative values +are allowed. + +If search is enabled, offset is used as the starting point. The search area +can be delimited by setting limit to a nonzero value, which is the maximum +number of bytes after offset where the pattern may start. + +Matching a zero-length pattern is allowed, doing so resets the relative +offset for subsequent items. + +- This type does not support ranges (``last`` field). 
+ ++---------------------------------------------------------------------------+ +| RAW | ++==========+==============+=================================================+ +| ``spec`` | ``relative`` | look for pattern after the previous item | +| +--------------+-------------------------------------------------+ +| | ``search`` | search pattern from offset (see also ``limit``) | +| +--------------+-------------------------------------------------+ +| | ``reserved`` | reserved, must be set to zero | +| +--------------+-------------------------------------------------+ +| | ``offset`` | absolute or relative offset for ``pattern`` | +| +--------------+-------------------------------------------------+ +| | ``limit`` | search area limit for start of ``pattern`` | +| +--------------+-------------------------------------------------+ +| | ``length`` | ``pattern`` length | +| +--------------+-------------------------------------------------+ +| | ``pattern`` | byte string to look for | ++----------+--------------+-------------------------------------------------+ +| ``last`` | if specified, either all 0 or with the same values as ``spec`` | ++----------+----------------------------------------------------------------+ +| ``mask`` | bit-mask applied to ``spec`` values with usual behavior | ++----------+----------------------------------------------------------------+ + +Example pattern looking for several strings at various offsets of a UDP +payload, using combined RAW items: + ++-------------------------------------------+ +| UDP payload matching | ++===+=======================================+ +| 0 | Ethernet | ++---+---------------------------------------+ +| 1 | IPv4 | ++---+---------------------------------------+ +| 2 | UDP | ++---+-----+----------+--------------+-------+ +| 3 | RAW | ``spec`` | ``relative`` | 1 | +| | | +--------------+-------+ +| | | | ``search`` | 1 | +| | | +--------------+-------+ +| | | | ``offset`` | 10 | +| | | +--------------+-------+ +| | | | 
``limit`` | 0 | +| | | +--------------+-------+ +| | | | ``length`` | 3 | +| | | +--------------+-------+ +| | | | ``pattern`` | "foo" | ++---+-----+----------+--------------+-------+ +| 4 | RAW | ``spec`` | ``relative`` | 1 | +| | | +--------------+-------+ +| | | | ``search`` | 0 | +| | | +--------------+-------+ +| | | | ``offset`` | 20 | +| | | +--------------+-------+ +| | | | ``limit`` | 0 | +| | | +--------------+-------+ +| | | | ``length`` | 3 | +| | | +--------------+-------+ +| | | | ``pattern`` | "bar" | ++---+-----+----------+--------------+-------+ +| 5 | RAW | ``spec`` | ``relative`` | 1 | +| | | +--------------+-------+ +| | | | ``search`` | 0 | +| | | +--------------+-------+ +| | | | ``offset`` | -29 | +| | | +--------------+-------+ +| | | | ``limit`` | 0 | +| | | +--------------+-------+ +| | | | ``length`` | 3 | +| | | +--------------+-------+ +| | | | ``pattern`` | "baz" | ++---+-----+----------+--------------+-------+ +| 6 | END | ++---+---------------------------------------+ + +This translates to: + +- Locate "foo" at least 10 bytes deep inside UDP payload. +- Locate "bar" after "foo" plus 20 bytes. +- Locate "baz" after "bar" minus 29 bytes. + +Such a packet may be represented as follows (not to scale):: + + 0 >= 10 B == 20 B + | |<--------->| |<--------->| + | | | | | + |-----|------|-----|-----|-----|-----|-----------|-----|------| + | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... | + |-----|------|-----|-----|-----|-----|-----------|-----|------| + | | + |<--------------------------->| + == 29 B + +Note that matching subsequent pattern items would resume after "baz", not +"bar" since matching is always performed after the previous item of the +stack. + +``ETH`` +^^^^^^^ + +Matches an Ethernet header. + +- ``dst``: destination MAC. +- ``src``: source MAC. +- ``type``: EtherType. + +``VLAN`` +^^^^^^^^ + +Matches an 802.1Q/ad VLAN tag. + +- ``tpid``: tag protocol identifier. +- ``tci``: tag control information. 
+ +``IPV4`` +^^^^^^^^ + +Matches an IPv4 header. + +Note: IPv4 options are handled by dedicated pattern items. + +- ``hdr``: IPv4 header definition (``rte_ip.h``). + +``IPV6`` +^^^^^^^^ + +Matches an IPv6 header. + +Note: IPv6 options are handled by dedicated pattern items. + +- ``hdr``: IPv6 header definition (``rte_ip.h``). + +``ICMP`` +^^^^^^^^ + +Matches an ICMP header. + +- ``hdr``: ICMP header definition (``rte_icmp.h``). + +``UDP`` +^^^^^^^ + +Matches a UDP header. + +- ``hdr``: UDP header definition (``rte_udp.h``). + +``TCP`` +^^^^^^^ + +Matches a TCP header. + +- ``hdr``: TCP header definition (``rte_tcp.h``). + +``SCTP`` +^^^^^^^^ + +Matches an SCTP header. + +- ``hdr``: SCTP header definition (``rte_sctp.h``). + +``VXLAN`` +^^^^^^^^^ + +Matches a VXLAN header (RFC 7348). + +- ``flags``: normally 0x08 (I flag). +- ``rsvd0``: reserved, normally 0x000000. +- ``vni``: VXLAN network identifier. +- ``rsvd1``: reserved, normally 0x00. + +Actions +~~~~~~~ + +Each possible action is represented by a type. Some have associated +configuration structures. Several actions combined in a list can be assigned +to a flow rule. That list is not ordered. + +Actions fall into three categories: + +- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent + processing matched packets by subsequent flow rules, unless overridden + with PASSTHRU. + +- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for + additional processing by subsequent flow rules. + +- Other non-terminating meta actions that do not affect the fate of packets + (END, VOID, MARK, FLAG, COUNT). + +When several actions are combined in a flow rule, they should all have +different types (e.g. dropping a packet twice is not possible). + +Only the last action of a given type is taken into account. PMDs still +perform error checking on the entire list. + +Like matching patterns, action lists are terminated by an END entry. 
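The rule that only the last action of a given type is taken into account can be sketched as follows. The ``enum action_type`` values, ``struct action`` and ``effective_queue()`` are hypothetical stand-ins written for this illustration, not the ``rte_flow.h`` definitions. The sketch walks an END-terminated list, ignores VOID, and keeps the last QUEUE index it encounters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for action types; END is 0 as for pattern items. */
enum action_type {
	ACTION_END = 0,
	ACTION_VOID,
	ACTION_DROP,
	ACTION_QUEUE,
};

/* Simplified action entry; queue_index only matters for ACTION_QUEUE. */
struct action {
	enum action_type type;
	uint16_t queue_index;
};

/*
 * Return the queue index of the last QUEUE action found in an
 * END-terminated list, or -1 if the list contains none.
 */
static int effective_queue(const struct action *list)
{
	int index = -1;

	for (; list->type != ACTION_END; ++list)
		if (list->type == ACTION_QUEUE)
			index = list->queue_index;
	return index;
}

/* Two QUEUE actions separated by a VOID placeholder: the last one wins. */
static const struct action redirect[] = {
	{ ACTION_QUEUE, 5 },
	{ ACTION_VOID, 0 },
	{ ACTION_QUEUE, 3 },
	{ ACTION_END, 0 },
};

/* A list without any QUEUE action. */
static const struct action drop_only[] = {
	{ ACTION_DROP, 0 },
	{ ACTION_END, 0 },
};
```

For the ``redirect`` list the helper returns 3, and for ``drop_only`` it returns -1. A PMD would still be expected to validate every entry of the list, which this sketch does not attempt.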
+ +*Note that PASSTHRU is the only action able to override a terminating rule.* + +Example of an action that redirects packets to queue index 10: + ++----------------+ +| QUEUE | ++===========+====+ +| ``index`` | 10 | ++-----------+----+ + +Action list examples. Their order is not significant; applications must +consider all actions to be performed simultaneously: + ++----------------+ +| Count and drop | ++================+ +| COUNT | ++----------------+ +| DROP | ++----------------+ +| END | ++----------------+ + +| + ++--------------------------+ +| Mark, count and redirect | ++=======+===========+======+ +| MARK | ``id`` | 0x2a | ++-------+-----------+------+ +| COUNT | ++-------+-----------+------+ +| QUEUE | ``index`` | 10 | ++-------+-----------+------+ +| END | ++--------------------------+ + +| + ++-----------------------+ +| Redirect to queue 5 | ++=======================+ +| DROP | ++-------+-----------+---+ +| QUEUE | ``index`` | 5 | ++-------+-----------+---+ +| END | ++-----------------------+ + +In the above example, considering both actions are performed simultaneously, +the end result is that only QUEUE has any effect. + ++-----------------------+ +| Redirect to queue 3 | ++=======+===========+===+ +| QUEUE | ``index`` | 5 | ++-------+-----------+---+ +| VOID | ++-------+-----------+---+ +| QUEUE | ``index`` | 3 | ++-------+-----------+---+ +| END | ++-----------------------+ + +As previously described, only the last action of a given type found in the +list is taken into account. The above example also shows that VOID is +ignored. + +Action types +~~~~~~~~~~~~ + +Common action types are described in this section. Like pattern item types, +this list is not exhaustive as new actions will be added in the future. + +``END`` (action) +^^^^^^^^^^^^^^^^ + +End marker for action lists. Prevents further processing of actions, thereby +ending the list. + +- Its numeric value is 0 for convenience. +- PMD support is mandatory. +- No configurable properties. 
+ ++---------------+ +| END | ++===============+ +| no properties | ++---------------+ + +``VOID`` (action) +^^^^^^^^^^^^^^^^^ + +Used as a placeholder for convenience. It is ignored and simply discarded by +PMDs. + +- PMD support is mandatory. +- No configurable properties. + ++---------------+ +| VOID | ++===============+ +| no properties | ++---------------+ + +``PASSTHRU`` +^^^^^^^^^^^^ + +Leaves packets up for additional processing by subsequent flow rules. This +is the default when a rule does not contain a terminating action, but can be +specified to force a rule to become non-terminating. + +- No configurable properties. + ++---------------+ +| PASSTHRU | ++===============+ +| no properties | ++---------------+ + +Example to copy a packet to a queue and continue processing by subsequent +flow rules: + ++--------------------------+ +| Copy to queue 8 | ++==========================+ +| PASSTHRU | ++----------+-----------+---+ +| QUEUE | ``queue`` | 8 | ++----------+-----------+---+ +| END | ++--------------------------+ + +``MARK`` +^^^^^^^^ + +Attaches a 32 bit value to packets. + +This value is arbitrary and application-defined. For compatibility with FDIR +it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is +also set in ``ol_flags``. + ++----------------------------------------------+ +| MARK | ++========+=====================================+ +| ``id`` | 32 bit value to return with packets | ++--------+-------------------------------------+ + +``FLAG`` +^^^^^^^^ + +Flag packets. Similar to `MARK`_ but only affects ``ol_flags``. + +- No configurable properties. + +Note: a distinctive flag must be defined for it. + ++---------------+ +| FLAG | ++===============+ +| no properties | ++---------------+ + +``QUEUE`` +^^^^^^^^^ + +Assigns packets to a given queue index. + +- Terminating by default. 
+ ++--------------------------------+ +| QUEUE | ++===========+====================+ +| ``index`` | queue index to use | ++-----------+--------------------+ + +``DROP`` +^^^^^^^^ + +Drop packets. + +- No configurable properties. +- Terminating by default. +- PASSTHRU overrides this action if both are specified. + ++---------------+ +| DROP | ++===============+ +| no properties | ++---------------+ + +``COUNT`` +^^^^^^^^^ + +Enables counters for this rule. + +These counters can be retrieved and reset through ``rte_flow_query()``, see +``struct rte_flow_query_count``. + +- Counters can be retrieved with ``rte_flow_query()``. +- No configurable properties. + ++---------------+ +| COUNT | ++===============+ +| no properties | ++---------------+ + +Query structure to retrieve and reset flow rule counters: + ++---------------------------------------------------------+ +| COUNT query | ++===============+=====+===================================+ +| ``reset`` | in | reset counter after query | ++---------------+-----+-----------------------------------+ +| ``hits_set`` | out | ``hits`` field is set | ++---------------+-----+-----------------------------------+ +| ``bytes_set`` | out | ``bytes`` field is set | ++---------------+-----+-----------------------------------+ +| ``hits`` | out | number of hits for this rule | ++---------------+-----+-----------------------------------+ +| ``bytes`` | out | number of bytes through this rule | ++---------------+-----+-----------------------------------+ + +``DUP`` +^^^^^^^ + +Duplicates packets to a given queue index. + +This is normally combined with QUEUE, however when used alone, it is +actually similar to QUEUE + PASSTHRU. + +- Non-terminating by default. 
+ ++------------------------------------------------+ +| DUP | ++===========+====================================+ +| ``index`` | queue index to duplicate packet to | ++-----------+------------------------------------+ + +``RSS`` +^^^^^^^ + +Similar to QUEUE, except RSS is additionally performed on packets to spread +them among several queues according to the provided parameters. + +Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field, +however it conflicts with the `MARK`_ action as they share the same +space. When both actions are specified, the RSS hash is discarded and +``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf +structure should eventually evolve to store both. + +- Terminating by default. + ++---------------------------------------------+ +| RSS | ++==============+==============================+ +| ``rss_conf`` | RSS parameters | ++--------------+------------------------------+ +| ``num`` | number of entries in queue[] | ++--------------+------------------------------+ +| ``queue[]`` | queue indices to use | ++--------------+------------------------------+ + +``PF`` (action) +^^^^^^^^^^^^^^^ + +Redirects packets to the physical function (PF) of the current device. + +- No configurable properties. +- Terminating by default. + ++---------------+ +| PF | ++===============+ +| no properties | ++---------------+ + +``VF`` (action) +^^^^^^^^^^^^^^^ + +Redirects packets to a virtual function (VF) of the current device. + +Packets matched by a VF pattern item can be redirected to their original VF +ID instead of the specified one. This parameter may not be available and is +not guaranteed to work properly if the VF part is matched by a prior flow +rule or if packets are not addressed to a VF in the first place. + +- Terminating by default. 
+
++-----------------------------------------------+
+| VF                                            |
++==============+================================+
+| ``original`` | use original VF ID if possible |
++--------------+--------------------------------+
+| ``vf``       | VF ID to redirect packets to   |
++--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Variable header support will come through dedicated pattern items. For
+example, items matching specific IPv4 options and IPv6 extension headers
+would be stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flow rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+::
+
+ int
+ rte_flow_validate(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action actions[],
+                   struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameters affecting flow rules in any way
+are modified, due to possible collisions or resource limitations (although
+in such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 if the flow rule is valid and can be created, a negative errno value
+  otherwise (``rte_errno`` is also set). The following errors are defined:
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+::
+
+ struct rte_flow *
+ rte_flow_create(uint8_t port_id,
+                 const struct rte_flow_attr *attr,
+                 const struct rte_flow_item pattern[],
+                 const struct rte_flow_action actions[],
+                 struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rule destruction is not automatic, and a queue or a port should not be
+released while flow rules are still attached to it. Applications must take
+care of performing this step before releasing resources.
+
+::
+
+ int
+ rte_flow_destroy(uint8_t port_id,
+                  struct rte_flow *flow,
+                  struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+::
+
+ int
+ rte_flow_flush(uint8_t port_id,
+                struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+ +Arguments: + +- ``port_id``: port identifier of Ethernet device. +- ``error``: perform verbose error reporting if not NULL. + +Return values: + +- 0 on success, a negative errno value otherwise and ``rte_errno`` is set. + +Query +~~~~~ + +Query an existing flow rule. + +This function allows retrieving flow-specific data such as counters. Data +is gathered by special actions which must be present in the flow rule +definition. + +:: + + int + rte_flow_query(uint8_t port_id, + struct rte_flow *flow, + enum rte_flow_action_type action, + void *data, + struct rte_flow_error *error); + +Arguments: + +- ``port_id``: port identifier of Ethernet device. +- ``flow``: flow rule handle to query. +- ``action``: action type to query. +- ``data``: pointer to storage for the associated query data type. +- ``error``: perform verbose error reporting if not NULL. + +Return values: + +- 0 on success, a negative errno value otherwise and ``rte_errno`` is set. + +Verbose error reporting +----------------------- + +The defined *errno* values may not be accurate enough for users or +application developers who want to investigate issues related to flow rules +management. A dedicated error object is defined for this purpose:: + + enum rte_flow_error_type { + RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */ + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */ + RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */ + RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */ + RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */ + RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */ + RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */ + RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */ + RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */ + RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */ + RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */ + RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. 
*/ + }; + + struct rte_flow_error { + enum rte_flow_error_type type; /**< Cause field and error types. */ + const void *cause; /**< Object responsible for the error. */ + const char *message; /**< Human-readable error message. */ + }; + +Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case +remaining fields can be ignored. Other error types describe the type of the +object pointed by ``cause``. + +If non-NULL, ``cause`` points to the object responsible for the error. For a +flow rule, this may be a pattern item or an individual action. + +If non-NULL, ``message`` provides a human-readable error message. + +This object is normally allocated by applications and set by PMDs, the +message points to a constant string which does not need to be freed by the +application, however its pointer can be considered valid only as long as its +associated DPDK port remains configured. Closing the underlying device or +unloading the PMD invalidates it. + +Caveats +------- + +- DPDK does not keep track of flow rules definitions or flow rule objects + automatically. Applications may keep track of the former and must keep + track of the latter. PMDs may also do it for internal needs, however this + must not be relied on by applications. + +- Flow rules are not maintained between successive port initializations. An + application exiting without releasing them and restarting must re-create + them from scratch. + +- API operations are synchronous and blocking (``EAGAIN`` cannot be + returned). + +- There is no provision for reentrancy/multi-thread safety, although nothing + should prevent different devices from being configured at the same + time. PMDs may protect their control path functions accordingly. + +- Stopping the data path (TX/RX) should not be necessary when managing flow + rules. If this cannot be achieved naturally or with workarounds (such as + temporarily replacing the burst function pointers), an appropriate error + code must be returned (``EBUSY``). 
+ +- PMDs, not applications, are responsible for maintaining flow rules + configuration when stopping and restarting a port or performing other + actions which may affect them. They can only be destroyed explicitly by + applications. + +For devices exposing multiple ports sharing global settings affected by flow +rules: + +- All ports under DPDK control must behave consistently, PMDs are + responsible for making sure that existing flow rules on a port are not + affected by other ports. + +- Ports not under DPDK control (unaffected or handled by other applications) + are user's responsibility. They may affect existing flow rules and cause + undefined behavior. PMDs aware of this may prevent flow rules creation + altogether in such cases. + +PMD interface +------------- + +The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to +API/ABI versioning constraints as it is not exposed to applications and may +evolve independently. + +It is currently implemented on top of the legacy filtering framework through +filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation +*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped +inside ``struct rte_flow_ops``. + +This overhead is temporarily necessary in order to keep compatibility with +the legacy filtering framework, which should eventually disappear. + +- PMD callbacks implement exactly the interface described in `Rules + management`_, except for the port ID argument which has already been + converted to a pointer to the underlying ``struct rte_eth_dev``. + +- Public API functions do not process flow rules definitions at all before + calling PMD functions (no basic error checking, no validation + whatsoever). They only make sure these callbacks are non-NULL or return + the ``ENOSYS`` (function not supported) error. + +This interface additionally defines the following helper functions: + +- ``rte_flow_ops_get()``: get generic flow operations structure from a + port. 
+ +- ``rte_flow_error_set()``: initialize generic flow error structure. + +More will be added over time. + +Device compatibility +-------------------- + +No known implementation supports all the described features. + +Unsupported features or combinations are not expected to be fully emulated +in software by PMDs for performance reasons. Partially supported features +may be completed in software as long as hardware performs most of the work +(such as queue redirection and packet recognition). + +However PMDs are expected to do their best to satisfy application requests +by working around hardware limitations as long as doing so does not affect +the behavior of existing flow rules. + +The following sections provide a few examples of such cases and describe how +PMDs should handle them, they are based on limitations built into the +previous APIs. + +Global bit-masks +~~~~~~~~~~~~~~~~ + +Each flow rule comes with its own, per-layer bit-masks, while hardware may +support only a single, device-wide bit-mask for a given layer type, so that +two IPv4 rules cannot use different bit-masks. + +The expected behavior in this case is that PMDs automatically configure +global bit-masks according to the needs of the first flow rule created. + +Subsequent rules are allowed only if their bit-masks match those, the +``EEXIST`` error code should be returned otherwise. + +Unsupported layer types +~~~~~~~~~~~~~~~~~~~~~~~ + +Many protocols can be simulated by crafting patterns with the `RAW`_ type. + +PMDs can rely on this capability to simulate support for protocols with +headers not directly recognized by hardware. + +``ANY`` pattern item +~~~~~~~~~~~~~~~~~~~~ + +This pattern item stands for anything, which can be difficult to translate +to something hardware would understand, particularly if followed by more +specific types. 
+
+Consider the following pattern:
+
++---+-------------------------+
+| 0 | ETHER                   |
++---+-------+---------+-------+
+| 1 | ANY   | ``num`` | ``1`` |
++---+-------+---------+-------+
+| 2 | TCP                     |
++---+-------------------------+
+| 3 | END                     |
++---+-------------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
++---+--------------------+
+| 0 | ETHER              |
++---+--------------------+
+| 1 | IPV4 (zeroed mask) |
++---+--------------------+
+| 2 | TCP                |
++---+--------------------+
+| 3 | END                |
++---+--------------------+
+
+..
+
++---+--------------------+
+| 0 | ETHER              |
++---+--------------------+
+| 1 | IPV6 (zeroed mask) |
++---+--------------------+
+| 2 | TCP                |
++---+--------------------+
+| 3 | END                |
++---+--------------------+
+
+Note that as soon as an ANY item covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
+  tagging (`MARK`_ or `FLAG`_) may be implemented in software as long as the
+  target queue is used by a single rule.
+
+- A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
+  rules combining `QUEUE`_ and `PASSTHRU`_.
+
+- When a single target queue is provided, `RSS`_ can also be implemented
+  through `QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
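Because creation order cannot be relied on, a software model of rule selection has to be driven by the priority level alone. The following is a minimal sketch under stated assumptions: ``struct model_rule`` and ``model_rule_select()`` are hypothetical names, the array merely stands in for whatever internal structure a PMD uses, and a lower ``priority`` value is assumed to denote a higher priority, with 0 as the highest level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software view of a flow rule; not a DPDK structure. */
struct model_rule {
    uint32_t priority; /* assumed: lower value wins, 0 is the highest */
    int matches;       /* nonzero if the rule matches the packet */
};

/*
 * Return the matching rule with the highest priority (lowest value), or
 * NULL if no rule matches. Among matching rules sharing the same level,
 * whichever is encountered first is returned; this models the undefined
 * hardware order and must not be relied on by callers.
 */
static const struct model_rule *
model_rule_select(const struct model_rule rules[], size_t n)
{
    const struct model_rule *best = NULL;
    size_t i;

    assert(rules != NULL || n == 0);
    for (i = 0; i != n; ++i) {
        if (!rules[i].matches)
            continue;
        if (best == NULL || rules[i].priority < best->priority)
            best = &rules[i];
    }
    return best;
}
```

Overlapping rules that must behave predictably, for instance a PASSTHRU rule layered over a QUEUE rule, would therefore be given distinct ``priority`` values rather than being created in a particular order.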
+ +For overlapping rules (particularly in order to use the `PASSTHRU`_ action) +predictable behavior is only guaranteed by using different priority levels. + +Priority levels are not necessarily implemented in hardware, or may be +severely limited (e.g. a single priority bit). + +For these reasons, priority levels may be implemented purely in software by +PMDs. + +- For devices expecting flow rules to be added in the correct order, PMDs + may destroy and re-create existing rules after adding a new one with + a higher priority. + +- A configurable number of dummy or empty rules can be created at + initialization time to save high priority slots for later. + +- In order to save priority levels, PMDs may evaluate whether rules are + likely to collide and adjust their priority accordingly. + +Future evolutions +----------------- + +- A device profile selection function which could be used to force a + permanent profile instead of relying on its automatic configuration based + on existing flow rules. + +- A method to optimize *rte_flow* rules with specific pattern items and + action types generated on the fly by PMDs. DPDK should assign negative + numbers to these in order to not collide with the existing types. See + `Negative types`_. + +- Adding specific egress pattern items and actions as described in `Traffic + direction`_. + +- Optional software fallback when PMDs are unable to handle requested flow + rules so applications do not have to implement their own. + +API migration +------------- + +Exhaustive list of deprecated filter types (normally prefixed with +*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them +to *rte_flow* rules. + +``MACVLAN`` to ``ETH`` → ``VF``, ``PF`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*MACVLAN* can be translated to a basic `ETH`_ flow rule with a `VF +(action)`_ or `PF (action)`_ terminating action. 
+ ++------------------------------------+ +| MACVLAN | ++--------------------------+---------+ +| Pattern | Actions | ++===+=====+==========+=====+=========+ +| 0 | ETH | ``spec`` | any | VF, | +| | +----------+-----+ PF | +| | | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+-----+----------+-----+---------+ +| 1 | END | END | ++---+----------------------+---------+ + +``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*ETHERTYPE* is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as a +terminating action. + ++------------------------------------+ +| ETHERTYPE | ++--------------------------+---------+ +| Pattern | Actions | ++===+=====+==========+=====+=========+ +| 0 | ETH | ``spec`` | any | QUEUE, | +| | +----------+-----+ DROP | +| | | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+-----+----------+-----+---------+ +| 1 | END | END | ++---+----------------------+---------+ + +``FLEXIBLE`` to ``RAW`` → ``QUEUE`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*FLEXIBLE* can be translated to one `RAW`_ pattern with `QUEUE`_ as the +terminating action and a defined priority level. + ++------------------------------------+ +| FLEXIBLE | ++--------------------------+---------+ +| Pattern | Actions | ++===+=====+==========+=====+=========+ +| 0 | RAW | ``spec`` | any | QUEUE | +| | +----------+-----+ | +| | | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+-----+----------+-----+---------+ +| 1 | END | END | ++---+----------------------+---------+ + +``SYN`` to ``TCP`` → ``QUEUE`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*SYN* is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and +`QUEUE`_ as the terminating action. + +Priority level can be set to simulate the high priority bit. 
+
++---------------------------------------------+
+| SYN                                         |
++-----------------------------------+---------+
+| Pattern                           | Actions |
++===+======+==========+=============+=========+
+| 0 | ETH  | ``spec`` | unset       | QUEUE   |
+|   |      +----------+-------------+         |
+|   |      | ``last`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``mask`` | unset       |         |
++---+------+----------+-------------+         |
+| 1 | IPV4 | ``spec`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``last`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``mask`` | unset       |         |
++---+------+----------+---------+---+         |
+| 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+|   |      +----------+---------+---+         |
+|   |      | ``mask`` | ``syn`` | 1 |         |
++---+------+----------+---------+---+---------+
+| 3 | END                           | END     |
++---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
+`UDP`_ as L4 and `QUEUE`_ as the terminating action.
+
+A priority level can be specified as well.
+ ++---------------------------------------+ +| NTUPLE | ++-----------------------------+---------+ +| Pattern | Actions | ++===+======+==========+=======+=========+ +| 0 | ETH | ``spec`` | unset | QUEUE | +| | +----------+-------+ | +| | | ``last`` | unset | | +| | +----------+-------+ | +| | | ``mask`` | unset | | ++---+------+----------+-------+ | +| 1 | IPV4 | ``spec`` | any | | +| | +----------+-------+ | +| | | ``last`` | unset | | +| | +----------+-------+ | +| | | ``mask`` | any | | ++---+------+----------+-------+ | +| 2 | TCP, | ``spec`` | any | | +| | UDP +----------+-------+ | +| | | ``last`` | unset | | +| | +----------+-------+ | +| | | ``mask`` | any | | ++---+------+----------+-------+---------+ +| 3 | END | END | ++---+-------------------------+---------+ + +``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types. + +In the following table, `ANY`_ is used to cover the optional L4. 
+ ++------------------------------------------------+ +| TUNNEL | ++--------------------------------------+---------+ +| Pattern | Actions | ++===+=========+==========+=============+=========+ +| 0 | ETH | ``spec`` | any | QUEUE | +| | +----------+-------------+ | +| | | ``last`` | unset | | +| | +----------+-------------+ | +| | | ``mask`` | any | | ++---+---------+----------+-------------+ | +| 1 | IPV4, | ``spec`` | any | | +| | IPV6 +----------+-------------+ | +| | | ``last`` | unset | | +| | +----------+-------------+ | +| | | ``mask`` | any | | ++---+---------+----------+-------------+ | +| 2 | ANY | ``spec`` | any | | +| | +----------+-------------+ | +| | | ``last`` | unset | | +| | +----------+---------+---+ | +| | | ``mask`` | ``num`` | 0 | | ++---+---------+----------+---------+---+ | +| 3 | VXLAN, | ``spec`` | any | | +| | GENEVE, +----------+-------------+ | +| | TEREDO, | ``last`` | unset | | +| | NVGRE, +----------+-------------+ | +| | GRE, | ``mask`` | any | | +| | ... | | | | +| | | | | | +| | | | | | ++---+---------+----------+-------------+---------+ +| 4 | END | END | ++---+----------------------------------+---------+ + +``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*FDIR* is more complex than any other type, there are several methods to +emulate its functionality. It is summarized for the most part in the table +below. + +A few features are intentionally not supported: + +- The ability to configure the matching input set and masks for the entire + device, PMDs should take care of it automatically according to the + requested flow rules. + + For example if a device supports only one bit-mask per protocol type, + source/address IPv4 bit-masks can be made immutable by the first created + rule. Subsequent IPv4 or TCPv4 rules can only be created if they are + compatible. 
+ + Note that only protocol bit-masks affected by existing flow rules are + immutable, others can be changed later. They become mutable again after + the related flow rules are destroyed. + +- Returning four or eight bytes of matched data when using flex bytes + filtering. Although a specific action could implement it, it conflicts + with the much more useful 32 bits tagging on devices that support it. + +- Side effects on RSS processing of the entire device. Flow rules that + conflict with the current device configuration should not be + allowed. Similarly, device configuration should not be allowed when it + affects existing flow rules. + +- Device modes of operation. "none" is unsupported since filtering cannot be + disabled as long as a flow rule is present. + +- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set + according to the created flow rules. + +- Signature mode of operation is not defined but could be handled through a + specific item type if needed. 
+ ++----------------------------------------------+ +| FDIR | ++---------------------------------+------------+ +| Pattern | Actions | ++===+============+==========+=====+============+ +| 0 | ETH, | ``spec`` | any | QUEUE, | +| | RAW +----------+-----+ DROP, | +| | | ``last`` | N/A | PASSTHRU | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+------------+----------+-----+------------+ +| 1 | IPV4, | ``spec`` | any | MARK | +| | IPV6 +----------+-----+ | +| | | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+------------+----------+-----+ | +| 2 | TCP, | ``spec`` | any | | +| | UDP, +----------+-----+ | +| | SCTP | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+------------+----------+-----+ | +| 3 | VF, | ``spec`` | any | | +| | PF +----------+-----+ | +| | (optional) | ``last`` | N/A | | +| | +----------+-----+ | +| | | ``mask`` | any | | ++---+------------+----------+-----+------------+ +| 4 | END | END | ++---+-----------------------------+------------+ + + +``HASH`` +~~~~~~~~ + +There is no counterpart to this filter type because it translates to a +global device setting instead of a pattern item. Device settings are +automatically set according to the created flow rules. + +``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +All packets are matched. This type alters incoming packets to encapsulate +them in a chosen tunnel type, optionally redirect them to a VF as well. + +The destination pool for tag based forwarding can be emulated with other +flow rules using `DUP`_ as the action. + ++----------------------------------------+ +| L2_TUNNEL | ++---------------------------+------------+ +| Pattern | Actions | ++===+======+==========+=====+============+ +| 0 | VOID | ``spec`` | N/A | VXLAN, | +| | | | | GENEVE, | +| | | | | ... 
| +| | +----------+-----+------------+ +| | | ``last`` | N/A | VF | +| | +----------+-----+ (optional) | +| | | ``mask`` | N/A | | +| | | | | | ++---+------+----------+-----+------------+ +| 1 | END | END | ++---+-----------------------+------------+ -- 2.1.4