From: Bing Zhao
To: orika@nvidia.com, thomas@monjalon.net, ferruh.yigit@intel.com, andrew.rybchenko@oktetlabs.ru
Cc: dev@dpdk.org, ajit.khaparde@broadcom.com, xiaoyun.li@intel.com
Date: Tue, 20 Apr 2021 01:51:30 +0800
Message-Id: <1618854691-370765-2-git-send-email-bingz@nvidia.com>
In-Reply-To: <1618854691-370765-1-git-send-email-bingz@nvidia.com>
Subject: [dpdk-dev] [PATCH v5 1/2] ethdev: introduce conntrack flow action and item

This commit introduces the conntrack action and item.

HW offloading is usually stateless. For stateful offloading, such as
tracking a TCP connection, a HW module can provide full offloading
without SW participation once the connection is established.

The basic usage is that in the first flow rule the application should
add the conntrack action and jump to the next flow table. In the
following flow rule(s) of the next table, the application should use
the conntrack item to match on the result.

A TCP connection carries traffic in two directions. To set up a
conntrack action context correctly, information about packets from
both directions is required.

The conntrack action should be created on one ethdev port, with the
peer ethdev port supplied as a parameter to the action. Once the
context is created, it can only be used between these two ethdev ports
(dual-port mode) or on a single port. The application should modify
the action via the API "rte_flow_action_handle_update" only before
using it to create a flow rule with conntrack for the opposite
direction. This helps the driver recognize the direction of the flow
to be created, especially in single-port mode, in which traffic from
both directions goes through the same ethdev port if the application
works as a "forwarding engine" rather than an end point. There is no
need to call the update interface if nothing changes for the
subsequent flow rules.

Query is supported via the "rte_flow_action_handle_query" interface
and returns the current packet information and connection status.
Which fields can be queried depends on the HW.

For packets received during the conntrack setup, it is suggested to
re-inject them to make sure the conntrack module works correctly
without missing any packet. Only valid packets should pass the
conntrack; packets with invalid TCP information (e.g. out of window)
or with an invalid header (e.g. malformed) should not pass.

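The matching step described above can be sketched in plain C. This is
a minimal illustration, not driver code: the CT_PKT_STATE_* macros are
local stand-ins that mirror the bit positions of the
RTE_FLOW_CONNTRACK_PKT_STATE_* flags added by this patch, and the
verdict names and the forward/software/drop policy are assumptions
made for the example. The spec/mask comparison is the generic way flow
items are matched; how a given PMD evaluates the conntrack item may
differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins for the RTE_FLOW_CONNTRACK_PKT_STATE_* bits added by
 * this patch (same bit positions), so the sketch builds without DPDK.
 */
#define CT_PKT_STATE_VALID    (UINT32_C(1) << 0)
#define CT_PKT_STATE_CHANGED  (UINT32_C(1) << 1)
#define CT_PKT_STATE_INVALID  (UINT32_C(1) << 2)
#define CT_PKT_STATE_DISABLED (UINT32_C(1) << 3)
#define CT_PKT_STATE_BAD      (UINT32_C(1) << 4)

/* Hypothetical verdicts; the patch does not define a handling policy. */
enum ct_verdict {
	CT_VERDICT_FORWARD,  /* keep the packet on the offloaded path */
	CT_VERDICT_SOFTWARE, /* hand the packet to the application */
	CT_VERDICT_DROP,     /* do not let the packet pass */
};

/* Generic spec/mask comparison used when matching a flow item. */
static bool
ct_item_match(uint32_t pkt_flags, uint32_t spec, uint32_t mask)
{
	return (pkt_flags & mask) == (spec & mask);
}

/* Assumed policy for the rules placed in the table after the jump. */
static enum ct_verdict
ct_classify(uint32_t pkt_flags)
{
	const uint32_t reject = CT_PKT_STATE_INVALID | CT_PKT_STATE_BAD;

	if (pkt_flags & reject)
		return CT_VERDICT_DROP;
	/* Full mask: only the exact state "valid, nothing else" matches. */
	if (ct_item_match(pkt_flags, CT_PKT_STATE_VALID, UINT32_MAX))
		return CT_VERDICT_FORWARD;
	/* State change, disabled context, or any other combination. */
	return CT_VERDICT_SOFTWARE;
}
```

Under this assumed policy, a packet flagged VALID alone stays on the
offloaded path, VALID | CHANGED goes to software so the application
can query the context, and anything carrying INVALID or BAD is
dropped.
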
Naming and definition:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c

Other reference:
https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf

Signed-off-by: Bing Zhao
Acked-by: Ori Kam
---
 doc/guides/prog_guide/rte_flow.rst     | 118 ++++++++++++++
 doc/guides/rel_notes/release_21_05.rst |   4 +
 lib/librte_ethdev/rte_flow.c           |   2 +
 lib/librte_ethdev/rte_flow.h           | 212 +++++++++++++++++++++++++
 4 files changed, 336 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 4b54588995..5f6129f799 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1398,6 +1398,14 @@ Matches a eCPRI header.
 - ``hdr``: eCPRI header definition (``rte_ecpri.h``).
 - Default ``mask`` matches nothing, for all eCPRI messages.
 
+Item: ``CONNTRACK``
+^^^^^^^^^^^^^^^^^^^
+
+Matches a conntrack state after conntrack action.
+
+- ``flags``: conntrack packet state flags.
+- Default ``mask`` matches all state bits.
+
 Actions
 ~~~~~~~
 
@@ -2842,6 +2850,116 @@ for ``RTE_FLOW_FIELD_VALUE`` and ``RTE_FLOW_FIELD_POINTER`` respectively.
    | ``value``     | immediate value or a pointer to this value               |
    +---------------+----------------------------------------------------------+
 
+Action: ``CONNTRACK``
+^^^^^^^^^^^^^^^^^^^^^
+
+Create a conntrack (connection tracking) context with the provided information.
+
+In a stateful session like TCP, the conntrack action provides the ability to
+examine every packet of this connection and associate the state to every
+packet. It will help to realize the stateful offload of connections with little
+software participation. For example, the packets with invalid state may be
+handled by the software. The control packets could be handled in the hardware.
+The software just needs to query the state of a connection when needed, and then
+decide how to handle the flow rules and conntrack context.
+
+A conntrack context should be created via ``rte_flow_action_handle_create()``
+before using. Then the handle with ``INDIRECT`` type is used for a flow rule
+creation. If a flow rule with an opposite direction needs to be created, the
+``rte_flow_action_handle_update()`` should be used to modify the direction.
+
+Not all the fields of the ``struct rte_flow_action_conntrack`` will be used
+when creating a conntrack context, depending on the HW, and they should be
+in host byte order. The PMD should convert them into network byte order when
+needed by the HW.
+
+The ``struct rte_flow_modify_conntrack`` should be used for an update.
+
+The current conntrack context information could be queried via the
+``rte_flow_action_handle_query()`` interface.
+
+.. _table_rte_flow_action_conntrack:
+
+.. table:: CONNTRACK
+
+   +--------------------------+-------------------------------------------------------------+
+   | Field                    | Value                                                       |
+   +==========================+=============================================================+
+   | ``peer_port``            | peer port number                                            |
+   +--------------------------+-------------------------------------------------------------+
+   | ``is_original_dir``      | direction of this connection for creating flow rule         |
+   +--------------------------+-------------------------------------------------------------+
+   | ``enable``               | enable the conntrack context                                |
+   +--------------------------+-------------------------------------------------------------+
+   | ``live_connection``      | one ack was seen for this connection                        |
+   +--------------------------+-------------------------------------------------------------+
+   | ``selective_ack``        | SACK enabled                                                |
+   +--------------------------+-------------------------------------------------------------+
+   | ``challenge_ack_passed`` | a challenge ack has passed                                  |
+   +--------------------------+-------------------------------------------------------------+
+   | ``last_direction``       | direction of the last passed packet                         |
+   +--------------------------+-------------------------------------------------------------+
+   | ``liberal_mode``         | only report state change                                    |
+   +--------------------------+-------------------------------------------------------------+
+   | ``state``                | current state                                               |
+   +--------------------------+-------------------------------------------------------------+
+   | ``max_ack_window``       | maximal window scaling factor                               |
+   +--------------------------+-------------------------------------------------------------+
+   | ``retransmission_limit`` | maximal retransmission times                                |
+   +--------------------------+-------------------------------------------------------------+
+   | ``original_dir``         | TCP parameters of the original direction                    |
+   +--------------------------+-------------------------------------------------------------+
+   | ``reply_dir``            | TCP parameters of the reply direction                       |
+   +--------------------------+-------------------------------------------------------------+
+   | ``last_window``          | window size of the last passed packet                       |
+   +--------------------------+-------------------------------------------------------------+
+   | ``last_seq``             | sequence number of the last passed packet                   |
+   +--------------------------+-------------------------------------------------------------+
+   | ``last_ack``             | acknowledgment number of the last passed packet             |
+   +--------------------------+-------------------------------------------------------------+
+   | ``last_end``             | sum of ack number and length of the last passed packet      |
+   +--------------------------+-------------------------------------------------------------+
+
+.. _table_rte_flow_tcp_dir_param:
+
+.. table:: configuration parameters for each direction
+
+   +---------------------+---------------------------------------------------------+
+   | Field               | Value                                                   |
+   +=====================+=========================================================+
+   | ``scale``           | TCP window scaling factor                               |
+   +---------------------+---------------------------------------------------------+
+   | ``close_initiated`` | FIN sent from this direction                            |
+   +---------------------+---------------------------------------------------------+
+   | ``last_ack_seen``   | an ACK packet received                                  |
+   +---------------------+---------------------------------------------------------+
+   | ``data_unacked``    | unacknowledged data for packets from this direction     |
+   +---------------------+---------------------------------------------------------+
+   | ``sent_end``        | max{seq + len} seen in sent packets                     |
+   +---------------------+---------------------------------------------------------+
+   | ``reply_end``       | max{sack + max{win, 1}} seen in reply packets           |
+   +---------------------+---------------------------------------------------------+
+   | ``max_win``         | max{max{win, 1}} + {sack - ack} seen in sent packets    |
+   +---------------------+---------------------------------------------------------+
+   | ``max_ack``         | max{ack} + seen in sent packets                         |
+   +---------------------+---------------------------------------------------------+
+
+.. _table_rte_flow_modify_conntrack:
+
+.. table:: update a conntrack context
+
+   +----------------+-------------------------------------------------+
+   | Field          | Value                                           |
+   +================+=================================================+
+   | ``new_ct``     | new conntrack information                       |
+   +----------------+-------------------------------------------------+
+   | ``direction``  | direction will be updated                       |
+   +----------------+-------------------------------------------------+
+   | ``state``      | other fields except direction will be updated   |
+   +----------------+-------------------------------------------------+
+   | ``reserved``   | reserved bits                                   |
+   +----------------+-------------------------------------------------+
+
 Negative types
 ~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 8913dd4f9c..a5e2a8e503 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -87,6 +87,10 @@ New Features
   to support metering traffic by packet per second (PPS),
   in addition to the initial bytes per second (BPS) mode (value 0).
 
+* **Added TCP connection tracking offload in flow API.**
+
+  * Added conntrack item and action for stateful connection offload.
+
 * **Updated Arkville PMD driver.**
 
   Updated Arkville net driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 0d2610b7c4..c7c7108933 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -98,6 +98,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
 	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
 	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
+	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
 };
 
 /** Generate flow_action[] entry. */
@@ -186,6 +187,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
 	 * indirect action handle.
 	 */
 	MK_FLOW_ACTION(INDIRECT, 0),
+	MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
 };
 
 int
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 0447d36002..dae16b3433 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -551,6 +552,15 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_geneve_opt
 	 */
 	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
+
+	/**
+	 * [META]
+	 *
+	 * Matches conntrack state.
+	 *
+	 * @see struct rte_flow_item_conntrack.
+	 */
+	RTE_FLOW_ITEM_TYPE_CONNTRACK,
 };
 
 /**
@@ -1685,6 +1695,51 @@ rte_flow_item_geneve_opt_mask = {
 };
 #endif
 
+/**
+ * The packet is valid after conntrack checking.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
+/**
+ * The state of the connection is changed.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
+/**
+ * Error is detected on this packet for this connection and
+ * an invalid state is set.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
+/**
+ * The HW connection tracking module is disabled.
+ * It can be due to application command or an invalid state.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
+/**
+ * The packet contains some bad field(s) and cannot continue
+ * with the conntrack module checking.
+ */
+#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ITEM_TYPE_CONNTRACK
+ *
+ * Matches the state of a packet after it passed the connection tracking
+ * examination. The state is a bitmap of one RTE_FLOW_CONNTRACK_PKT_STATE*
+ * or a reasonable combination of these bits.
+ */
+struct rte_flow_item_conntrack {
+	uint32_t flags;
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
+#ifndef __cplusplus
+static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
+	.flags = 0xffffffff,
+};
+#endif
+
 /**
  * Matching pattern item definition.
  *
@@ -2278,6 +2333,15 @@ enum rte_flow_action_type {
 	 * or different ethdev ports.
 	 */
 	RTE_FLOW_ACTION_TYPE_INDIRECT,
+
+	/**
+	 * [META]
+	 *
+	 * Enable tracking a TCP connection state.
+	 *
+	 * @see struct rte_flow_action_conntrack.
+	 */
+	RTE_FLOW_ACTION_TYPE_CONNTRACK,
 };
 
 /**
@@ -2876,6 +2940,154 @@ struct rte_flow_action_set_dscp {
  */
 struct rte_flow_action_handle;
 
+/**
+ * The state of a TCP connection.
+ */
+enum rte_flow_conntrack_state {
+	/** SYN-ACK packet was seen. */
+	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
+	/** 3-way handshake was done. */
+	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
+	/** First FIN packet was received to close the connection. */
+	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
+	/** First FIN was ACKed. */
+	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
+	/** Second FIN was received, waiting for the last ACK. */
+	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
+	/** Second FIN was ACKed, connection was closed. */
+	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
+};
+
+/**
+ * The last passed TCP packet flags of a connection.
+ */
+enum rte_flow_conntrack_tcp_last_index {
+	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYN = RTE_BIT32(0), /**< With SYN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_SYNACK = RTE_BIT32(1), /**< With SYNACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_FIN = RTE_BIT32(2), /**< With FIN flag. */
+	RTE_FLOW_CONNTRACK_FLAG_ACK = RTE_BIT32(3), /**< With ACK flag. */
+	RTE_FLOW_CONNTRACK_FLAG_RST = RTE_BIT32(4), /**< With RST flag. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * Configuration parameters for each direction of a TCP connection.
+ * All fields should be in host byte order.
+ * If needed, driver should convert all fields to network byte order
+ * if HW needs them in that way.
+ */
+struct rte_flow_tcp_dir_param {
+	/** TCP window scaling factor, 0xF to disable. */
+	uint32_t scale:4;
+	/** The FIN was sent by this direction. */
+	uint32_t close_initiated:1;
+	/** An ACK packet has been received by this side. */
+	uint32_t last_ack_seen:1;
+	/**
+	 * If set, it indicates that there is unacknowledged data for the
+	 * packets sent from this direction.
+	 */
+	uint32_t data_unacked:1;
+	/**
+	 * Maximal value of sequence + payload length in sent
+	 * packets (next ACK from the opposite direction).
+	 */
+	uint32_t sent_end;
+	/**
+	 * Maximal value of (ACK + window size) in received packet + length
+	 * over sent packet (maximal sequence could be sent).
+	 */
+	uint32_t reply_end;
+	/** Maximal value of actual window size in sent packets. */
+	uint32_t max_win;
+	/** Maximal value of ACK in sent packets. */
+	uint32_t max_ack;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * RTE_FLOW_ACTION_TYPE_CONNTRACK
+ *
+ * Configuration and initial state for the connection tracking module.
+ * This structure could be used for both setting and query.
+ * All fields should be in host byte order.
+ */
+struct rte_flow_action_conntrack {
+	/** The peer port number, can be the same port. */
+	uint16_t peer_port;
+	/**
+	 * Direction of this connection when creating a flow rule, the
+	 * value only affects the creation of subsequent flow rules.
+	 */
+	uint32_t is_original_dir:1;
+	/**
+	 * Enable / disable the conntrack HW module. When disabled, the
+	 * result will always be RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED.
+	 * In this state the HW will act as passthrough.
+	 * It only affects this conntrack object in the HW without any effect
+	 * to the other objects.
+	 */
+	uint32_t enable:1;
+	/** At least one ack was seen after the connection was established. */
+	uint32_t live_connection:1;
+	/** Enable selective ACK on this connection. */
+	uint32_t selective_ack:1;
+	/** A challenge ack has passed. */
+	uint32_t challenge_ack_passed:1;
+	/**
+	 * 1: The last packet is seen from the original direction.
+	 * 0: The last packet is seen from the reply direction.
+	 */
+	uint32_t last_direction:1;
+	/** No TCP check will be done except the state change. */
+	uint32_t liberal_mode:1;
+	/**