From: Ajit Khaparde
Date: Tue, 23 Mar 2021 16:27:16 -0700
To: Bing Zhao
Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dpdk-dev
Subject: Re: [dpdk-dev] [RFC] ethdev: introduce conntrack flow action and item
List-Id: DPDK patches and discussions

On Thu, Mar 18, 2021 at 12:30 AM Bing Zhao wrote:
>
> This commit introduces the conntrack action and item.
>
> HW offloading is usually stateless. For some stateful offloading,
> such as a TCP connection, the HW module can provide full offload
> without SW participation once the connection has been established.
>
> The basic usage is that the application adds the conntrack action in
> the first flow, and in the following flow(s) uses the conntrack item
> to match on the result.
>
> A TCP connection carries traffic in two directions. To set up a
> conntrack action context correctly, information from packets of both
> directions is required.
>
> The conntrack action should be created on one port, with the peer
> port supplied as a parameter to the action. After the context is
> created, it can only be used between these ports (dual-port mode) or
> on a single port. In dual-port mode, the application should modify
> the action via the action_ctx_update interface before each use, in
> order to set the correct direction for the following rte_flow rule.
>
> Query is supported via the action_ctx_query interface, to retrieve
> the current packet information and connection status.
>
> For the packets received during the conntrack setup, it is suggested
> to re-inject them in order to take full advantage of the conntrack.
> Only valid packets should pass the conntrack; packets with invalid
> TCP information (e.g. out of window) or an invalid header (e.g.
> malformed) should not pass.
>
> Testpmd command line example:
>
> set conntrack [index] enable is 1 last_seq is xxx last ack is xxx /
> ... / orig_dir win_scale is xxx sent_end is xxx max_win is xxx ... /
> rply_dir ... / end
> flow action_ctx [CTX] create ingress ... / conntrack is [index] / end
> flow create 0 group X ingress pattern ... / tcp / end actions action_ctx [CTX]
> / jump group Y / end
> flow create 0 group Y ingress pattern ... / ct is [Valid] / end actions
> queue index [hairpin queue] / end
>
> Signed-off-by: Bing Zhao
> ---
>  lib/librte_ethdev/rte_flow.h | 191 +++++++++++++++++++++++++++++++++++
>  1 file changed, 191 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 669e677e91..b2e4f0751a 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -550,6 +550,15 @@ enum rte_flow_item_type {
>  	 * See struct rte_flow_item_geneve_opt
>  	 */
>  	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches conntrack state.
> +	 *
> +	 * See struct rte_flow_item_conntrack.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -1654,6 +1663,49 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
>
> +/**
> + * The packet is valid after the connection tracking examination.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_STATE_VALID (1 << 0)
> +/**
> + * The state of the connection was changed.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_STATE_CHANGED (1 << 1)
> +/**
> + * Error state was detected on this packet for this connection.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_ERROR (1 << 2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to an application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_DISABLED (1 << 3)
> +/**
> + * The packet contains some bad field(s).
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_BAD_PKT (1 << 4)

Why not an enum? We could use the bits, but group them under an enum?

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bit mask of one RTE_FLOW_CONNTRACK_FLAG*
> + * bit or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +	uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
> +	.flags = 0xffffffff,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2236,6 +2288,17 @@ enum rte_flow_action_type {
>  	 * See struct rte_flow_action_modify_field.
>  	 */
>  	RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Enable tracking a TCP connection state.
> +	 *
> +	 * Send packet to HW connection tracking module for examination.
> +	 *
> +	 * See struct rte_flow_action_conntrack.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -2828,6 +2891,134 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_shared_action;
>
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +	/**< SYN-ACK packet was seen. */
> +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +	/**< 3-way handshake was done. */
> +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +	/**< First FIN packet was received to close the connection.
> +	 */
> +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +	/**< First FIN was ACKed. */
> +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +	/**< After the second FIN, waiting for the last ACK. */
> +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +	/**< Second FIN was ACKed, connection was closed. */
> +};
> +
> +/**
> + * The TCP flags of the last packet that passed on a connection.
> + */
> +enum rte_flow_conntrack_index {
> +	RTE_FLOW_CONNTRACK_INDEX_NONE = 0, /**< No flag. */
> +	RTE_FLOW_CONNTRACK_INDEX_SYN = (1 << 0), /**< With SYN flag. */
> +	RTE_FLOW_CONNTRACK_INDEX_SYN_ACK = (1 << 1), /**< With SYN+ACK flags. */
> +	RTE_FLOW_CONNTRACK_INDEX_FIN = (1 << 2), /**< With FIN flag. */
> +	RTE_FLOW_CONNTRACK_INDEX_ACK = (1 << 3), /**< With ACK flag. */
> +	RTE_FLOW_CONNTRACK_INDEX_RST = (1 << 4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +	uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */
> +	uint32_t close_initiated:1; /**< The FIN was sent by this direction. */
> +	uint32_t last_ack_seen:1;
> +	/**< An ACK packet has been received by this side. */
> +	uint32_t data_unacked:1;
> +	/**< If set, indicates that there is unacked data on the connection. */
> +	uint32_t sent_end;
> +	/**< Maximal value of sequence + payload length over sent
> +	 * packets (next ACK from the opposite direction).
> +	 */
> +	uint32_t reply_end;
> +	/**< Maximal value of (ACK + window size) over received packets,
> +	 * i.e. the maximal sequence that could be sent.
> +	 */
> +	uint32_t max_win;
> +	/**< Maximal value of actual window size over sent packets. */
> +	uint32_t max_ack;
> +	/**< Maximal value of ACK over sent packets.
> +	 */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both set and query.

Can we split the structure into set and query? Some of the fields seem
to be relevant only for a query. Also the names will be simpler and
easier to understand that way.

> + */
> +struct rte_flow_action_conntrack {
> +	uint16_t peer_port; /**< The peer port number, can be the same port. */
> +	uint32_t is_original_dir:1;
> +	/**< Direction of this connection when creating a flow; the value
> +	 * only affects subsequent flow creation.
> +	 */
> +	uint32_t enable:1;
> +	/**< Enable / disable the conntrack HW module. When disabled, the
> +	 * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +	 * In this state the HW will act as passthrough.
> +	 */

We should be able to enable the block in HW implicitly based on the
rte_flow_create. I don't think this is needed.

> +	uint32_t live_connection:1;
> +	/**< At least one ACK was seen after the connection was established. */
> +	uint32_t selective_ack:1;
> +	/**< Enable selective ACK on this connection. */
> +	uint32_t challenge_ack_passed:1;
> +	/**< A challenge ACK has passed. */
> +	uint32_t last_direction:1;
> +	/**< 1: The last packet seen came from the original direction.
> +	 * 0: From the reply direction.
> +	 */
> +	uint32_t liberal_mode:1;
> +	/**< No TCP check will be done except the state change. */
> +	enum rte_flow_conntrack_state state;

initial_state or cur_state?

> +	/**< The current state of the connection. */
> +	uint8_t max_ack_window;
> +	/**< Scaling factor for the maximal allowed ACK window. */
> +	uint8_t retransmission_limit;
> +	/**< Maximal allowed number of retransmissions. */
> +	struct rte_flow_tcp_dir_param original_dir;
> +	/**< TCP parameters of the original direction.
> +	 */
> +	struct rte_flow_tcp_dir_param reply_dir;
> +	/**< TCP parameters of the reply direction. */
> +	uint16_t last_window;
> +	/**< The window value of the last packet that passed this conntrack. */
> +	enum rte_flow_conntrack_index last_index;

Do you mean rte_flow_conntrack_last_state - as in the last state seen
by the HW block? Or maybe it is the TCP flag and not the state?

> +	uint32_t last_seq;
> +	/**< The sequence number of the last packet that passed this
> +	 * conntrack.
> +	 */
> +	uint32_t last_ack;
> +	/**< The acknowledgement number of the last packet that passed
> +	 * this conntrack.
> +	 */
> +	uint32_t last_end;
> +	/**< The total value ACK + payload length of the last packet that
> +	 * passed this conntrack.
> +	 */
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * If a port cannot support updating, the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */

In that case why not destroy the flow and create a new one?

> +struct rte_flow_modify_conntrack {
> +	struct rte_flow_action_conntrack new_ct;
> +	/**< New connection tracking parameters to be updated. */
> +	uint32_t direction:1; /**< The direction field will be updated. */
> +	uint32_t state:1;
> +	/**< All the other fields except direction will be updated. */
> +	uint32_t reserved:30; /**< Reserved bits for future usage. */
> +};
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1