From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9E1DCA0C40; Fri, 16 Apr 2021 12:50:04 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 82C79141D20; Fri, 16 Apr 2021 12:50:04 +0200 (CEST) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by mails.dpdk.org (Postfix) with ESMTP id C8B51141D14 for ; Fri, 16 Apr 2021 12:50:02 +0200 (CEST) Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.nyi.internal (Postfix) with ESMTP id 6A1145C0113; Fri, 16 Apr 2021 06:50:02 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Fri, 16 Apr 2021 06:50:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h= from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding:content-type; s=fm1; bh= HY2r5RKkc9R2hrDrtnHxw2F5R6ic/J2Bbnx0IscFE+0=; b=xuene4kTIFMRZbIo bUUSIHq83TojsmoQ8PB8PczZX/JF69nwO5mUdt6LFEg/zxg/dhriYIlhish6RkR4 tcLP52iP/ptWFfCoWqQ3gTExKXEdLjmMn5iVW1n5JeRBwWWwLS2gggfAsQLR9fVW zH+ZvWGY46y8giYChOzLNVfGstgoOTj9z1mmVBJY+h/rF4Dw1A4vTlo2wmk1MoiH 4Xjp+WsjhR4VKXaOm/cX032QoUgPP1iK32TLmejwfrUT0W4y/fuYiijBKxgpmbMx vAhQ94XMQkAtEaPOylzVXuZwPoCNBdYmCiXAVdCBJPqZ5fdyz2iEojmskwF9La6e hhhQvg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm2; bh=HY2r5RKkc9R2hrDrtnHxw2F5R6ic/J2Bbnx0IscFE +0=; b=FusdgrjRpBhRwIGHbeGFt8UiB0tF7n5CER45BSyrt6tx3+jKBXgWoGPtN 7MyBIBMXUtziZ85qva8r6O6+pb39SibrhCC6lHo1AtXAPWsZdqteADS8fF6PkhMK QSCwaDOK1NtNR7WIYKUVTgKZtx+smLAAWu93sHycdawPPptwolMzvxaHX/P1rhF4 /3wIfzR9KhwTis7m3A9SXFhCK8oAZF3n/zy9JvyUzgh+sETNDVRMqBniKh7+xzMg oy8EioRO4St/9BsgNMg8Zso72f6DFy0DFwrQmiecYxDDNlW8oBjULo9D3c7//tH/ x5FVHcFP+lQiyxZZEH6MpxudnGtLg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrudelhedgfeegucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkfgjfhgggfgtsehtufertddttddvnecuhfhrohhmpefvhhhomhgr shcuofhonhhjrghlohhnuceothhhohhmrghssehmohhnjhgrlhhonhdrnhgvtheqnecugg ftrfgrthhtvghrnhephedvffefiefhhfekffejvdekgefggeejffevveejvdejtdfgtdej gfefhfdvgfeunecuffhomhgrihhnpegsohhothhlihhnrdgtohhmpdhushgvnhhigidroh hrghenucfkphepjeejrddufeegrddvtdefrddukeegnecuvehluhhsthgvrhfuihiivgep tdenucfrrghrrghmpehmrghilhhfrhhomhepthhhohhmrghssehmohhnjhgrlhhonhdrnh gvth X-ME-Proxy: Received: from xps.localnet (184.203.134.77.rev.sfr.net [77.134.203.184]) by mail.messagingengine.com (Postfix) with ESMTPA id 3A7551080063; Fri, 16 Apr 2021 06:50:00 -0400 (EDT) From: Thomas Monjalon To: Bing Zhao Cc: orika@nvidia.com, ferruh.yigit@intel.com, andrew.rybchenko@oktetlabs.ru, dev@dpdk.org, ajit.khaparde@broadcom.com, jerinj@marvell.com, humin29@huawei.com, rosen.xu@intel.com, hemant.agrawal@nxp.com Date: Fri, 16 Apr 2021 12:49:58 +0200 Message-ID: <2924323.ELig26xF5v@thomas> In-Reply-To: <1618504877-95597-2-git-send-email-bingz@nvidia.com> References: <1618062393-205611-1-git-send-email-bingz@nvidia.com> <1618504877-95597-1-git-send-email-bingz@nvidia.com> <1618504877-95597-2-git-send-email-bingz@nvidia.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack flow action and item X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 15/04/2021 18:41, Bing Zhao: > This commit introduced the conntrack action and item. > > Usually the HW offloading is stateless. For some stateful offloading > like a TCP connection, HW module will help provide the ability of a > full offloading w/o SW participation after the connection was > established. > > The basic usage is that in the first flow the application should add > the conntrack action and in the following flow(s) the application > should use the conntrack item to match on the result. You probably mean "flow rule", not "traffic flow". Please make it clear to avoid confusion. > A TCP connection has two directions traffic. To set a conntrack > action context correctly, information from packets of both directions > are required. > > The conntrack action should be created on one port and supply the > peer port as a parameter to the action. After context creating, it > could only be used between the ports (dual-port mode) or a single > port. The application should modify the action via the API > "action_handle_update" only when before using it to create a flow > with opposite direction. This will help the driver to recognize the > direction of the flow to be created, especially in single port mode. > The traffic from both directions will go through the same port if > the application works as an "forwarding engine" but not a end point. > There is no need to call the update interface if the subsequent flows > have nothing to be changed. I am not sure this is a feature description for the commit log or an usage explanation for the doc. In any case, please distinguish "ethdev port" and "TCP port" to avoid confusion. > Query will be supported via action_ctx_query interface, about the > current packets information and connection status. Tha fields > query capabilities depends on the HW. > > For the packets received during the conntrack setup, it is suggested > to re-inject the packets in order to take full advantage of the What do you mean by "full advantage"? It is counter-intuitive to re-inject for offloading. Does it improve the performance? > conntrack. Only the valid packets should pass the conntrack, packets > with invalid TCP information, like out of window, or with invalid > header, like malformed, should not pass. > > Naming and definition: You mean naming is inspired from Linux? > https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h > https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c > > Other reference: > https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf > > Signed-off-by: Bing Zhao [...] > + /** > + * [META] > + * > + * Matches conntrack state. > + * > + * See struct rte_flow_item_conntrack. Please use @see for hyperlink in doxygen. > + */ > + RTE_FLOW_ITEM_TYPE_CONNTRACK, > }; [...] > +/** > + * The packet is with valid state after conntrack checking. "is with valid state" looks strange. I propose "The packet is valid after conntrack checking." > + */ > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0) Please use RTE_BIT32(). > +/** > + * The state of the connection was changed. > + */ > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1) > +/** > + * Error is detected on this packet for this connection and > + * an invalid state is set. > + */ > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2) "INVAL" is strange. Can we add the missing 2 characters? RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVALID On a related note, do we really need the word FLAG? And it is conflicting with the prefix in enum rte_flow_conntrack_tcp_last_index I think RTE_FLOW_CONNTRACK_PKT_STATE_ is a good prefix, long enough. > +/** > + * The HW connection tracking module is disabled. > + * It can be due to application command or an invalid state. > + */ > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3) This one does not have PKT in its name. And it is limiting to HW, while the driver could implement conntrack in SW. I propose RTE_FLOW_CONNTRACK_PKT_DISABLED > +/** > + * The packet contains some bad field(s) and cannot continue > + * with the conntrack module checking. > + */ > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4) > + > +/** > + * @warning > + * @b EXPERIMENTAL: this structure may change without prior notice > + * > + * RTE_FLOW_ITEM_TYPE_CONNTRACK > + * > + * Matches the state of a packet after it passed the connection tracking > + * examination. The state is a bit mask of one RTE_FLOW_CONNTRACK_FLAG* s/bit mask/bitmap/ ? RTE_FLOW_CONNTRACK_PKT_STATE_* otherwise it is messed with rte_flow_conntrack_tcp_last_index > + * or a reasonable combination of these bits. > + */ > +struct rte_flow_item_conntrack { > + uint32_t flags; > +}; [...] > + > + /** > + * [META] > + * > + * Enable tracking a TCP connection state. > + * > + * Send packet to HW connection tracking module for examination. Not necessarily HW. No packet is sent. I think you can remove this sentence completely. > + * > + * See struct rte_flow_action_conntrack. @see > + */ > + RTE_FLOW_ACTION_TYPE_CONNTRACK, > }; > > /** > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp { > */ > struct rte_flow_action_handle; > > +/** > + * The state of a TCP connection. > + */ > +enum rte_flow_conntrack_state { > + /**< SYN-ACK packet was seen. */ > + RTE_FLOW_CONNTRACK_STATE_SYN_RECV, > + /**< 3-way handshark was done. */ s/handshark/handshake/ > + RTE_FLOW_CONNTRACK_STATE_ESTABLISHED, > + /**< First FIN packet was received to close the connection. */ > + RTE_FLOW_CONNTRACK_STATE_FIN_WAIT, > + /**< First FIN was ACKed. */ > + RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT, > + /**< Second FIN was received, waiting for the last ACK. */ > + RTE_FLOW_CONNTRACK_STATE_LAST_ACK, > + /**< Second FIN was ACKed, connection was closed. */ > + RTE_FLOW_CONNTRACK_STATE_TIME_WAIT, > +}; > + > +/** > + * The last passed TCP packet flags of a connection. > + */ > +enum rte_flow_conntrack_tcp_last_index { > + RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */ > + RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */ > + RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK flag. */ > + RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */ > + RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */ > + RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag. */ > +}; Please use RTE_BIT32(). > +/** > + * @warning > + * @b EXPERIMENTAL: this structure may change without prior notice > + * > + * Configuration parameters for each direction of a TCP connection. > + */ > +struct rte_flow_tcp_dir_param { > + uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */ > + uint32_t close_initiated:1; /**< The FIN was sent by this direction. */ > + /**< An ACK packet has been received by this side. */ Move all comments on their own line before the struct member. Comment should then start with /** > + uint32_t last_ack_seen:1; > + /**< If set, indicates that there is unacked data of the connection. */ not sure what means "unacked data of the connection" > + uint32_t data_unacked:1; > + /**< Maximal value of sequence + payload length over sent > + * packets (next ACK from the opposite direction). > + */ > + uint32_t sent_end; > + /**< Maximal value of (ACK + window size) over received packet + length > + * over sent packet (maximal sequence could be sent). > + */ > + uint32_t reply_end; > + /**< Maximal value of actual window size over sent packets. */ > + uint32_t max_win; > + /**< Maximal value of ACK over sent packets. */ > + uint32_t max_ack; Not sure about the word "over" in above definitions. > +}; > + > +/** > + * @warning > + * @b EXPERIMENTAL: this structure may change without prior notice > + * > + * RTE_FLOW_ACTION_TYPE_CONNTRACK > + * > + * Configuration and initial state for the connection tracking module. > + * This structure could be used for both setting and query. > + */ > +struct rte_flow_action_conntrack { > + uint16_t peer_port; /**< The peer port number, can be the same port. */ > + /**< Direction of this connection when creating a flow, the value only > + * affects the subsequent flows creation. > + */ As for rte_flow_tcp_dir_param, better to move comments before, on their own line. > + uint32_t is_original_dir:1; > + /**< Enable / disable the conntrack HW module. When disabled, the > + * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED. > + * In this state the HW will act as passthrough. > + * It only affects this conntrack object in the HW without any effect > + * to the other objects. > + */ > + uint32_t enable:1; > + /**< At least one ack was seen, after the connection was established. */ > + uint32_t live_connection:1; > + /**< Enable selective ACK on this connection. */ > + uint32_t selective_ack:1; > + /**< A challenge ack has passed. */ > + uint32_t challenge_ack_passed:1; > + /**< 1: The last packet is seen that comes from the original direction. > + * 0: From the reply direction. > + */ > + uint32_t last_direction:1; > + /**< No TCP check will be done except the state change. */ > + uint32_t liberal_mode:1; > + /**< The current state of the connection. */ > + enum rte_flow_conntrack_state state; > + /**< Scaling factor for maximal allowed ACK window. */ > + uint8_t max_ack_window; > + /**< Maximal allowed number of retransmission times. */ > + uint8_t retransmission_limit; > + /**< TCP parameters of the original direction. */ > + struct rte_flow_tcp_dir_param original_dir; > + /**< TCP parameters of the reply direction. */ > + struct rte_flow_tcp_dir_param reply_dir; > + /**< The window value of the last packet passed this conntrack. */ > + uint16_t last_window; > + enum rte_flow_conntrack_tcp_last_index last_index; > + /**< The sequence of the last packet passed this conntrack. */ > + uint32_t last_seq; > + /**< The acknowledgement of the last packet passed this conntrack. */ > + uint32_t last_ack; > + /**< The total value ACK + payload length of the last packet passed > + * this conntrack. > + */ > + uint32_t last_end; > +}; > + > +/** > + * RTE_FLOW_ACTION_TYPE_CONNTRACK > + * > + * Wrapper structure for the context update interface. > + * Ports cannot support updating, and the only valid solution is to > + * destroy the old context and create a new one instead. > + */ > +struct rte_flow_modify_conntrack { > + /**< New connection tracking parameters to be updated. */ > + struct rte_flow_action_conntrack new_ct; > + uint32_t direction:1; /**< The direction field will be updated. */ > + /**< All the other fields except direction will be updated. */ > + uint32_t state:1; > + uint32_t reserved:30; /**< Reserved bits for the future usage. */ > +};