References: <1618062393-205611-1-git-send-email-bingz@nvidia.com>
 <1618595649-157464-1-git-send-email-bingz@nvidia.com>
 <1618595649-157464-2-git-send-email-bingz@nvidia.com>
In-Reply-To: <1618595649-157464-2-git-send-email-bingz@nvidia.com>
From: Ajit Khaparde
Date: Fri, 16 Apr 2021 11:30:04 -0700
Message-ID:
To: Bing Zhao
Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dpdk-dev, Xiaoyun Li
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256; boundary="0000000000002b1c8d05c01b2cb3"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [dpdk-dev] [PATCH v3 1/3] ethdev: introduce conntrack flow action and item
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

--0000000000002b1c8d05c01b2cb3
Content-Type: text/plain; charset="UTF-8"

On Fri, Apr 16, 2021 at 10:54 AM Bing Zhao wrote:
>
> This commit introduces the conntrack action and item.
>
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, HW module will help provide the ability of a
> full offloading w/o SW participation after the connection was
> established.
>
> The basic usage is that in the first flow rule the application should
> add the conntrack action and jump to the next flow table. In the
> following flow rule(s) of the next table, the application should use
> the conntrack item to match on the result.
>
> A TCP connection has two directions traffic. To set a conntrack
s/has two directions traffic/can have traffic in two directions.
> action context correctly, the information of packets from both
> directions are required.
>
> The conntrack action should be created on one ethdev port and supply
> the peer ethdev port as a parameter to the action.
> After context
> created, it could only be used between these two ethdev ports
> (dual-port mode) or a single port. The application should modify the
> action via the API "rte_action_handle_update" only when before using
> it to create a flow rule with conntrack conntrack for the opposite
> direction. This will help the driver to recognize the direction of
> the flow to be created, especially in the single-port mode, in which
> case the traffic from both directions will go through the same
> ethdev port if the application works as an "forwarding engine" but
> not an end point. There is no need to call the update interface if
> the subsequent flow rules have nothing to be changed.
>
> Query will be supported via "rte_action_handle_query" interface,
> about the current packets information and connection status. The
> fields query capabilities depends on the HW.
How about this:
The fields which can be queried will depend on the HW capabilities.
>
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to make sure the conntrack module
> works correctly without missing any packet. Only the valid packets
> should pass the conntrack, packets with invalid TCP information,
> like out of window, or with invalid header, like malformed, should
> not pass.
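On the last quoted point (only valid packets should pass), here is a minimal sketch of how a rule in the second table might test the conntrack result. It assumes the usual rte_flow spec/mask semantics, where only the bits set in the mask are compared. The flag values are copied from the RTE_FLOW_CONNTRACK_PKT_STATE_* definitions in the patch; the CT_* names, `struct ct_item` and `ct_item_matches()` are hypothetical stand-ins for illustration only, and how a given PMD evaluates the item depends on the HW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the patch; RTE_BIT32(n) is assumed to be 1u << n. */
#define CT_PKT_STATE_VALID    (UINT32_C(1) << 0)
#define CT_PKT_STATE_CHANGED  (UINT32_C(1) << 1)
#define CT_PKT_STATE_INVALID  (UINT32_C(1) << 2)
#define CT_PKT_STATE_DISABLED (UINT32_C(1) << 3)
#define CT_PKT_STATE_BAD      (UINT32_C(1) << 4)

/* Hypothetical stand-in for struct rte_flow_item_conntrack. */
struct ct_item {
	uint32_t flags;
};

/*
 * Assumed matching rule: a packet matches when its state bits agree
 * with the spec on every bit selected by the mask.
 */
static bool
ct_item_matches(uint32_t pkt_state, const struct ct_item *spec,
		const struct ct_item *mask)
{
	assert(spec != NULL && mask != NULL);
	return (pkt_state & mask->flags) == (spec->flags & mask->flags);
}
```

With the default all-ones mask, a spec of VALID matches only packets whose state is exactly VALID. A narrower mask of VALID | INVALID | BAD would also accept a packet marked VALID | CHANGED, since the CHANGED bit is then ignored.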
>
> Naming and definition:
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
>
> Other reference:
> https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
>
> Signed-off-by: Bing Zhao
> ---
>  lib/librte_ethdev/rte_flow.c |   2 +
>  lib/librte_ethdev/rte_flow.h | 207 +++++++++++++++++++++++++++++++++++
>  2 files changed, 209 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 0d2610b7c4..c7c7108933 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>  	MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
>  	MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
>  	MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
> +	MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
>  };
>
>  /** Generate flow_action[] entry. */
> @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
>  	 * indirect action handle.
>  	 */
>  	MK_FLOW_ACTION(INDIRECT, 0),
> +	MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
>  };
>
>  int
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 324d00abdc..c9d7bdfa57 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -551,6 +551,15 @@ enum rte_flow_item_type {
>  	 * See struct rte_flow_item_geneve_opt
>  	 */
>  	RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches conntrack state.
> +	 *
> +	 * @see struct rte_flow_item_conntrack.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
>
> +/**
> + * The packet is valid after conntrack checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
> +/**
> + * The state of the connection is changed.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bitmap of one RTE_FLOW_CONNTRACK_PKT_STATE*
> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +	uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
> +	.flags = 0xffffffff,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2277,6 +2331,15 @@ enum rte_flow_action_type {
>  	 * same port or across different ports.
>  	 */
>  	RTE_FLOW_ACTION_TYPE_INDIRECT,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Enable tracking a TCP connection state.
> +	 *
> +	 * @see struct rte_flow_action_conntrack.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -2875,6 +2938,150 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_action_handle;
>
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +	/**< SYN-ACK packet was seen.
> +	 */
> +	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +	/**< 3-way handshake was done. */
> +	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +	/**< First FIN packet was received to close the connection. */
> +	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +	/**< First FIN was ACKed. */
> +	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +	/**< Second FIN was received, waiting for the last ACK. */
> +	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +	/**< Second FIN was ACKed, connection was closed. */
> +	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_tcp_last_index {
> +	RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYN = RTE_BIT32(0), /**< With SYN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_SYNACK = RTE_BIT32(1), /**< With SYNACK flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_FIN = RTE_BIT32(2), /**< With FIN flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_ACK = RTE_BIT32(3), /**< With ACK flag. */
> +	RTE_FLOW_CONNTRACK_FLAG_RST = RTE_BIT32(4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +	/** TCP window scaling factor, 0xF to disable. */
> +	uint32_t scale:4;
> +	/** The FIN was sent by this direction. */
> +	uint32_t close_initiated:1;
> +	/** An ACK packet has been received by this side. */
> +	uint32_t last_ack_seen:1;
> +	/**
> +	 * If set, it indicates that there is unacknowledged data for the
> +	 * packets sent from this direction.
> +	 */
> +	uint32_t data_unacked:1;
> +	/**
> +	 * Maximal value of sequence + payload length in sent
> +	 * packets (next ACK from the opposite direction).
> +	 */
> +	uint32_t sent_end;
> +	/**
> +	 * Maximal value of (ACK + window size) in received packet + length
> +	 * over sent packet (maximal sequence could be sent).
> +	 */
> +	uint32_t reply_end;
> +	/** Maximal value of actual window size in sent packets. */
> +	uint32_t max_win;
> +	/** Maximal value of ACK in sent packets. */
> +	uint32_t max_ack;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
> + */
> +struct rte_flow_action_conntrack {
> +	/** The peer port number, can be the same port. */
> +	uint16_t peer_port;
> +	/**
> +	 * Direction of this connection when creating a flow, the value
> +	 * only affects the subsequent flows creation.
s/flows/flow or s/the subsequent flows creation/the creation of subsequent flows
> +	 */
> +	uint32_t is_original_dir:1;
> +	/**
> +	 * Enable / disable the conntrack HW module. When disabled, the
> +	 * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +	 * In this state the HW will act as passthrough.
> +	 * It only affects this conntrack object in the HW without any effect
> +	 * to the other objects.
> +	 */
> +	uint32_t enable:1;
> +	/** At least one ack was seen after the connection was established. */
> +	uint32_t live_connection:1;
> +	/** Enable selective ACK on this connection. */
> +	uint32_t selective_ack:1;
> +	/** A challenge ack has passed. */
> +	uint32_t challenge_ack_passed:1;
> +	/**
> +	 * 1: The last packet is seen from the original direction.
> +	 * 0: The last packet is seen from the reply direction.
> +	 */
> +	uint32_t last_direction:1;
> +	/** No TCP check will be done except the state change. */
> +	uint32_t liberal_mode:1;
> +	/** + enum rte_flow_conntrack_state state;
> +	/** Scaling factor for maximal allowed ACK window. */
> +	uint8_t max_ack_window;
> +	/** Maximal allowed number of retransmission times. */
s/times/limit
> +	uint8_t retransmission_limit;
> +	/** TCP parameters of the original direction.
> +	 */
> +	struct rte_flow_tcp_dir_param original_dir;
> +	/** TCP parameters of the reply direction. */
> +	struct rte_flow_tcp_dir_param reply_dir;
> +	/** The window value of the last packet passed this conntrack. */
s/value/size
> +	uint16_t last_window;
> +	enum rte_flow_conntrack_tcp_last_index last_index;
> +	/** The sequence of the last packet passed this conntrack. */
sequence number of the ...
> +	uint32_t last_seq;
> +	/** The acknowledgement of the last packet passed this conntrack. */
ACK number of the..
s/passed this/passed by this or passing this
> +	uint32_t last_ack;
> +	/**
> +	 * The total value ACK + payload length of the last packet
> +	 * passed this conntrack.
s/passed this/passed by this or passing this
> +	 */
> +	uint32_t last_end;
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Ports cannot support updating, and the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */
> +struct rte_flow_modify_conntrack {
> +	/** New connection tracking parameters to be updated. */
> +	struct rte_flow_action_conntrack new_ct;
> +	/** The direction field will be updated. */
> +	uint32_t direction:1;
> +	/** All the other fields except direction will be updated. */
> +	uint32_t state:1;
> +	/** Reserved bits for the future usage. */
> +	uint32_t reserved:30;
> +};
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1
>

--0000000000002b1c8d05c01b2cb3--
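
One possible reading of the two selector bits in the quoted struct rte_flow_modify_conntrack, as a minimal sketch: `direction` copies only the direction field, `state` copies everything except the direction. `struct ct_params` is a hypothetical, trimmed-down stand-in for struct rte_flow_action_conntrack, and `ct_apply_update()` is not part of the patch or of DPDK; a real PMD would presumably program the HW context rather than copy a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct rte_flow_action_conntrack. */
struct ct_params {
	uint32_t is_original_dir:1;
	uint32_t enable:1;
	uint32_t last_seq;
	uint32_t last_ack;
};

/* Same shape as struct rte_flow_modify_conntrack in the quoted patch. */
struct ct_modify {
	struct ct_params new_ct;
	uint32_t direction:1; /* update the direction field only */
	uint32_t state:1;     /* update all fields except direction */
	uint32_t reserved:30;
};

/* Apply an update to a software copy of the context (assumed semantics). */
static void
ct_apply_update(struct ct_params *cur, const struct ct_modify *upd)
{
	uint32_t old_dir;

	assert(cur != NULL && upd != NULL);
	old_dir = cur->is_original_dir;
	if (upd->state) {
		/* Take every field from new_ct, then restore the direction. */
		*cur = upd->new_ct;
		cur->is_original_dir = old_dir;
	}
	if (upd->direction)
		cur->is_original_dir = upd->new_ct.is_original_dir;
}
```

Setting only `direction` before creating the flow rule for the opposite direction leaves the tracked sequence state untouched, which is the case the commit message describes for single-port mode.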