DPDK patches and discussions
From: Ajit Khaparde <ajit.khaparde@broadcom.com>
To: Bing Zhao <bingz@nvidia.com>
Cc: Ori Kam <orika@nvidia.com>, Thomas Monjalon <thomas@monjalon.net>,
	 Ferruh Yigit <ferruh.yigit@intel.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC] ethdev: introduce conntrack flow action and item
Date: Tue, 23 Mar 2021 16:27:16 -0700
Message-ID: <CACZ4nhs40BTNY3rUMUc6moJ20p17P6=BQR5VeM1nCLpCXAnGew@mail.gmail.com>
In-Reply-To: <1616052616-344969-1-git-send-email-bingz@nvidia.com>


On Thu, Mar 18, 2021 at 12:30 AM Bing Zhao <bingz@nvidia.com> wrote:
>
> This commit introduces the conntrack action and item.
>
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, HW module will help provide the ability of a
> full offloading w/o SW participation after the connection was
> established.
>
> The basic usage is that in the first flow the application should add
> the conntrack action and in the following flow(s) the application
> should use the conntrack item to match on the result.
>
> A TCP connection has traffic in two directions. To set a conntrack
> action context correctly, information from packets of both directions
> is required.
>
> The conntrack action should be created on one port and supply the
> peer port as a parameter to the action. After the context is created,
> it can only be used between the two ports (dual-port mode) or on a single
> port. The application should modify the action via action_ctx_update
> interface before each use in dual-port mode, in order to set the
> correct direction for the following rte flow.
>
> Querying the current packet information and connection status will be
> supported via the action_ctx_query interface.
>
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to take full advantage of the
> conntrack. Only the valid packets should pass the conntrack, packets
> with invalid TCP information, like out of window, or with invalid
> header, like malformed, should not pass.
>
> Testpmd command line example:
>
> set conntrack [index] enable is 1 last_seq is xxx last ack is xxx /
> ... / orig_dir win_scale is xxx sent_end is xxx max_win is xxx ... /
> rply_dir ... / end
> flow action_ctx [CTX] create ingress ... / conntrack is [index] / end
> flow create 0 group X ingress patterns ... / tcp / end actions action_ctx [CTX]
> / jump group Y / end
> flow create 0 group Y ingress patterns ... / ct is [Valid] / end actions
> queue index [hairpin queue] / end
>
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---
>  lib/librte_ethdev/rte_flow.h | 191 +++++++++++++++++++++++++++++++++++
>  1 file changed, 191 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 669e677e91..b2e4f0751a 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -550,6 +550,15 @@ enum rte_flow_item_type {
>          * See struct rte_flow_item_geneve_opt
>          */
>         RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +       /**
> +        * [META]
> +        *
> +        * Matches conntrack state.
> +        *
> +        * See struct rte_flow_item_conntrack.
> +        */
> +       RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -1654,6 +1663,49 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
>
> +/**
> + * The packet is valid.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_STATE_VALID (1 << 0)
> +/**
> + * The state of the connection was changed.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_STATE_CHANGED (1 << 1)
> +/**
> + * Error state was detected on this packet for this connection.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_ERROR (1 << 2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_DISABLED (1 << 3)
> +/**
> + * The packet contains some bad field(s).
> + */
> +#define RTE_FLOW_CONNTRACK_FLAG_BAD_PKT (1 << 4)
Why not an enum? We could keep the same bit values but group them under an enum.
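
To illustrate the enum question, here is a rough sketch of how the same
bits could be grouped. The enum name rte_flow_conntrack_flag and the helper
conntrack_item_is_clean() are hypothetical and only for illustration; the
bit values and the item struct are taken from the patch as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum name; the bit values are the ones proposed in the patch. */
enum rte_flow_conntrack_flag {
	RTE_FLOW_CONNTRACK_FLAG_STATE_VALID   = (1 << 0), /* Packet is valid. */
	RTE_FLOW_CONNTRACK_FLAG_STATE_CHANGED = (1 << 1), /* Connection state changed. */
	RTE_FLOW_CONNTRACK_FLAG_ERROR         = (1 << 2), /* Error state detected. */
	RTE_FLOW_CONNTRACK_FLAG_DISABLED      = (1 << 3), /* HW module is disabled. */
	RTE_FLOW_CONNTRACK_FLAG_BAD_PKT       = (1 << 4), /* Bad field(s) in packet. */
};

/* Same layout as in the patch: the flags stay a plain 32-bit mask. */
struct rte_flow_item_conntrack {
	uint32_t flags;
};

/*
 * Illustrative helper, not part of the patch: true when the VALID bit is
 * set and neither the ERROR nor the BAD_PKT bit is set.
 */
static inline bool
conntrack_item_is_clean(const struct rte_flow_item_conntrack *item)
{
	const uint32_t bad = RTE_FLOW_CONNTRACK_FLAG_ERROR |
			     RTE_FLOW_CONNTRACK_FLAG_BAD_PKT;

	assert(item != NULL);
	return (item->flags & RTE_FLOW_CONNTRACK_FLAG_STATE_VALID) != 0 &&
	       (item->flags & bad) == 0;
}
```

The item field can stay uint32_t so that combinations of bits still fit;
the enum only gives the values one documented home.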

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bit mask of one RTE_FLOW_CONNTRACK_FLAG*
> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +       uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
> +       .flags = 0xffffffff,
> +};
> +#endif
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2236,6 +2288,17 @@ enum rte_flow_action_type {
>          * See struct rte_flow_action_modify_field.
>          */
>         RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
> +
> +       /**
> +        * [META]
> +        *
> +        * Enable tracking a TCP connection state.
> +        *
> +        * Send packet to HW connection tracking module for examination.
> +        *
> +        * See struct rte_flow_action_conntrack.
> +        */
> +       RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -2828,6 +2891,134 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_shared_action;
>
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +       RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +       /**< SYN-ACK packet was seen. */
> +       RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +       /**< 3-way handshake was done. */
> +       RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +       /**< First FIN packet was received to close the connection. */
> +       RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +       /**< First FIN was ACKed. */
> +       RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +       /**< After second FIN, waiting for the last ACK. */
> +       RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +       /**< Second FIN was ACKed, connection was closed. */
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_index {
> +       RTE_FLOW_CONNTRACK_INDEX_NONE = 0, /**< No Flag. */
> +       RTE_FLOW_CONNTRACK_INDEX_SYN = (1 << 0), /**< With SYN flag. */
> +       RTE_FLOW_CONNTRACK_INDEX_SYN_ACK = (1 << 1), /**< With SYN+ACK flag. */
> +       RTE_FLOW_CONNTRACK_INDEX_FIN = (1 << 2), /**< With FIN flag. */
> +       RTE_FLOW_CONNTRACK_INDEX_ACK = (1 << 3), /**< With ACK flag. */
> +       RTE_FLOW_CONNTRACK_INDEX_RST = (1 << 4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +       uint32_t scale:4; /**< TCP window scaling factor, 0xF to disable. */
> +       uint32_t close_initiated:1; /**< The FIN was sent by this direction. */
> +       uint32_t last_ack_seen:1;
> +       /**< An ACK packet has been received by this side. */
> +       uint32_t data_unacked:1;
> +       /**< If set, indicates that there is unacked data of the connection. */
> +       uint32_t sent_end;
> +       /**< Maximal value of sequence + payload length over sent
> +        * packets (next ACK from the opposite direction).
> +        */
> +       uint32_t reply_end;
> +       /**< Maximal value of (ACK + window size) over received packet + length
> +        * over sent packet (maximal sequence could be sent).
> +        */
> +       uint32_t max_win;
> +       /**< Maximal value of actual window size over sent packets. */
> +       uint32_t max_ack;
> +       /**< Maximal value of ACK over sent packets. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
Can we split the structure into separate set and query structures?
Some of the fields seem to be relevant only for a query.
The names would also be simpler and easier to understand that way.
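
As a rough sketch of what such a split might look like: the struct names
rte_flow_conntrack_conf and rte_flow_conntrack_status, the field names
initial_state and cur_state, and the helper conntrack_status_init() are all
hypothetical. Which field belongs on which side is only a guess (some of
the "status" fields may well be needed at creation time when a live
connection is handed to HW), and the per-direction parameters are left out
for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Connection states as listed in the patch. */
enum rte_flow_conntrack_state {
	RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
	RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
	RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
	RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
	RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
	RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
};

/* Hypothetical "set" half: what the application supplies on creation. */
struct rte_flow_conntrack_conf {
	uint16_t peer_port;
	uint32_t is_original_dir:1;
	uint32_t selective_ack:1;
	uint32_t liberal_mode:1;
	uint8_t max_ack_window;
	uint8_t retransmission_limit;
	enum rte_flow_conntrack_state initial_state;
};

/* Hypothetical "query" half: what is read back from the HW block. */
struct rte_flow_conntrack_status {
	uint32_t live_connection:1;
	uint32_t challenge_ack_passed:1;
	uint32_t last_direction:1;
	enum rte_flow_conntrack_state cur_state;
	uint16_t last_window;
	uint32_t last_seq;
	uint32_t last_ack;
	uint32_t last_end;
};

/*
 * Illustrative helper, not part of the patch: zero a status structure and
 * start it from the configured initial state.
 */
static inline void
conntrack_status_init(struct rte_flow_conntrack_status *st,
		      const struct rte_flow_conntrack_conf *conf)
{
	assert(st != NULL && conf != NULL);
	memset(st, 0, sizeof(*st));
	st->cur_state = conf->initial_state;
}
```

With two structures the field names can say directly whether a value is
an input (initial_state) or an observation (cur_state).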

> + */
> +struct rte_flow_action_conntrack {
> +       uint16_t peer_port; /**< The peer port number, can be the same port. */
> +       uint32_t is_original_dir:1;
> +       /**< Direction of this connection when creating a flow, the value only
> +        * affects the subsequent flows creation.
> +        */
> +       uint32_t enable:1;
> +       /**< Enable / disable the conntrack HW module. When disabled, the
> +        * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> +        * In this state the HW will act as passthrough.
> +        */
We should be able to enable the HW block implicitly when the flow is
created with rte_flow_create.
I don't think this field is needed.

> +       uint32_t live_connection:1;
> +       /**< At least one ack was seen, after the connection was established. */
> +       uint32_t selective_ack:1;
> +       /**< Enable selective ACK on this connection. */
> +       uint32_t challenge_ack_passed:1;
> +       /**< A challenge ack has passed. */
> +       uint32_t last_direction:1;
> +       /**< 1: The last packet is seen that comes from the original direction.
> +        * 0: From the reply direction.
> +        */
> +       uint32_t liberal_mode:1;
> +       /**< No TCP check will be done except the state change. */
> +       enum rte_flow_conntrack_state state;
initial_state or cur_state?

> +       /**< The current state of the connection. */
> +       uint8_t max_ack_window;
> +       /**< Scaling factor for maximal allowed ACK window. */
> +       uint8_t retransmission_limit;
> +       /**< Maximal allowed number of retransmission times. */
> +       struct rte_flow_tcp_dir_param original_dir;
> +       /**< TCP parameters of the original direction. */
> +       struct rte_flow_tcp_dir_param reply_dir;
> +       /**< TCP parameters of the reply direction. */
> +       uint16_t last_window;
> +       /**< The window value of the last packet passed this conntrack. */
> +       enum rte_flow_conntrack_index last_index;
Do you mean rte_flow_conntrack_last_state, as in the last state seen
by the HW block?
Or is it the TCP flag of the last packet and not a state?
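
If it is the TCP flag, I would read the field as something like the
sketch below. The helper conntrack_index_from_tcp_flags(), the CT_TCP_*
macros, and the precedence order (RST, then SYN or SYN+ACK, then FIN, then
ACK) are assumptions for illustration; the patch does not define how a
packet with several flags maps to one index value.

```c
#include <assert.h>
#include <stdint.h>

/* Index values as proposed in the patch. */
enum rte_flow_conntrack_index {
	RTE_FLOW_CONNTRACK_INDEX_NONE = 0,
	RTE_FLOW_CONNTRACK_INDEX_SYN = (1 << 0),
	RTE_FLOW_CONNTRACK_INDEX_SYN_ACK = (1 << 1),
	RTE_FLOW_CONNTRACK_INDEX_FIN = (1 << 2),
	RTE_FLOW_CONNTRACK_INDEX_ACK = (1 << 3),
	RTE_FLOW_CONNTRACK_INDEX_RST = (1 << 4),
};

/* TCP header flag bits (local, hypothetical macro names). */
#define CT_TCP_FIN 0x01
#define CT_TCP_SYN 0x02
#define CT_TCP_RST 0x04
#define CT_TCP_ACK 0x10

/*
 * Illustrative helper, not part of the patch: classify the flags byte of
 * a TCP header into a single index value, using an assumed precedence.
 */
static inline enum rte_flow_conntrack_index
conntrack_index_from_tcp_flags(uint8_t tcp_flags)
{
	if (tcp_flags & CT_TCP_RST)
		return RTE_FLOW_CONNTRACK_INDEX_RST;
	if (tcp_flags & CT_TCP_SYN)
		return (tcp_flags & CT_TCP_ACK) ?
			RTE_FLOW_CONNTRACK_INDEX_SYN_ACK :
			RTE_FLOW_CONNTRACK_INDEX_SYN;
	if (tcp_flags & CT_TCP_FIN)
		return RTE_FLOW_CONNTRACK_INDEX_FIN;
	if (tcp_flags & CT_TCP_ACK)
		return RTE_FLOW_CONNTRACK_INDEX_ACK;
	return RTE_FLOW_CONNTRACK_INDEX_NONE;
}
```

If that reading is right, a name that mentions the flag rather than an
"index" would make the intent clearer.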

> +       uint32_t last_seq;
> +       /**< The sequence of the last packet passed this conntrack. */
> +       uint32_t last_ack;
> +       /**< The acknowledgement of the last packet passed this conntrack. */
> +       uint32_t last_end;
> +       /**< The total value ACK + payload length of the last packet passed
> +        * this conntrack.
> +        */
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Ports cannot support updating, and the only valid solution is to
> + * destroy the old context and create a new one instead.
> + */
In that case, why not destroy the flow and create a new one?

> +struct rte_flow_modify_conntrack {
> +       struct rte_flow_action_conntrack new_ct;
> +       /**< New connection tracking parameters to be updated. */
> +       uint32_t direction:1; /**< The direction field will be updated. */
> +       uint32_t state:1;
> +       /**< All the other fields except direction will be updated. */
> +       uint32_t reserved:30; /**< Reserved bits for the future usage. */
> +};
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1
>

