From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 4C162A0A02;
	Fri, 16 Apr 2021 20:30:25 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id D7AA4161D76;
	Fri, 16 Apr 2021 20:30:22 +0200 (CEST)
Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com
 [209.85.222.170])
 by mails.dpdk.org (Postfix) with ESMTP id 90A34161D70
 for <dev@dpdk.org>; Fri, 16 Apr 2021 20:30:21 +0200 (CEST)
Received: by mail-qk1-f170.google.com with SMTP id d23so17783024qko.12
 for <dev@dpdk.org>; Fri, 16 Apr 2021 11:30:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=GFmncw47laoD5ti+yYM8ubiXUV2bziS27/Ksn+oRDbI=;
 b=Oib0hgSCY5JuaIfp/KKoxEML2Jd73IYmmX3bUULVv9EaWUmXZWlfzRlkAPJ0AoSv/E
 3FLNshIAmE8QzgGo8Afs1z3qvkyL14tzxyxXpwb+ox1IQEBHHED7xn4RYIKKWslTScOJ
 7Td035ZowdW9M4U0+gk7/fG/zeWpiiWK90Me8=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=GFmncw47laoD5ti+yYM8ubiXUV2bziS27/Ksn+oRDbI=;
 b=d9TzSFOvS7rW8bM+DfBAJzMXS1g0GETdop5APAq2qFQa7uNKg6rECLcUaE6aHMEzx4
 YWL5x1+fmHmOW7vyf4DDT8VzFClKxqVBWnE2KaUXarfeagonE0jZiM8tV/slXA3qKqUY
 u5bQErROeHRLjZXYiyXXhfUjZWeoIGS6R8ZOpJkvAXs+nH87OkKXhMI+aMkfWDitDX1V
 si1bnduQvsR3flRwp4t6zrOcpCDu7/XRPXviTs8uL+yOpGpBLqqAyZ2yWyZajBaEvH37
 k53jBAJNdzgs687NPnZa6UFDY+ONGa3mSg8wHr0VNNEk6qM+VX3Cr7+Pmp+P9rsWOyD0
 iCmg==
X-Gm-Message-State: AOAM531SAnq431U+VxeHFVs+VizrKYLfXR7IsLzRnZ4KfJJl5rxvYhDj
 nqX8R/rLa3dhwYoTD3UUC4LEb/bUvIarbDRyzfWgbA==
X-Google-Smtp-Source: ABdhPJyakMV9qaws3AcKLhijJdI5A1CpKwEtJFgZqHoQ1fqPB+hscruQl9mwY382ratdgQSCaXnuJ9JEcozxxMZTAnw=
X-Received: by 2002:a05:620a:2053:: with SMTP id
 d19mr632131qka.40.1618597820759; 
 Fri, 16 Apr 2021 11:30:20 -0700 (PDT)
MIME-Version: 1.0
References: <1618062393-205611-1-git-send-email-bingz@nvidia.com>
 <1618595649-157464-1-git-send-email-bingz@nvidia.com>
 <1618595649-157464-2-git-send-email-bingz@nvidia.com>
In-Reply-To: <1618595649-157464-2-git-send-email-bingz@nvidia.com>
From: Ajit Khaparde <ajit.khaparde@broadcom.com>
Date: Fri, 16 Apr 2021 11:30:04 -0700
Message-ID: <CACZ4nhtJChgZFvRu1qU8ss6LHJpkgdtc4vwPVEg+yqaSvs4riA@mail.gmail.com>
To: Bing Zhao <bingz@nvidia.com>
Cc: Ori Kam <orika@nvidia.com>, Thomas Monjalon <thomas@monjalon.net>, 
 Ferruh Yigit <ferruh.yigit@intel.com>,
 Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>, 
 dpdk-dev <dev@dpdk.org>, Xiaoyun Li <xiaoyun.li@intel.com>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-256; boundary="0000000000002b1c8d05c01b2cb3"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [dpdk-dev] [PATCH v3 1/3] ethdev: introduce conntrack flow
 action and item
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

--0000000000002b1c8d05c01b2cb3
Content-Type: text/plain; charset="UTF-8"

On Fri, Apr 16, 2021 at 10:54 AM Bing Zhao <bingz@nvidia.com> wrote:
>
> This commit introduces the conntrack action and item.
>
> Usually the HW offloading is stateless. For some stateful offloading
> like a TCP connection, the HW module can provide full offloading
> without SW participation after the connection is established.
>
> The basic usage is that in the first flow rule the application should
> add the conntrack action and jump to the next flow table. In the
> following flow rule(s) of the next table, the application should use
> the conntrack item to match on the result.
>
> A TCP connection has two directions traffic. To set a conntrack

s/has two directions traffic/can have traffic in two directions/

> action context correctly, the information of packets from both
> directions is required.
>
> The conntrack action should be created on one ethdev port and supply
> the peer ethdev port as a parameter to the action. After the context
> is created, it can only be used between these two ethdev ports
> (dual-port mode) or on a single port. The application should modify
> the action via the API "rte_flow_action_handle_update" only before
> using it to create a flow rule with conntrack for the opposite
> direction. This will help the driver to recognize the direction of
> the flow to be created, especially in the single-port mode, in which
> case the traffic from both directions will go through the same
> ethdev port if the application works as a "forwarding engine" rather
> than an end point. There is no need to call the update interface if
> the subsequent flow rules have nothing to be changed.
>
> Query will be supported via the "rte_flow_action_handle_query" interface,
> about the current packets information and connection status. The
> fields query capabilities depends on the HW.
How about this:
The fields which can be queried will depend on the HW capabilities.

>
> For the packets received during the conntrack setup, it is suggested
> to re-inject the packets in order to make sure the conntrack module
> works correctly without missing any packet. Only valid packets
> should pass the conntrack; packets with invalid TCP information
> (for example, out of window) or with an invalid header (for example,
> malformed) should not pass.
>
> Naming and definition:
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/nf_conntrack_tcp.h
> https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_tcp.c
>
> Other reference:
> https://www.usenix.org/legacy/events/sec01/invitedtalks/rooij.pdf
>
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> ---
>  lib/librte_ethdev/rte_flow.c |   2 +
>  lib/librte_ethdev/rte_flow.h | 207 +++++++++++++++++++++++++++++++++++
>  2 files changed, 209 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 0d2610b7c4..c7c7108933 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>         MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
>         MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
>         MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct rte_flow_item_geneve_opt)),
> +       MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
>  };
>
>  /** Generate flow_action[] entry. */
> @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
>          * indirect action handle.
>          */
>         MK_FLOW_ACTION(INDIRECT, 0),
> +       MK_FLOW_ACTION(CONNTRACK, sizeof(struct rte_flow_action_conntrack)),
>  };
>
>  int
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 324d00abdc..c9d7bdfa57 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -551,6 +551,15 @@ enum rte_flow_item_type {
>          * See struct rte_flow_item_geneve_opt
>          */
>         RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> +
> +       /**
> +        * [META]
> +        *
> +        * Matches conntrack state.
> +        *
> +        * @see struct rte_flow_item_conntrack.
> +        */
> +       RTE_FLOW_ITEM_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = {
>  };
>  #endif
>
> +/**
> + * The packet is valid after conntrack checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_VALID RTE_BIT32(0)
> +/**
> + * The state of the connection is changed.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED RTE_BIT32(1)
> +/**
> + * Error is detected on this packet for this connection and
> + * an invalid state is set.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_INVALID RTE_BIT32(2)
> +/**
> + * The HW connection tracking module is disabled.
> + * It can be due to application command or an invalid state.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED RTE_BIT32(3)
> +/**
> + * The packet contains some bad field(s) and cannot continue
> + * with the conntrack module checking.
> + */
> +#define RTE_FLOW_CONNTRACK_PKT_STATE_BAD RTE_BIT32(4)
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> + *
> + * Matches the state of a packet after it passed the connection tracking
> + * examination. The state is a bitmap of one RTE_FLOW_CONNTRACK_PKT_STATE*
> + * or a reasonable combination of these bits.
> + */
> +struct rte_flow_item_conntrack {
> +       uint32_t flags;
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_conntrack rte_flow_item_conntrack_mask = {
> +       .flags = 0xffffffff,
> +};
> +#endif
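
Not a blocker, but a short sketch of the spec/mask semantics might help
the documentation for this item. The snippet below is standalone and does
not use the real rte_flow headers: the CT_* macros, struct ct_item and
ct_item_matches() are hypothetical stand-ins that mirror the bits proposed
above, and the masked comparison is only my assumption of how a PMD would
evaluate the item.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring RTE_FLOW_CONNTRACK_PKT_STATE_* above. */
#define CT_PKT_STATE_VALID    (UINT32_C(1) << 0)
#define CT_PKT_STATE_CHANGED  (UINT32_C(1) << 1)
#define CT_PKT_STATE_INVALID  (UINT32_C(1) << 2)
#define CT_PKT_STATE_DISABLED (UINT32_C(1) << 3)
#define CT_PKT_STATE_BAD      (UINT32_C(1) << 4)

/* Stand-in for struct rte_flow_item_conntrack. */
struct ct_item {
	uint32_t flags;
};

/*
 * Assumed matching rule: only the bits selected by the mask are compared
 * between the packet state reported by the HW and the item spec.
 */
static bool
ct_item_matches(uint32_t pkt_state, const struct ct_item *spec,
		const struct ct_item *mask)
{
	return (pkt_state & mask->flags) == (spec->flags & mask->flags);
}
```

Under this reading, with the default full mask a spec of VALID would not
match a packet reported as VALID | CHANGED. An application that wants to
accept both would narrow the mask to the VALID bit.
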
> +
>  /**
>   * Matching pattern item definition.
>   *
> @@ -2277,6 +2331,15 @@ enum rte_flow_action_type {
>          * same port or across different ports.
>          */
>         RTE_FLOW_ACTION_TYPE_INDIRECT,
> +
> +       /**
> +        * [META]
> +        *
> +        * Enable tracking a TCP connection state.
> +        *
> +        * @see struct rte_flow_action_conntrack.
> +        */
> +       RTE_FLOW_ACTION_TYPE_CONNTRACK,
>  };
>
>  /**
> @@ -2875,6 +2938,150 @@ struct rte_flow_action_set_dscp {
>   */
>  struct rte_flow_action_handle;
>
> +/**
> + * The state of a TCP connection.
> + */
> +enum rte_flow_conntrack_state {
> +       /** SYN-ACK packet was seen. */
> +       RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> +       /** 3-way handshake was done. */
> +       RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> +       /** First FIN packet was received to close the connection. */
> +       RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> +       /** First FIN was ACKed. */
> +       RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> +       /** Second FIN was received, waiting for the last ACK. */
> +       RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> +       /** Second FIN was ACKed, connection was closed. */
> +       RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> +};
> +
> +/**
> + * The last passed TCP packet flags of a connection.
> + */
> +enum rte_flow_conntrack_tcp_last_index {
> +       RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_SYN = RTE_BIT32(0), /**< With SYN flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_SYNACK = RTE_BIT32(1), /**< With SYNACK flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_FIN = RTE_BIT32(2), /**< With FIN flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_ACK = RTE_BIT32(3), /**< With ACK flag. */
> +       RTE_FLOW_CONNTRACK_FLAG_RST = RTE_BIT32(4), /**< With RST flag. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Configuration parameters for each direction of a TCP connection.
> + */
> +struct rte_flow_tcp_dir_param {
> +       /** TCP window scaling factor, 0xF to disable. */
> +       uint32_t scale:4;
> +       /** The FIN was sent by this direction. */
> +       uint32_t close_initiated:1;
> +       /** An ACK packet has been received by this side. */
> +       uint32_t last_ack_seen:1;
> +       /**
> +        * If set, it indicates that there is unacknowledged data for the
> +        * packets sent from this direction.
> +        */
> +       uint32_t data_unacked:1;
> +       /**
> +        * Maximal value of sequence + payload length in sent
> +        * packets (next ACK from the opposite direction).
> +        */
> +       uint32_t sent_end;
> +       /**
> +        * Maximal value of (ACK + window size) in received packet + length
> +        * over sent packet (maximal sequence could be sent).
> +        */
> +       uint32_t reply_end;
> +       /** Maximal value of actual window size in sent packets. */
> +       uint32_t max_win;
> +       /** Maximal value of ACK in sent packets. */
> +       uint32_t max_ack;
> +};
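
On the "scale" field: 0xF works as a "disabled" marker because the TCP
window scale shift count is limited to 14 (RFC 7323). A small standalone
sketch of how I would expect the value to be consumed is below.
CT_WSCALE_DISABLED and ct_effective_window() are hypothetical names used
only for illustration, and the left-shift semantics are my assumption
based on the usual TCP window scaling rules, not something this patch
states.

```c
#include <stdint.h>

/* Hypothetical name for the 0xF "window scaling disabled" value. */
#define CT_WSCALE_DISABLED 0xFu

/*
 * Effective window for one direction, assuming the usual TCP window
 * scale semantics: the 16-bit header field is shifted left by "scale"
 * unless scaling is disabled.
 */
static uint32_t
ct_effective_window(uint16_t raw_window, uint8_t scale)
{
	if (scale == CT_WSCALE_DISABLED)
		return raw_window;
	return (uint32_t)raw_window << scale;
}
```

For example, a raw window of 1000 with scale 7 gives 128000, while the
same raw window with scale 0xF stays at 1000.
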
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Configuration and initial state for the connection tracking module.
> + * This structure could be used for both setting and query.
> + */
> +struct rte_flow_action_conntrack {
> +       /** The peer port number, can be the same port. */
> +       uint16_t peer_port;
> +       /**
> +        * Direction of this connection when creating a flow, the value
> +        * only affects the subsequent flows creation.

s/flows/flow
or
s/the subsequent flows creation/the creation of subsequent flows


> +        */
> +       uint32_t is_original_dir:1;
> +       /**
> +        * Enable / disable the conntrack HW module. When disabled, the
> +        * result will always be RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED.
> +        * In this state the HW will act as passthrough.
> +        * It only affects this conntrack object in the HW without any effect
> +        * on the other objects.
> +        */
> +       uint32_t enable:1;
> +       /** At least one ack was seen after the connection was established. */
> +       uint32_t live_connection:1;
> +       /** Enable selective ACK on this connection. */
> +       uint32_t selective_ack:1;
> +       /** A challenge ack has passed. */
> +       uint32_t challenge_ack_passed:1;
> +       /**
> +        * 1: The last packet is seen from the original direction.
> +        * 0: The last packet is seen from the reply direction.
> +        */
> +       uint32_t last_direction:1;
> +       /** No TCP check will be done except the state change. */
> +       uint32_t liberal_mode:1;
> +       /** The current state of this connection. */
> +       enum rte_flow_conntrack_state state;
> +       /** Scaling factor for maximal allowed ACK window. */
> +       uint8_t max_ack_window;
> +       /** Maximal allowed number of retransmission times. */
s/times/limit

> +       uint8_t retransmission_limit;
> +       /** TCP parameters of the original direction. */
> +       struct rte_flow_tcp_dir_param original_dir;
> +       /** TCP parameters of the reply direction. */
> +       struct rte_flow_tcp_dir_param reply_dir;
> +       /** The window value of the last packet passed this conntrack. */
s/value/size

> +       uint16_t last_window;
> +       enum rte_flow_conntrack_tcp_last_index last_index;
> +       /** The sequence of the last packet passed this conntrack. */
sequence number of the ...

> +       uint32_t last_seq;
> +       /** The acknowledgement of the last packet passed this conntrack. */
ACK number of the..
s/passed this/passed by this
or
passing this

> +       uint32_t last_ack;
> +       /**
> +        * The total value ACK + payload length of the last packet
> +        * passed this conntrack.
s/passed this/passed by this
or passing this

> +        */
> +       uint32_t last_end;
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> + *
> + * Wrapper structure for the context update interface.
> + * Some ports may not support updating; in that case the only valid
> + * solution is to destroy the old context and create a new one instead.
> + */
> +struct rte_flow_modify_conntrack {
> +       /** New connection tracking parameters to be updated. */
> +       struct rte_flow_action_conntrack new_ct;
> +       /** The direction field will be updated. */
> +       uint32_t direction:1;
> +       /** All the other fields except direction will be updated. */
> +       uint32_t state:1;
> +       /** Reserved bits for future use. */
> +       uint32_t reserved:30;
> +};
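
To check my understanding of the direction-only update described in the
commit message, here is a standalone sketch. struct ct_action and struct
ct_modify are reduced, hypothetical stand-ins for the structures above
(most fields omitted), and ct_prepare_dir_update() is a made-up helper.
I am assuming the application fills the wrapper like this and then hands
it to the action handle update API before creating the rule for the
opposite direction.

```c
#include <stdint.h>
#include <string.h>

/* Reduced, hypothetical stand-in for struct rte_flow_action_conntrack. */
struct ct_action {
	uint16_t peer_port;
	uint32_t is_original_dir:1;
	uint32_t enable:1;
};

/* Reduced, hypothetical stand-in for struct rte_flow_modify_conntrack. */
struct ct_modify {
	struct ct_action new_ct;
	uint32_t direction:1; /* update the direction field only */
	uint32_t state:1;     /* update all fields except direction */
	uint32_t reserved:30;
};

/* Prepare an update that switches the direction and touches nothing else. */
static void
ct_prepare_dir_update(struct ct_modify *mod, int is_original_dir)
{
	memset(mod, 0, sizeof(*mod));
	mod->new_ct.is_original_dir = !!is_original_dir;
	mod->direction = 1;
}
```

If that matches the intent, it may be worth saying explicitly in the
comment that the remaining new_ct fields are ignored when only
"direction" is set.
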
> +
>  /**
>   * Field IDs for MODIFY_FIELD action.
>   */
> --
> 2.19.0.windows.1
>

--0000000000002b1c8d05c01b2cb3--