From: Bing Zhao <bingz@nvidia.com>
To: Ori Kam <orika@nvidia.com>,
NBU-Contact-Thomas Monjalon <thomas@monjalon.net>,
"ferruh.yigit@intel.com" <ferruh.yigit@intel.com>,
"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"ajit.khaparde@broadcom.com" <ajit.khaparde@broadcom.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: introduce conntrack flow action and item
Date: Fri, 16 Apr 2021 18:05:16 +0000 [thread overview]
Message-ID: <MN2PR12MB29099D1C03DAF01257AB8F41D04C9@MN2PR12MB2909.namprd12.prod.outlook.com> (raw)
In-Reply-To: <DM6PR12MB49876DC49C2EB140BD870F39D64C9@DM6PR12MB4987.namprd12.prod.outlook.com>
Hi Ori,
My comments are inline, PSB.
> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Friday, April 16, 2021 8:42 PM
> To: Bing Zhao <bingz@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; ajit.khaparde@broadcom.com
> Subject: RE: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> and item
>
> Hi Bing,
>
> One more thought, PSB
>
> Best,
> Ori
> > -----Original Message-----
> > From: Bing Zhao <bingz@nvidia.com>
> > Sent: Thursday, April 15, 2021 7:41 PM
> > To: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; ferruh.yigit@intel.com;
> > andrew.rybchenko@oktetlabs.ru
> > Cc: dev@dpdk.org; ajit.khaparde@broadcom.com
> > Subject: [PATCH v2 1/2] ethdev: introduce conntrack flow action
> and
> > item
> >
> > This commit introduced the conntrack action and item.
> >
> > Usually the HW offloading is stateless. For some stateful
> offloading
> > like a TCP connection, HW module will help provide the ability of
> a
> > full offloading w/o SW participation after the connection was
> > established.
> >
> > The basic usage is that in the first flow the application should
> add
> > the conntrack action and in the following flow(s) the application
> > should use the conntrack item to match on the result.
> >
> > A TCP connection has two directions traffic. To set a conntrack
> action
> > context correctly, information from packets of both directions are
> > required.
> >
> > The conntrack action should be created on one port and supply the
> peer
> > port as a parameter to the action. After context creating, it
> could
> > only be used between the ports (dual-port mode) or a single port.
> The
> > application should modify the action via the API
> > "action_handle_update" only when before using it to create a flow
> with
> > opposite direction. This will help the driver to recognize the
> > direction of the flow to be created, especially in single port
> mode.
> > The traffic from both directions will go through the same port if
> the
> > application works as an "forwarding engine" but not a end point.
> > There is no need to call the update interface if the subsequent
> flows
> > have nothing to be changed.
> >
> > Query will be supported via action_ctx_query interface, about the
> > current packets information and connection status. Tha fields
> query
> > capabilities depends on the HW.
> >
> > For the packets received during the conntrack setup, it is
> suggested
> > to re-inject the packets in order to take full advantage of the
> > conntrack. Only the valid packets should pass the conntrack,
> packets
> > with invalid TCP information, like out of window, or with invalid
> > header, like malformed, should not pass.
> >
> > Naming and definition:
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel
> ix
> >
> ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Finclude%2Fuapi%2Flinux%2F
> ne
> >
> tfilter%2Fnf_co&data=04%7C01%7Cbingz%40nvidia.com%7C29da48bebdc9
> 44
> >
> b0127508d900d4f89a%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C6375
> 41
> >
> 736960852707%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2l
> uM
> >
> zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2Bbsd48yNzMbhUyp
> In
> > kol%2B7LskVzx1WHj%2Fkd%2Fu0zks0A%3D&reserved=0
> > nntrack_tcp.h
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fel
> ix
> >
> ir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fnet%2Fnetfilter%2Fnf_conn
> tr
> >
> ack_proto_&data=04%7C01%7Cbingz%40nvidia.com%7C29da48bebdc944b01
> 27
> >
> 508d900d4f89a%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637541736
> 96
> >
> 0852707%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIi
> LC
> >
> JBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=CwKk%2FgQWxRY22%2BAaCF
> OP
> > 1TbGphcqURrBFSf4NupMPPA%3D&reserved=0
> > tcp.c
> >
> > Other reference:
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> w.
> >
> usenix.org%2Flegacy%2Fevents%2Fsec01%2Finvitedtalks%2Frooij.pdf&
> da
> >
> ta=04%7C01%7Cbingz%40nvidia.com%7C29da48bebdc944b0127508d900d4f89a%7
> C4
> >
> 3083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637541736960852707%7CUnkno
> wn
> > %7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWw
> iLCJ
> >
> XVCI6Mn0%3D%7C1000&sdata=geKUBifelEBuFzQviu%2FPeV19DOjWzZAbdAlo%
> 2B
> > cVX%2FXs%3D&reserved=0
> >
> > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> > ---
> > lib/librte_ethdev/rte_flow.c | 2 +
> > lib/librte_ethdev/rte_flow.h | 195
> > +++++++++++++++++++++++++++++++++++
> > 2 files changed, 197 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_flow.c
> > b/lib/librte_ethdev/rte_flow.c index 27a161559d..0af601d508 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -98,6 +98,7 @@ static const struct rte_flow_desc_data
> > rte_flow_desc_item[] = {
> > MK_FLOW_ITEM(PFCP, sizeof(struct rte_flow_item_pfcp)),
> > MK_FLOW_ITEM(ECPRI, sizeof(struct rte_flow_item_ecpri)),
> > MK_FLOW_ITEM(GENEVE_OPT, sizeof(struct
> rte_flow_item_geneve_opt)),
> > + MK_FLOW_ITEM(CONNTRACK, sizeof(uint32_t)),
> > };
> >
> > /** Generate flow_action[] entry. */
> > @@ -186,6 +187,7 @@ static const struct rte_flow_desc_data
> > rte_flow_desc_action[] = {
> > * indirect action handle.
> > */
> > MK_FLOW_ACTION(INDIRECT, 0),
> > + MK_FLOW_ACTION(CONNTRACK, sizeof(struct
> > rte_flow_action_conntrack)),
> > };
> >
> > int
> > diff --git a/lib/librte_ethdev/rte_flow.h
> > b/lib/librte_ethdev/rte_flow.h index 91ae25b1da..024d1a2026 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -551,6 +551,15 @@ enum rte_flow_item_type {
> > * See struct rte_flow_item_geneve_opt
> > */
> > RTE_FLOW_ITEM_TYPE_GENEVE_OPT,
> > +
> > + /**
> > + * [META]
> > + *
> > + * Matches conntrack state.
> > + *
> > + * See struct rte_flow_item_conntrack.
> > + */
> > + RTE_FLOW_ITEM_TYPE_CONNTRACK,
> > };
> >
> > /**
> > @@ -1685,6 +1694,51 @@ rte_flow_item_geneve_opt_mask = { };
> #endif
> >
> > +/**
> > + * The packet is with valid state after conntrack checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_VALID (1 << 0)
> > +/**
> > + * The state of the connection was changed.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_CHANGED (1 << 1)
> > +/**
> > + * Error is detected on this packet for this connection and
> > + * an invalid state is set.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_STATE_INVAL (1 << 2)
> > +/**
> > + * The HW connection tracking module is disabled.
> > + * It can be due to application command or an invalid state.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_HW_DISABLED (1 << 3)
> > +/**
> > + * The packet contains some bad field(s) and cannot continue
> > + * with the conntrack module checking.
> > + */
> > +#define RTE_FLOW_CONNTRACK_FLAG_PKT_BAD (1 << 4)
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ITEM_TYPE_CONNTRACK
> > + *
> > + * Matches the state of a packet after it passed the connection
> > +tracking
> > + * examination. The state is a bit mask of one
> > +RTE_FLOW_CONNTRACK_FLAG*
> > + * or a reasonable combination of these bits.
> > + */
> > +struct rte_flow_item_conntrack {
> > + uint32_t flags;
> > +};
> > +
> > +/** Default mask for RTE_FLOW_ITEM_TYPE_CONNTRACK. */ #ifndef
> > +__cplusplus static const struct rte_flow_item_conntrack
> > +rte_flow_item_conntrack_mask =
> > {
> > + .flags = 0xffffffff,
> > +};
> > +#endif
> > +
> > /**
> > * Matching pattern item definition.
> > *
> > @@ -2277,6 +2331,17 @@ enum rte_flow_action_type {
> > * same port or across different ports.
> > */
> > RTE_FLOW_ACTION_TYPE_INDIRECT,
> > +
> > + /**
> > + * [META]
> > + *
> > + * Enable tracking a TCP connection state.
> > + *
> > + * Send packet to HW connection tracking module for
> examination.
> > + *
> > + * See struct rte_flow_action_conntrack.
> > + */
> > + RTE_FLOW_ACTION_TYPE_CONNTRACK,
> > };
> >
> > /**
> > @@ -2875,6 +2940,136 @@ struct rte_flow_action_set_dscp {
> > */
> > struct rte_flow_action_handle;
> >
> > +/**
> > + * The state of a TCP connection.
> > + */
> > +enum rte_flow_conntrack_state {
> > + /**< SYN-ACK packet was seen. */
> > + RTE_FLOW_CONNTRACK_STATE_SYN_RECV,
> > + /**< 3-way handshark was done. */
> > + RTE_FLOW_CONNTRACK_STATE_ESTABLISHED,
> > + /**< First FIN packet was received to close the connection. */
> > + RTE_FLOW_CONNTRACK_STATE_FIN_WAIT,
> > + /**< First FIN was ACKed. */
> > + RTE_FLOW_CONNTRACK_STATE_CLOSE_WAIT,
> > + /**< Second FIN was received, waiting for the last ACK. */
> > + RTE_FLOW_CONNTRACK_STATE_LAST_ACK,
> > + /**< Second FIN was ACKed, connection was closed. */
> > + RTE_FLOW_CONNTRACK_STATE_TIME_WAIT,
> > +};
> > +
> > +/**
> > + * The last passed TCP packet flags of a connection.
> > + */
> > +enum rte_flow_conntrack_tcp_last_index {
> > + RTE_FLOW_CONNTRACK_FLAG_NONE = 0, /**< No Flag. */
> > + RTE_FLOW_CONNTRACK_FLAG_SYN = (1 << 0), /**< With SYN flag. */
> > + RTE_FLOW_CONNTRACK_FLAG_SYNACK = (1 << 1), /**< With SYN+ACK
> > flag. */
> > + RTE_FLOW_CONNTRACK_FLAG_FIN = (1 << 2), /**< With FIN flag. */
> > + RTE_FLOW_CONNTRACK_FLAG_ACK = (1 << 3), /**< With ACK flag. */
> > + RTE_FLOW_CONNTRACK_FLAG_RST = (1 << 4), /**< With RST flag.
> */ };
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * Configuration parameters for each direction of a TCP
> connection.
> > + */
> > +struct rte_flow_tcp_dir_param {
> > + uint32_t scale:4; /**< TCP window scaling factor, 0xF to
> disable. */
> > + uint32_t close_initiated:1; /**< The FIN was sent by this
> direction. */
> > + /**< An ACK packet has been received by this side. */
> > + uint32_t last_ack_seen:1;
> > + /**< If set, indicates that there is unacked data of the
> connection. */
> > + uint32_t data_unacked:1;
> > + /**< Maximal value of sequence + payload length over sent
> > + * packets (next ACK from the opposite direction).
> > + */
> > + uint32_t sent_end;
> > + /**< Maximal value of (ACK + window size) over received packet
> +
> > length
> > + * over sent packet (maximal sequence could be sent).
> > + */
> > + uint32_t reply_end;
>
> This comment is for all members that are part of the packet, Do you
> think it should be in network order?
Almost none of the fields are part of the packet. Indeed, most of them are calculated from the packets information. So I prefer to keep the host order easy for using and
keep all the fields of the whole structure the same endianness format.
What do you think?
> I can see the advantage in both ways nice I assume the app needs
> this data in host byte-order but since in most other cases we use
> network byte-order to set values that are coming from the packet
> itself maybe it is better to use network byte-order (will also save
> the conversion)
Only the seq/ack/window in the common part are part of the packets, others are not.
BTW, should we support liberal mode separately for both direction as some "half-duplex". One direction could work normally and the opposite direct will work in the liberal mode?
>
> > + /**< Maximal value of actual window size over sent packets. */
> > + uint32_t max_win;
> > + /**< Maximal value of ACK over sent packets. */
> > + uint32_t max_ack;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior
> notice
> > + *
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Configuration and initial state for the connection tracking
> module.
> > + * This structure could be used for both setting and query.
> > + */
> > +struct rte_flow_action_conntrack {
> > + uint16_t peer_port; /**< The peer port number, can be the same
> port.
> > */
> > + /**< Direction of this connection when creating a flow, the
> value only
> > + * affects the subsequent flows creation.
> > + */
> > + uint32_t is_original_dir:1;
> > + /**< Enable / disable the conntrack HW module. When disabled,
> the
> > + * result will always be RTE_FLOW_CONNTRACK_FLAG_DISABLED.
> > + * In this state the HW will act as passthrough.
> > + * It only affects this conntrack object in the HW without any
> effect
> > + * to the other objects.
> > + */
> > + uint32_t enable:1;
> > + /**< At least one ack was seen, after the connection was
> established.
> > */
> > + uint32_t live_connection:1;
> > + /**< Enable selective ACK on this connection. */
> > + uint32_t selective_ack:1;
> > + /**< A challenge ack has passed. */
> > + uint32_t challenge_ack_passed:1;
> > + /**< 1: The last packet is seen that comes from the original
> direction.
> > + * 0: From the reply direction.
> > + */
> > + uint32_t last_direction:1;
> > + /**< No TCP check will be done except the state change. */
> > + uint32_t liberal_mode:1;
> > + /**< The current state of the connection. */
> > + enum rte_flow_conntrack_state state;
> > + /**< Scaling factor for maximal allowed ACK window. */
> > + uint8_t max_ack_window;
> > + /**< Maximal allowed number of retransmission times. */
> > + uint8_t retransmission_limit;
> > + /**< TCP parameters of the original direction. */
> > + struct rte_flow_tcp_dir_param original_dir;
> > + /**< TCP parameters of the reply direction. */
> > + struct rte_flow_tcp_dir_param reply_dir;
> > + /**< The window value of the last packet passed this conntrack.
> */
> > + uint16_t last_window;
> > + enum rte_flow_conntrack_tcp_last_index last_index;
> > + /**< The sequence of the last packet passed this conntrack. */
> > + uint32_t last_seq;
> > + /**< The acknowledgement of the last packet passed this
> conntrack. */
> > + uint32_t last_ack;
> > + /**< The total value ACK + payload length of the last packet
> passed
> > + * this conntrack.
> > + */
> > + uint32_t last_end;
> > +};
> > +
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_CONNTRACK
> > + *
> > + * Wrapper structure for the context update interface.
> > + * Ports cannot support updating, and the only valid solution is
> to
> > + * destroy the old context and create a new one instead.
> > + */
> > +struct rte_flow_modify_conntrack {
> > + /**< New connection tracking parameters to be updated. */
> > + struct rte_flow_action_conntrack new_ct;
> > + uint32_t direction:1; /**< The direction field will be updated.
> */
> > + /**< All the other fields except direction will be updated. */
> > + uint32_t state:1;
> > + uint32_t reserved:30; /**< Reserved bits for the future usage.
> */ };
> > +
> > /**
> > * Field IDs for MODIFY_FIELD action.
> > */
> > --
> > 2.19.0.windows.1
BR. Bing
next prev parent reply other threads:[~2021-04-16 18:05 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-18 7:30 [dpdk-dev] [RFC] " Bing Zhao
2021-03-22 15:16 ` Andrew Rybchenko
2021-04-07 7:43 ` Bing Zhao
2021-03-23 23:27 ` Ajit Khaparde
2021-04-07 2:41 ` Bing Zhao
2021-04-10 13:46 ` [dpdk-dev] [PATCH] " Bing Zhao
2021-04-15 16:24 ` Ori Kam
2021-04-15 16:44 ` Bing Zhao
2021-04-15 16:41 ` [dpdk-dev] [PATCH v2 0/2] " Bing Zhao
2021-04-15 16:41 ` [dpdk-dev] [PATCH v2 1/2] " Bing Zhao
2021-04-16 10:49 ` Thomas Monjalon
2021-04-16 18:18 ` Bing Zhao
2021-04-16 12:41 ` Ori Kam
2021-04-16 18:05 ` Bing Zhao [this message]
2021-04-16 21:47 ` Ajit Khaparde
2021-04-17 6:10 ` Bing Zhao
2021-04-17 14:54 ` Ajit Khaparde
2021-04-15 16:41 ` [dpdk-dev] [PATCH v2 2/2] app/testpmd: add CLI for conntrack Bing Zhao
2021-04-16 8:46 ` Ori Kam
2021-04-16 18:20 ` Bing Zhao
2021-04-16 17:54 ` [dpdk-dev] [PATCH v3 0/3] ethdev: introduce conntrack flow action and item Bing Zhao
2021-04-16 17:54 ` [dpdk-dev] [PATCH v3 1/3] " Bing Zhao
2021-04-16 18:30 ` Ajit Khaparde
2021-04-19 14:08 ` Thomas Monjalon
2021-04-19 16:21 ` Bing Zhao
2021-04-19 14:06 ` Thomas Monjalon
2021-04-19 16:13 ` Bing Zhao
2021-04-16 17:54 ` [dpdk-dev] [PATCH v3 2/3] app/testpmd: add CLI for conntrack Bing Zhao
2021-04-16 17:54 ` [dpdk-dev] [PATCH v3 3/3] doc: update " Bing Zhao
2021-04-16 18:22 ` Thomas Monjalon
2021-04-16 18:30 ` Ajit Khaparde
2021-04-19 17:28 ` Bing Zhao
2021-04-19 17:16 ` [dpdk-dev] [PATCH v4 0/3] ethdev: introduce conntrack flow action and item Bing Zhao
2021-04-19 17:16 ` [dpdk-dev] [PATCH v4 1/3] " Bing Zhao
2021-04-19 17:33 ` Ori Kam
2021-04-19 17:16 ` [dpdk-dev] [PATCH v4 2/3] app/testpmd: add CLI for conntrack Bing Zhao
2021-04-19 17:35 ` Ori Kam
2021-04-19 17:16 ` [dpdk-dev] [PATCH v4 3/3] doc: update " Bing Zhao
2021-04-19 17:32 ` Thomas Monjalon
2021-04-19 17:37 ` Ori Kam
2021-04-19 17:51 ` [dpdk-dev] [PATCH v5 0/2] ethdev: introduce conntrack flow action and item Bing Zhao
2021-04-19 17:51 ` [dpdk-dev] [PATCH v5 1/2] " Bing Zhao
2021-04-19 18:07 ` Thomas Monjalon
2021-04-19 23:29 ` Ferruh Yigit
2021-04-19 17:51 ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: add CLI for conntrack Bing Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=MN2PR12MB29099D1C03DAF01257AB8F41D04C9@MN2PR12MB2909.namprd12.prod.outlook.com \
--to=bingz@nvidia.com \
--cc=ajit.khaparde@broadcom.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@intel.com \
--cc=orika@nvidia.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).