DPDK patches and discussions
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
To: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>
Cc: <ivan.malov@arknetworks.am>, <viacheslavo@nvidia.com>,
	<bingz@nvidia.com>,  <orika@nvidia.com>, <suanmingm@nvidia.com>,
	<matan@nvidia.com>, <dev@dpdk.org>, <stable@dpdk.org>
Subject: Re: [PATCH] net/mlx5: fix connection tracking state item validation
Date: Mon, 11 Aug 2025 17:15:20 +0200
Message-ID: <20250811151520.bonpjpefwuzuap65@ds-vm-debian.local>
In-Reply-To: <20250811062149.2489151-1-14pwcse1224@uetpeshawar.edu.pk>

On Mon, Aug 11, 2025 at 02:21:49AM -0400, Khadem Ullah wrote:
> Hi Dariusz Sosnowski, 
> 
> According to documentation, conntrack item matches a conntrack 
> state after conntrack action. Your statement is also correct 
> "match valid TCP packets which change TCP connection state", 
> it means in this case also we are matching TCP connection state. 
> 
> Please check CONNTRACK item in Generic flow API (rte_flow)
> 16.2.6.47. Item: CONNTRACK
> 
> Matches a conntrack state after conntrack action.
> 
>     flags: conntrack packet state flags.
>     Default mask matches all state bits.
> 
> https://doc.dpdk.org/guides-24.07/prog_guide/rte_flow.html

As mentioned in the quoted docs - flags in conntrack item match
"conntrack packet state flags", not "connection state".
Packet state refers to one or more RTE_FLOW_CONNTRACK_PKT_STATE_* flags.
(Documented specifically in API docs).

I am highlighting the difference between "packet state"
and "connection state" because the distinction matters:

- flow item deals with "packet state"
- flow action deals with "connection state"

One of the "packet state" flags is RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED,
which indicates that this packet has changed the TCP connection state.
But without analyzing the packet and inspecting the TCP connection state
(by querying the conntrack flow action), the user does not know
the specific state transition, e.g. whether it was
SYN_RECV -> ESTABLISHED or ESTABLISHED -> FIN_WAIT.
So the user can match packets which changed the connection state,
but not the specific state transition.
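
To illustrate, here is a toy Python model. It is only a sketch and not the
HW state machine: the flag values follow the "conntrack is 1/2" mapping
discussed in this thread, the state names are taken from the
RTE_FLOW_CONNTRACK_STATE_* enum, and pkt_state_after() is a made-up helper.
Two different transitions yield the same packet state bitmap, so the item
alone cannot tell them apart.

```python
# Toy model only; pkt_state_after() is a hypothetical helper, not a DPDK API.
PKT_STATE_VALID = 1 << 0    # "conntrack is 1"
PKT_STATE_CHANGED = 1 << 1  # "conntrack is 2"


def pkt_state_after(prev_conn_state: str, new_conn_state: str) -> int:
    """Return the packet state bitmap reported for a valid packet."""
    flags = PKT_STATE_VALID
    if prev_conn_state != new_conn_state:
        # The flag says *that* the state changed, not *which* transition it was.
        flags |= PKT_STATE_CHANGED
    return flags


handshake_ack = pkt_state_after("SYN_RECV", "ESTABLISHED")
first_fin = pkt_state_after("ESTABLISHED", "FIN_WAIT")
plain_data = pkt_state_after("ESTABLISHED", "ESTABLISHED")
```

Both handshake_ack and first_fin come out as the same bitmap, so telling the
transitions apart requires querying the conntrack action.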

Perhaps the docs (both programming guide and API docs) could be improved
in that regard to highlight the difference more clearly.

> 
> E.g. I have performed the following experiments on ConnectX-6 Dx for clarification
> (providing only the relevant information).
> 
> conntrack state can be verified for liberal mode.
> In liberal mode, the Seq/ACK/Win check will be ignored (forget about ack/seq)
> and only the state change will be tracked.

Correct, but even with liberal mode (so no TCP window check)
"packet state" after conntrack execution is still at least
RTE_FLOW_CONNTRACK_PKT_STATE_VALID.
The docs do not state this, so this is an area where they should be
improved.

This makes the following example matches correct
(take note that flow item takes a bitmap of RTE_FLOW_CONNTRACK_PKT_STATE_*):

- conntrack is 1 => RTE_FLOW_CONNTRACK_PKT_STATE_VALID

- conntrack is 2 => RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED

- conntrack is 3 => RTE_FLOW_CONNTRACK_PKT_STATE_VALID **and**
                    RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED
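
A minimal Python sketch of how such a bitmap match behaves, assuming the
generic spec/mask comparison and the default mask covering all state bits
(as the programming guide says). item_matches() is a made-up name for
illustration, and only the three flags mentioned in this thread are listed;
the API docs define more.

```python
# Flag values as used in this thread ("conntrack is 1", "is 2", "is 8").
PKT_STATE_VALID = 0x1
PKT_STATE_CHANGED = 0x2
PKT_STATE_DISABLED = 0x8
ALL_BITS = 0xFFFFFFFF  # assumed default mask: all state bits


def item_matches(pkt_flags: int, spec: int, mask: int = ALL_BITS) -> bool:
    """Sketch of a spec/mask comparison on the packet state bitmap."""
    return (pkt_flags & mask) == (spec & mask)


changed_pkt = PKT_STATE_VALID | PKT_STATE_CHANGED  # valid packet, state changed
data_pkt = PKT_STATE_VALID                         # valid packet, no change

hits_is_3 = item_matches(changed_pkt, 3)  # "conntrack is 3"
hits_is_1 = item_matches(changed_pkt, 1)  # "conntrack is 1"
```

Under these assumptions a valid packet which changes the connection state
hits "conntrack is 3" and not "conntrack is 1", because all bits are
compared unless a narrower mask is given.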

> 
> Test 1 : Starting state machine from State 0  
> 
> flow indirect_action 0 create ingress action conntrack / end
> flow create 0 ingress pattern eth / ipv4 / tcp / end actions jump group 3 / end
> flow create 0 group 3 ingress pattern eth / ipv4 / tcp / end actions indirect 0 / jump group 5 / end
> flow create 0 group 5 ingress pattern eth / ipv4 / tcp / conntrack is 1 / end actions queue index 1 / end
> flow create 0 group 5 ingress pattern eth / ipv4 / tcp / conntrack is 2 / end actions queue index 2 / end
> set fwd rxonly 
> start 
> set verbose 3 
> 
> The following packets will be forwarded to queue 1, it means the state machine is now in established state (state 1).
> sendp(Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x10), iface="",count=1)  
> sendp(Ether(dst="bb:cc:dd:ee:ff:22",src="aa:bb:cc:dd:ee:ff")/IP(src="150.1.10.10")/TCP(ack=265,seq=265,dport=5555,flags=0x18), iface="",count=1)
> 
> FIN packets terminate the connection; the following packets will be forwarded to queue 2:
> pkt=Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x01)
> pkt=Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x10)
> pkt=Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x01)
> pkt=Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x10)
> 
> This will again be forwarded to queue 1:
> pkt=Ether()/IP()/TCP(ack=265,seq=265,dport=5555,flags=0x10)

I'm not sure how this result was reached, because when using these commands
and sending these packets, I do not receive any.
I receive the packets only if the rule with conntrack item
matches on RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED:

	flow create 0 group 5 ingress pattern eth / ipv4 / tcp / conntrack is 8 / end actions queue index 0 / end

Which is expected with the given configuration.

Are these the only testpmd commands you execute?

If yes, then there are a few things missing for functional conntrack offload.
Let me explain:

1. There is no conntrack action configuration.

   testpmd has a few commands which allow users to provide initial
   conntrack action state. Please see https://doc.dpdk.org/guides/testpmd_app_ug/testpmd_funcs.html#sample-conntrack-rules

   Without running these commands, initial conntrack state is zeroed.
   As a result, `enable` field in `rte_flow_action_conntrack` is zero
   and TCP state machine in HW is not updated for each passing packet,
   and each packet is marked with RTE_FLOW_CONNTRACK_PKT_STATE_DISABLED.

2. There is only one rule with conntrack action.

   Rules with conntrack action need to know which TCP connection direction
   would pass through that rule. This is needed because conntrack
   offload does not track IP addresses and TCP ports.
   Inside the state machine, there are 2 separate sets of seq/ack
   numbers tracked for each direction.

   Users must ensure that there will be 2 rules (1 for each TCP
   direction) with conntrack actions.

3. conntrack item deals with RTE_FLOW_CONNTRACK_PKT_STATE_* bitmap.

   In your example, "conntrack is 1" specification sets flags to 1.
   This means, "match packets with RTE_FLOW_CONNTRACK_PKT_STATE_VALID"
   and not "connection in RTE_FLOW_CONNTRACK_STATE_ESTABLISHED".

   The same goes for "conntrack is 2". It specifies match on
   RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED, not on
   RTE_FLOW_CONNTRACK_STATE_FIN_WAIT or any other state.

   Because it is a bitmap, conntrack item can specify a combination of
   PKT_STATE flags. For example, "conntrack is 3" would mean matching
   a packet with RTE_FLOW_CONNTRACK_PKT_STATE_VALID and
   RTE_FLOW_CONNTRACK_PKT_STATE_CHANGED flags set.

Please see [1] for a full example. Let us know if you have any questions
about the example and if it works for you.

> 
> So, according to my understanding (from rte_flow and various experiments),
> conntrack item ('conntrack is') matches the state of the connection tracking
> state machine in hardware
> and forwards it to the relevant queue.
> 
> In any case, I think only a range of values for "conntrack is" should be allowed.
> 
> Best Regards, 
> Khadem

[1]: Full conntrack example, testpmd commands:

# Initial conntrack action configuration: original direction, state SYN_RECV, liberal mode and enabled
set conntrack com peer 0 is_orig 1 enable 1 live 0 sack 0 cack 0 last_dir 0 liberal 1 state 0 max_ack_win 0 r_lim 5 last_win 510 last_seq 2000 last_ack 101 last_end 101 last_index 0x2
set conntrack orig scale 0xf fin 0 acked 1 unack_data 0 sent_end 101 reply_end 65535 max_win 0 max_ack 0
set conntrack rply scale 0xf fin 0 acked 1 unack_data 0 sent_end 2001 reply_end 65535 max_win 0 max_ack 101
flow indirect_action 0 create ingress action conntrack / end

# Create a rule for original direction
flow create 0 group 3 ingress pattern eth / ipv4 src is 1.2.3.4 dst is 5.6.7.8 / tcp src is 40000 dst is 50000 / end actions indirect 0 / jump group 5 / end

# Update conntrack action - now the rule will be created for reply direction
set conntrack com peer 0 is_orig 0 enable 1 live 0 sack 0 cack 0 last_dir 0 liberal 1 state 0 max_ack_win 0 r_lim 5 last_win 510 last_seq 2000 last_ack 101 last_end 101 last_index 0x2
flow indirect_action 0 update 0 action conntrack_update dir / end

# Create a rule for reply direction
flow create 0 group 3 ingress pattern eth / ipv4 src is 5.6.7.8 dst is 1.2.3.4 / tcp src is 50000 dst is 40000 / end actions indirect 0 / jump group 5 / end

# Create group 0 rule for TCP traffic
flow create 0 ingress pattern eth / ipv4 / tcp / end actions jump group 3 / end

# Match valid packets, mark and send to queue 0
flow create 0 group 5 ingress pattern eth / ipv4 / tcp / conntrack is 1 / end actions mark id 0x111 / queue index 0 / end
# Match valid packets which change connection state
flow create 0 group 5 ingress pattern eth / ipv4 / tcp / conntrack is 3 / end actions mark id 0x333 / queue index 0 / end

set verbose 1
set fwd rxonly
start
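
Since one rule per TCP direction is needed, the two group 3 rules above can
be generated from a single 4-tuple. A small Python sketch follows;
conntrack_rule() and its parameter names are made up for illustration, and
the output format simply mirrors the testpmd commands above.

```python
def conntrack_rule(src_ip, dst_ip, src_port, dst_port,
                   port_id=0, group=3, next_group=5, action_id=0):
    """Format one testpmd rule applying the indirect conntrack action."""
    return (
        f"flow create {port_id} group {group} ingress pattern eth"
        f" / ipv4 src is {src_ip} dst is {dst_ip}"
        f" / tcp src is {src_port} dst is {dst_port}"
        f" / end actions indirect {action_id} / jump group {next_group} / end"
    )


# Original direction, then the same 4-tuple swapped for the reply direction.
orig_rule = conntrack_rule("1.2.3.4", "5.6.7.8", 40000, 50000)
reply_rule = conntrack_rule("5.6.7.8", "1.2.3.4", 50000, 40000)
print(orig_rule)
print(reply_rule)
```

The reply rule should only be issued after the conntrack action was updated
with is_orig 0, as in the command sequence above.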

Example packets to send after all flow rules are created:

# ACK in handshake: transition SYN_RECV->ESTABLISHED; logged as "FDIR matched ID=0x333"
pkt = (Ether() / IP(src='1.2.3.4', dst='5.6.7.8') / TCP(sport=40000, dport=50000, flags='A', seq=101, ack=2001))

# some data from original direction; logged as "FDIR matched ID=0x111"
pkt = (Ether() / IP(src='1.2.3.4', dst='5.6.7.8') / TCP(sport=40000, dport=50000, flags='A', seq=101, ack=2001) / Raw(load=b'a' * 100))

# ack from reply direction; logged as "FDIR matched ID=0x111"
pkt = (Ether() / IP(src='5.6.7.8', dst='1.2.3.4') / TCP(sport=50000, dport=40000, flags='A', seq=2001, ack=201))

# fin from original direction; logged as "FDIR matched ID=0x333"
pkt = (Ether() / IP(src='1.2.3.4', dst='5.6.7.8') / TCP(sport=40000, dport=50000, flags='F', seq=201, ack=2001))

# ack from reply direction; logged as "FDIR matched ID=0x333"
pkt = (Ether() / IP(src='5.6.7.8', dst='1.2.3.4') / TCP(sport=50000, dport=40000, flags='A', seq=2001, ack=202))

# fin from reply direction; logged as "FDIR matched ID=0x333"
pkt = (Ether() / IP(src='5.6.7.8', dst='1.2.3.4') / TCP(sport=50000, dport=40000, flags='F', seq=2001, ack=202))

# ack from original direction; logged as "FDIR matched ID=0x333"
pkt = (Ether() / IP(src='1.2.3.4', dst='5.6.7.8') / TCP(sport=40000, dport=50000, flags='A', seq=201, ack=2002))
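
The seq/ack numbers in the packets above follow the usual TCP rule: payload
bytes and the FIN flag each advance the sender's sequence number. A short
Python sketch of that arithmetic (next_seq() is a made-up helper, not part
of scapy or DPDK):

```python
def next_seq(seq: int, payload_len: int = 0, fin: bool = False) -> int:
    """Sequence number the peer is expected to acknowledge next."""
    return (seq + payload_len + int(fin)) % 2**32


orig_seq = 101    # original direction, matches sent_end 101
reply_seq = 2001  # reply direction, matches sent_end 2001

orig_after_data = next_seq(orig_seq, payload_len=100)  # acked by reply as 201
orig_after_fin = next_seq(orig_after_data, fin=True)   # acked by reply as 202
reply_after_fin = next_seq(reply_seq, fin=True)        # acked by original as 2002
```

These are the ack values used in the reply-direction and final
original-direction packets.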
