Hi Bing,
Thanks for your help on this.

Let me check that I understand your analysis correctly.  With just 'eth' as the pattern, the CX has to do two (maybe more) lookups: the first matches Ethernet packets, and once that's done a second match has to occur for IPv4 (since the RSS function is L3); only then can the RSS action be performed.  With the eth/ipv4 pattern, only one lookup is required before the RSS action can occur?

thanks,
tony

On Mon, Jul 8, 2024 at 12:56 PM Bing Zhao <bingz@nvidia.com> wrote:
Hi,

Apologies for the late response. PSB.

> -----Original Message-----
> From: Tony Hart <tony.hart@domainhart.com>
> Sent: Wednesday, June 26, 2024 9:25 PM
> To: Bing Zhao <bingz@nvidia.com>
> Cc: users@dpdk.org
> Subject: Re: Performance of CX7 with 'eth' pattern versus 'eth/ipv4' in
> hairpin
>
>
> Hi Bing,
> Thanks for the quick reply.  The results are...
>
> With a single hairpin queue I get approx the same rate for both patterns,
> ~54Gbps.  I assume this is less than the RSS rates due to fewer queues?
>
> flow create 0 ingress group 1 pattern eth / end actions count / queue index 6 / end
> flow create 0 ingress group 1 pattern eth / ipv4 / end actions count / queue index 6 / end

The reason I wanted to compare a single queue is to confirm whether the difference is caused by the RSS action. And the result is as expected.

>
> With the split ipv6/ipv4 I'm getting ~124Gbps
>
> flow create 0 ingress group 1 priority 1 pattern eth / ipv6 / end actions count / rss queues 6 7 8 9 end / end
> flow create 0 ingress group 1 priority 1 pattern eth / ipv4 / end actions count / rss queues 6 7 8 9 end / end
>
> testpmd> flow list 0
> ID Group Prio Attr Rule
> 0 0 0 i-- => JUMP
> 1 1 1 i-- ETH IPV6 => COUNT RSS
> 2 1 1 i-- ETH IPV4 => COUNT RSS
>

I tried to debug on my local setup; the reason is related to the RSS expansion.
The mlx5 PMD doesn't currently support RSS on Ethernet header fields. When only ETH is in the pattern but the RSS is the default (L3 IP), several rules will be inserted:
1. Ethernet + IPv6 / RSS based on IPv6 header
2. Ethernet + IPv4 / RSS based on IPv4 header
3. Other Ethernet packets / single default queue

This adds some extra hops for an IPv4 packet.
So it would be better to also match IPv4 if you are using the default RSS fields.
Note: if you are using RSS on the (IP +) TCP/UDP fields, the expansion to the L4 headers may be involved. To avoid this, the rule's match can be specified down to L4 as well; see the sketch below.
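
For example, if the traffic is UDP and the RSS should also hash on the L4 ports, a rough sketch of such rules might be (this assumes the same hairpin queues 6 7 8 9; the exact 'rss types' keywords depend on the testpmd version, so please double-check them on your setup):

flow create 0 ingress group 1 priority 1 pattern eth / ipv4 / udp / end actions count / rss types ipv4-udp end queues 6 7 8 9 end / end
flow create 0 ingress group 1 priority 1 pattern eth / ipv6 / udp / end actions count / rss types ipv6-udp end queues 6 7 8 9 end / end

With the match already down to the same layer as the RSS hash fields, no further expansion should be needed.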

> On Wed, Jun 26, 2024 at 8:10 AM Bing Zhao <bingz@nvidia.com> wrote:
> >
> > Hi Tony,
> >
> > Could you also try to test with:
> > 1. QUEUE action instead of RSS and check 1 queue performance.
> > 2. when trying to test the IPv4-only case, try the following 3 commands in this order:
> >         flow create 0 ingress group 0 pattern end actions jump group 1 / end
> >         flow create 0 ingress group 1 priority 1 pattern eth / ipv6 / end actions count / rss queues 6 7 8 9 end / end
> >         flow create 0 ingress group 1 priority 1 pattern eth / ipv4 / end actions count / rss queues 6 7 8 9 end / end
> >
> > BR. Bing
> >
> > > -----Original Message-----
> > > From: Tony Hart <tony.hart@domainhart.com>
> > > Sent: Wednesday, June 26, 2024 7:39 PM
> > > To: users@dpdk.org
> > > Subject: Performance of CX7 with 'eth' pattern versus 'eth/ipv4' in
> > > hairpin
> > >
> > >
> > > I'm using a CX7 and testing hairpin queues.  The test traffic is entirely
> > > IPv4+UDP with distributed SIP,DIP pairs, and received packets are
> > > u-turned via hairpin in the CX7 (single 400G interface).
> > >
> > > I see different performance when I use a pattern of 'eth' versus
> > > 'eth/ipv4' in the hairpin flow entry.  From testing it seems that
> > > specifying just 'eth' is sufficient to invoke RSS, and 'eth/ipv4'
> > > should be equivalent since the traffic is all IPv4, but I'm getting
> > > ~104Gbps for the 'eth' pattern and ~124Gbps for the 'eth/ipv4' pattern.
> > >
> > > Any thoughts on why there is such a performance difference here?
> > >
> > > thanks
> > > tony
> > >
> > > These are the testpmd commands for the 'eth' pattern:
> > > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > > flow create 0 ingress group 1 pattern eth / end actions count / rss queues 6 7 8 9 end / end
> > >
> > > The testpmd commands for 'eth/ipv4':
> > > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > > flow create 0 ingress group 1 pattern eth / ipv4 / end actions count / rss queues 6 7 8 9 end / end
> > >
> > >
> > > This is the testpmd command line...
> > > dpdk-testpmd -l8-14 -a81:00.0,dv_flow_en=1 -- -i --nb-cores 6 --rxq 6 --txq 6 --port-topology loop --forward-mode=rxonly --hairpinq 4 --hairpin-mode 0x10
> > >
> > > Versions
> > > mlnx-ofa_kernel-24.04-OFED.24.04.0.6.6.1.rhel9u4.x86_64
> > > kmod-mlnx-ofa_kernel-24.04-OFED.24.04.0.6.6.1.rhel9u4.x86_64
> > > mlnx-ofa_kernel-devel-24.04-OFED.24.04.0.6.6.1.rhel9u4.x86_64
> > > ofed-scripts-24.04-OFED.24.04.0.6.6.x86_64
> > >
> > > DPDK: v24.03
>
>
>
> --
> tony


--
tony