DPDK usage discussions
From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>
Cc: "users@dpdk.org" <users@dpdk.org>
Subject: Re: [net/mlx5] Performance drop with HWS compared to SWS
Date: Thu, 20 Jun 2024 16:05:34 +0300
Message-ID: <20240620160534.218f544c@sovereign>
In-Reply-To: <PH0PR12MB880055029212CFDAEB0728C2A4CF2@PH0PR12MB8800.namprd12.prod.outlook.com>

2024-06-19 19:15 (UTC+0000), Dariusz Sosnowski:
[snip]
> > Case "jump_rss", SWS
> > ====================
> > Jump to group 1, then RSS.
> > Result: 37 Mpps (?!)
> > This "37 Mpps" seems to be caused by PCIe bottleneck, which MPRQ is supposed
> > to overcome.
> > Is MPRQ limited only to default RSS in SWS mode?
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >         -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
> > start
> > 
> > mlnx_perf -i enp33s0f0np0 -t 1:
> > 
> >       rx_vport_unicast_packets: 38,155,359
> >         rx_vport_unicast_bytes: 2,441,942,976 Bps    = 19,535.54 Mbps
> >                 tx_packets_phy: 7,586
> >                 rx_packets_phy: 151,531,694
> >                   tx_bytes_phy: 485,568 Bps          = 3.88 Mbps
> >                   rx_bytes_phy: 9,698,029,248 Bps    = 77,584.23 Mbps
> >             tx_mac_control_phy: 7,587
> >              tx_pause_ctrl_phy: 7,587
> >                rx_discards_phy: 113,376,265
> >                rx_64_bytes_phy: 151,531,748 Bps      = 1,212.25 Mbps
> >     rx_buffer_passed_thres_phy: 203
> >                 rx_prio0_bytes: 9,698,066,560 Bps    = 77,584.53 Mbps
> >               rx_prio0_packets: 38,155,328
> >              rx_prio0_discards: 113,376,963
> >                tx_global_pause: 7,587
> >       tx_global_pause_duration: 1,018,266
> > 
> > Attached: "neohost-cx6dx-jump_rss-sws.txt".  
> 
> How are you generating the traffic? Are both IP addresses and TCP ports changing?
> 
> "jump_rss" case degradation seems to be caused by RSS configuration.
> It appears that packets are not distributed across all queues.
> With these flow commands in SWS all packets should go to queue 0 only.
> Could you please check if that's the case on your side?
> 
> It can be alleviated this by specifying RSS hash types on RSS action:
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> flow create 0 ingress group 1 pattern end actions rss queues <queues> end types ip tcp end / end
> 
> Could you please try that on your side?
> 
> With the HWS flow engine, if the RSS action does not have hash types specified,
> the implementation defaults to hashing on IP addresses.
> If IP addresses are variable in your test traffic, that would explain the difference.

You are right: SWS performance is 148 Mpps with "types ipv4-tcp end"!
I had missed the difference between the default RSS "types" for SWS and HWS.
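
That is, the working "jump_rss" SWS rules become (queue list as in the original run):

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end types ipv4-tcp end / end
start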

For reference, the generated traffic fields:
* Ethernet: fixed unicast MACs
* VLAN: fixed VLAN IDs, zero TCI
* Source IPv4: random from 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
* Destination IPv4: random from a /16 subnet
* Source TCP port: random
* Destination TCP port: random
* TCP flags: SYN

[snip]
> > Case "jump_miss", SWS
> > =====================
> > Result: 148 Mpps
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >         -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > start
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 151,526,489
> >         rx_vport_unicast_bytes: 9,697,695,296 Bps    = 77,581.56 Mbps
> >                 rx_packets_phy: 151,526,193
> >                   rx_bytes_phy: 9,697,676,672 Bps    = 77,581.41 Mbps
> >                rx_64_bytes_phy: 151,525,423 Bps      = 1,212.20 Mbps
> >                 rx_prio0_bytes: 9,697,488,256 Bps    = 77,579.90 Mbps
> >               rx_prio0_packets: 151,523,240
> > 
> > Attached: "neohost-cx6dx-jump_miss-sws.txt".
> > 
> > 
> > Case "jump_miss", HWS
> > =====================
> > Result: 107 Mpps
> > Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >         -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >        rx_steer_missed_packets: 109,463,466
> >       rx_vport_unicast_packets: 109,463,450
> >         rx_vport_unicast_bytes: 7,005,660,800 Bps    = 56,045.28 Mbps
> >                 rx_packets_phy: 151,518,062
> >                   rx_bytes_phy: 9,697,155,840 Bps    = 77,577.24 Mbps
> >                rx_64_bytes_phy: 151,516,201 Bps      = 1,212.12 Mbps
> >                 rx_prio0_bytes: 9,697,137,280 Bps    = 77,577.9 Mbps
> >               rx_prio0_packets: 151,517,782
> >           rx_prio0_buf_discard: 42,055,156
> > 
> > Attached: "neohost-cx6dx-jump_miss-hws.txt".  
> 
> As you can see, HWS provides the "rx_steer_missed_packets" counter, which is not available with SWS.
> It counts the packets which did not hit any rule and, in the end, had to be dropped.
> Enabling it requires additional HW flows which handle the packets that did not hit any rule.
> This has a side effect: these HW flows cause enough backpressure
> that, at a very high packet rate, they cause Rx buffer overflow on CX6 Dx.
> 
> After some internal discussions, I learned that this is more or less expected,
> because such a high number of missed packets is already an indication of a problem:
> NIC resources are wasted on packets for which there is no specified destination.

Fair enough. In production there will be no misses and thus no issue.
This case was mostly meant to exhaust the search space.
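
For the record, mlnx_perf is, as far as I know, just a thin wrapper over the ethtool statistics,
so the miss counter can also be watched directly, e.g. with the interface from the runs above:

ethtool -S enp33s0f0np0 | grep rx_steer_missed_packets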

> 
> > Case "jump_drop", SWS
> > =====================
> > Result: 148 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >         -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions drop / end
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 151,705,269
> >         rx_vport_unicast_bytes: 9,709,137,216 Bps    = 77,673.9 Mbps
> >                 rx_packets_phy: 151,701,498
> >                   rx_bytes_phy: 9,708,896,128 Bps    = 77,671.16 Mbps
> >                rx_64_bytes_phy: 151,693,532 Bps      = 1,213.54 Mbps
> >                 rx_prio0_bytes: 9,707,005,888 Bps    = 77,656.4 Mbps
> >               rx_prio0_packets: 151,671,959
> > 
> > Attached: "neohost-cx6dx-jump_drop-sws.txt".
> > 
> > 
> > Case "jump_drop", HWS
> > =====================
> > Result: 107 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> > I've also run this test with a counter attached to the dropping table, and it
> > showed that indeed only 107 Mpps hit the rule.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >         -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> > 
> > flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
> > flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
> > flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
> > flow pull 0 queue 0
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 109,500,637
> >         rx_vport_unicast_bytes: 7,008,040,768 Bps    = 56,064.32 Mbps
> >                 rx_packets_phy: 151,568,915
> >                   rx_bytes_phy: 9,700,410,560 Bps    = 77,603.28 Mbps
> >                rx_64_bytes_phy: 151,569,146 Bps      = 1,212.55 Mbps
> >                 rx_prio0_bytes: 9,699,889,216 Bps    = 77,599.11 Mbps
> >               rx_prio0_packets: 151,560,756
> >           rx_prio0_buf_discard: 42,065,705
> > 
> > Attached: "neohost-cx6dx-jump_drop-hws.txt".  
> 
> We're still looking into "jump_drop" case.

Looking forward to your verdict.
Thank you for investigating and explaining; understanding what is going on helps a lot.

> By the way - May I ask what is your target use case with HWS?

The subject area is DDoS attack protection, which is why I'm interested in 64B
packets at wire speed, even though they are not so frequent in normal traffic.
Drop action performance is crucial, because it will be used
for blocking malicious traffic that comes at the highest packet rates.
IPv4 is the primary target; IPv6 is nice to have.

Hopefully HWS relaxed matching will allow matching by 5-tuple at max pps.
SWS proved to be incapable of that in groups other than 0
because of extra hops, and group 0 has other limitations.
Typical flow structures expected:

- match by 5-tuple, drop on match, RSS to SW otherwise (see the sketch after this list);
- match by 5-tuple, mark and RSS to SW on match, just RSS to SW otherwise;
- match by 5-tuple, modify MAC and/or VLAN, RSS to 1-port hairpin queues;
- match by 5-tuple, meter (?), RSS to SW or 2-port hairpin queues.
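
To illustrate the first structure, a rough sketch in plain SWS testpmd syntax
(addresses, ports and the catch-all priority are placeholders;
the real setup would use the template API as in the commands above):

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern eth / ipv4 src is 192.0.2.1 dst is 198.51.100.1 / tcp src is 12345 dst is 80 / end actions drop / end
flow create 0 ingress group 1 priority 1 pattern end actions rss queues <queues> end types ipv4-tcp end / end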

These 5-tuples come in bulk from packet inspection on data path,
so high insertion speed of HWS is very desirable.
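
For the bulk insertions I expect to rely on the queue-based API with postponed rules
pushed in batches, roughly like this (table/template IDs and the 5-tuples are placeholders):

flow queue 0 create 0 template_table 3 pattern_template 0 actions_template 0 postpone true pattern eth / ipv4 src is 192.0.2.1 dst is 198.51.100.1 / tcp src is 12345 dst is 80 / end actions drop / end
flow queue 0 create 0 template_table 3 pattern_template 0 actions_template 0 postpone true pattern eth / ipv4 src is 192.0.2.2 dst is 198.51.100.1 / tcp src is 23456 dst is 80 / end actions drop / end
flow push 0 queue 0
flow pull 0 queue 0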

Worst-case scale: millions of flow rules and ~100K insertions/second,
but HWS may prove useful to us even at a much smaller scale.
I've already noticed that the cost of a table miss with HWS
depends on the table size even if the table is empty
(going from 1024 to 2048 rules adds 0.5 hops),
so maybe very large tables are not realistic at max pps.
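
By "table size" here I mean the rules_number argument of "flow template_table ... create",
i.e. the difference between two tables created like these (IDs are placeholders):

flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 1 rules_number 1024
flow template_table 0 create ingress group 1 table_id 3 pattern_template 1 actions_template 1 rules_number 2048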

I'm also exploring other features and their performance in pps.


Thread overview: 5+ messages
2024-06-13  9:01 Dmitry Kozlyuk
2024-06-13 15:06 ` Dariusz Sosnowski
2024-06-13 20:14   ` Dmitry Kozlyuk
2024-06-19 19:15     ` Dariusz Sosnowski
2024-06-20 13:05       ` Dmitry Kozlyuk [this message]
