Hi Dariusz,

Thank you for looking into the issue, please find the full details below.

Summary:

Case        SWS (Mpps)   HWS (Mpps)
---------   ----------   ----------
baseline           148            -
jump_rss            37          148
jump_miss          148          107
jump_drop          148          107

From "baseline" vs "jump_rss", the problem is not in the jump.
From "jump_miss" vs "jump_drop", the problem is not only in the miss.
This is a lab setup, so I can try anything else you need for diagnostics.

Disabling flow control only restores the number of packets received by the PHY,
not the number of packets processed by steering.

> - Could you share mlnx_perf stats for SWS case as well?

rx_vport_unicast_packets: 151,716,299
rx_vport_unicast_bytes: 9,709,843,136 Bps = 77,678.74 Mbps
rx_packets_phy: 151,716,517
rx_bytes_phy: 9,709,856,896 Bps = 77,678.85 Mbps
rx_64_bytes_phy: 151,716,867 Bps = 1,213.73 Mbps
rx_prio0_bytes: 9,710,051,648 Bps = 77,680.41 Mbps
rx_prio0_packets: 151,719,564

> - If group 1 had a flow rule with empty match and RSS action, is the performance difference the same?
> (This would help to understand if the problem is with miss behavior or with jump between group 0 and group 1).

Case "baseline"
===============

No flow rules, just to make sure the host can poll the NIC fast enough.

Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

mlnx_perf -i enp33s0f0np0 -t 1

rx_vport_unicast_packets: 151,622,123
rx_vport_unicast_bytes: 9,703,815,872 Bps = 77,630.52 Mbps
rx_packets_phy: 151,621,983
rx_bytes_phy: 9,703,807,872 Bps = 77,630.46 Mbps
rx_64_bytes_phy: 151,621,026 Bps = 1,212.96 Mbps
rx_prio0_bytes: 9,703,716,480 Bps = 77,629.73 Mbps
rx_prio0_packets: 151,620,576

Attached: "neohost-cx6dx-baseline-sws.txt".

Case "jump_rss", SWS
====================

Jump to group 1, then RSS.

Result: 37 Mpps (?!)

This "37 Mpps" seems to be caused by a PCIe bottleneck, which MPRQ is supposed
to overcome. Is MPRQ only applied to the default RSS in SWS mode?

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
#
start

mlnx_perf -i enp33s0f0np0 -t 1:

rx_vport_unicast_packets: 38,155,359
rx_vport_unicast_bytes: 2,441,942,976 Bps = 19,535.54 Mbps
tx_packets_phy: 7,586
rx_packets_phy: 151,531,694
tx_bytes_phy: 485,568 Bps = 3.88 Mbps
rx_bytes_phy: 9,698,029,248 Bps = 77,584.23 Mbps
tx_mac_control_phy: 7,587
tx_pause_ctrl_phy: 7,587
rx_discards_phy: 113,376,265
rx_64_bytes_phy: 151,531,748 Bps = 1,212.25 Mbps
rx_buffer_passed_thres_phy: 203
rx_prio0_bytes: 9,698,066,560 Bps = 77,584.53 Mbps
rx_prio0_packets: 38,155,328
rx_prio0_discards: 113,376,963
tx_global_pause: 7,587
tx_global_pause_duration: 1,018,266

Attached: "neohost-cx6dx-jump_rss-sws.txt".
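In case it helps to reproduce this outside testpmd, below is a minimal, uncompiled C sketch of
what I understand the two "flow create" commands above to translate to through the synchronous
rte_flow API. Port 0 and the 32-queue list come from the testpmd invocation; leaving the RSS
hash types at 0 (PMD defaults) is my assumption about what the CLI does; error handling is
omitted and the helper names are mine:

#include <stdint.h>
#include <rte_common.h>
#include <rte_flow.h>

/* Group 0: match anything, jump to group 1. */
static struct rte_flow *
create_jump_rule(uint16_t port_id)
{
    const struct rte_flow_attr attr = { .ingress = 1, .group = 0 };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    const struct rte_flow_action_jump jump = { .group = 1 };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error err;

    return rte_flow_create(port_id, &attr, pattern, actions, &err);
}

/* Group 1: match anything, RSS to queues 0..31. */
static struct rte_flow *
create_rss_rule(uint16_t port_id)
{
    static uint16_t queues[32];
    const struct rte_flow_attr attr = { .ingress = 1, .group = 1 };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_rss rss = {
        .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
        .types = 0,              /* assumption: PMD default hash types, as with the CLI */
        .queue_num = RTE_DIM(queues),
        .queue = queues,
    };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error err;
    uint16_t i;

    for (i = 0; i < RTE_DIM(queues); i++)
        queues[i] = i;
    return rte_flow_create(port_id, &attr, pattern, actions, &err);
}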
Case "jump_rss", HWS
====================

Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
#
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
flow pull 0 queue 0
#
start

mlnx_perf -i enp33s0f0np0 -t 1:

rx_vport_unicast_packets: 151,514,131
rx_vport_unicast_bytes: 9,696,904,384 Bps = 77,575.23 Mbps
rx_packets_phy: 151,514,275
rx_bytes_phy: 9,696,913,600 Bps = 77,575.30 Mbps
rx_64_bytes_phy: 151,514,122 Bps = 1,212.11 Mbps
rx_prio0_bytes: 9,696,814,528 Bps = 77,574.51 Mbps
rx_prio0_packets: 151,512,717

Attached: "neohost-cx6dx-jump_rss-hws.txt".
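If you want to reproduce the HWS setup outside testpmd, here is a rough, uncompiled C sketch of
what I believe the template-API command sequence above maps to, for the group-0 jump rule only.
The single flow queue of size 128 and the 16 counters come from the "flow configure" line;
error handling is trimmed, so treat it as an illustration of the call sequence rather than a
reference implementation:

#include <stdint.h>
#include <rte_flow.h>

/* Rough equivalent of "flow configure ... flow pull" above (group-0 jump only). */
static int
setup_group0_jump(uint16_t port_id)
{
    struct rte_flow_error err;

    /* "flow configure 0 queues_number 1 queues_size 128 counters_number 16"
     * (the port is stopped at this point, cf. "port stop 0" above). */
    const struct rte_flow_port_attr port_attr = { .nb_counters = 16 };
    const struct rte_flow_queue_attr queue_attr = { .size = 128 };
    const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
    if (rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &err) != 0)
        return -1;

    /* "flow pattern_template 0 create ... ingress template end" */
    const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_pattern_template *pt =
        rte_flow_pattern_template_create(port_id, &pt_attr, pattern, &err);

    /* "flow actions_template 0 create ingress ...
     *  template jump group 1 / end mask jump group 0xFFFFFFFF / end" */
    const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
    const struct rte_flow_action_jump jump = { .group = 1 };
    const struct rte_flow_action_jump jump_mask = { .group = 0xFFFFFFFF };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    const struct rte_flow_action masks[] = {
        { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump_mask },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_actions_template *at =
        rte_flow_actions_template_create(port_id, &at_attr, actions, masks, &err);

    /* "flow template_table 0 create ingress group 0 ... rules_number 1" */
    const struct rte_flow_template_table_attr tbl_attr = {
        .flow_attr = { .group = 0, .ingress = 1 },
        .nb_flows = 1,
    };
    struct rte_flow_template_table *tbl =
        rte_flow_template_table_create(port_id, &tbl_attr, &pt, 1, &at, 1, &err);

    /* "flow queue 0 create 0 ... actions jump group 1 / end" + "flow pull 0 queue 0" */
    const struct rte_flow_op_attr op_attr = { .postpone = 0 };
    struct rte_flow *flow = rte_flow_async_create(port_id, 0, &op_attr, tbl,
                                                  pattern, 0, actions, 0,
                                                  NULL, &err);
    struct rte_flow_op_result res[1];
    while (rte_flow_pull(port_id, 0, res, 1, &err) == 0)
        ; /* poll until the enqueued rule creation completes */

    return (pt && at && tbl && flow) ? 0 : -1;
}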
> - Would you be able to do the test with miss in empty group 1, with Ethernet Flow Control disabled?

$ ethtool -A enp33s0f0np0 rx off tx off
$ ethtool -a enp33s0f0np0
Pause parameters for enp33s0f0np0:
Autonegotiate: off
RX: off
TX: off

testpmd> show port 0 flow_ctrl

********************* Flow control infos for port 0 *********************
FC mode:
   Rx pause: off
   Tx pause: off
Autoneg: off
Pause time: 0x0
High waterline: 0x0
Low waterline: 0x0
Send XON: off
Forward MAC control frames: off
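For completeness, the same can be done from inside the application instead of ethtool; a small
untested sketch using the generic ethdev API (assuming DPDK 21.11+ constant names, helper name
is mine):

#include <string.h>
#include <rte_ethdev.h>

/* Equivalent of "ethtool -A enp33s0f0np0 rx off tx off" from inside the app. */
static int
disable_flow_ctrl(uint16_t port_id)
{
    struct rte_eth_fc_conf fc_conf;
    int ret;

    memset(&fc_conf, 0, sizeof(fc_conf));
    ret = rte_eth_dev_flow_ctrl_get(port_id, &fc_conf); /* keep the other fields */
    if (ret != 0)
        return ret;
    fc_conf.mode = RTE_ETH_FC_NONE;  /* no Rx pause, no Tx pause */
    fc_conf.autoneg = 0;
    return rte_eth_dev_flow_ctrl_set(port_id, &fc_conf);
}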
Case "jump_miss", SWS
=====================

Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
start

mlnx_perf -i enp33s0f0np0

rx_vport_unicast_packets: 151,526,489
rx_vport_unicast_bytes: 9,697,695,296 Bps = 77,581.56 Mbps
rx_packets_phy: 151,526,193
rx_bytes_phy: 9,697,676,672 Bps = 77,581.41 Mbps
rx_64_bytes_phy: 151,525,423 Bps = 1,212.20 Mbps
rx_prio0_bytes: 9,697,488,256 Bps = 77,579.90 Mbps
rx_prio0_packets: 151,523,240

Attached: "neohost-cx6dx-jump_miss-sws.txt".

Case "jump_miss", HWS
=====================

Result: 107 Mpps

Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

mlnx_perf -i enp33s0f0np0

rx_steer_missed_packets: 109,463,466
rx_vport_unicast_packets: 109,463,450
rx_vport_unicast_bytes: 7,005,660,800 Bps = 56,045.28 Mbps
rx_packets_phy: 151,518,062
rx_bytes_phy: 9,697,155,840 Bps = 77,577.24 Mbps
rx_64_bytes_phy: 151,516,201 Bps = 1,212.12 Mbps
rx_prio0_bytes: 9,697,137,280 Bps = 77,577.9 Mbps
rx_prio0_packets: 151,517,782
rx_prio0_buf_discard: 42,055,156

Attached: "neohost-cx6dx-jump_miss-hws.txt".

Case "jump_drop", SWS
=====================

Result: 148 Mpps

Match all in group 0, jump to group 1; match all in group 1, drop.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions drop / end

mlnx_perf -i enp33s0f0np0

rx_vport_unicast_packets: 151,705,269
rx_vport_unicast_bytes: 9,709,137,216 Bps = 77,673.9 Mbps
rx_packets_phy: 151,701,498
rx_bytes_phy: 9,708,896,128 Bps = 77,671.16 Mbps
rx_64_bytes_phy: 151,693,532 Bps = 1,213.54 Mbps
rx_prio0_bytes: 9,707,005,888 Bps = 77,656.4 Mbps
rx_prio0_packets: 151,671,959

Attached: "neohost-cx6dx-jump_drop-sws.txt".

Case "jump_drop", HWS
=====================

Result: 107 Mpps

Match all in group 0, jump to group 1; match all in group 1, drop.
I've also run this test with a counter attached to the dropping table,
and it showed that indeed only 107 Mpps hit the rule.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
    -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
flow pull 0 queue 0

mlnx_perf -i enp33s0f0np0

rx_vport_unicast_packets: 109,500,637
rx_vport_unicast_bytes: 7,008,040,768 Bps = 56,064.32 Mbps
rx_packets_phy: 151,568,915
rx_bytes_phy: 9,700,410,560 Bps = 77,603.28 Mbps
rx_64_bytes_phy: 151,569,146 Bps = 1,212.55 Mbps
rx_prio0_bytes: 9,699,889,216 Bps = 77,599.11 Mbps
rx_prio0_packets: 151,560,756
rx_prio0_buf_discard: 42,065,705

Attached: "neohost-cx6dx-jump_drop-hws.txt".
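About the counter check mentioned in the "jump_drop", HWS case: for reference, a rough,
uncompiled sketch of one way a per-rule counter can be attached and read back through rte_flow.
It uses the synchronous API for brevity; with the template API the COUNT action would instead
be added to the group-1 actions template next to DROP, and how such a counter is then queried
may depend on the DPDK/PMD version. The helper names are illustrative, not the exact commands
I ran:

#include <stdint.h>
#include <rte_flow.h>

/* Group 1: match anything, count then drop. */
static struct rte_flow *
create_counted_drop_rule(uint16_t port_id)
{
    const struct rte_flow_attr attr = { .ingress = 1, .group = 1 };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_COUNT },
        { .type = RTE_FLOW_ACTION_TYPE_DROP },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error err;

    return rte_flow_create(port_id, &attr, pattern, actions, &err);
}

/* Read back the number of packets that hit the rule. */
static int
read_rule_hits(uint16_t port_id, struct rte_flow *flow, uint64_t *hits)
{
    const struct rte_flow_action count = { .type = RTE_FLOW_ACTION_TYPE_COUNT };
    struct rte_flow_query_count data = { .reset = 0 };
    struct rte_flow_error err;
    int ret;

    ret = rte_flow_query(port_id, flow, &count, &data, &err);
    if (ret == 0)
        *hits = data.hits;
    return ret;
}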