DPDK usage discussions
* [net/mlx5] Performance drop with HWS compared to SWS
From: Dmitry Kozlyuk @ 2024-06-13  9:01 UTC
  To: users


Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps with 64B packets,
apparently caused by any rule that jumps out of ingress group 0,
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?

NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b
The traffic generator (TG) is custom; traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.
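
For reference, 148 Mpps is essentially the 64B line rate of a 100GbE port. A minimal Python sketch of that arithmetic follows. The 20-byte per-frame wire overhead is standard Ethernet; the function and variable names are only illustrative, and the two measured rates are the Neohost "RX Packet Rate" values quoted in this message.

```python
# Theoretical packet rate of an Ethernet link for a given frame size.
# Each frame occupies frame_bytes plus 20 bytes on the wire:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Return the maximum number of frames per second on the link."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


line_rate = line_rate_pps(100e9, 64)

# Rates reported by Neohost in this message (packets per second).
sws_pps = 148_813_590
hws_pps = 107_498_115
hws_loss = 1 - hws_pps / sws_pps

print(f"64B line rate at 100GbE: {line_rate / 1e6:.2f} Mpps")
print(f"HWS rate relative to SWS: {hws_pps / sws_pps:.1%} (loss {hws_loss:.1%})")
```

In other words, SWS receives at line rate, while HWS loses a bit more than a quarter of the packets.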

The examples below only perform the jump, so all packets miss in group 1,
but the same drop is observed when all packets are dropped in group 1.

Software steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1

flow create 0 ingress group 0 pattern end actions jump group 1 / end

Neohost (from OFED 5.7):

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||

(Full Neohost output is attached.)

Hardware steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

Neohost:

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||

AFAIU, performance is not constrained by the complexity of the rules.
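
The reasoning behind that statement can be sketched from the two Neohost summaries above. This is only a rough check under one assumption: the offered load equals the rate received with SWS. All names in the snippet are illustrative.

```python
# Values copied from the two Neohost summaries above.
# Assumption: the offered load equals the rate received with SWS.
OFFERED_MPPS = 148.813590

neohost = {
    "SWS": {"rx_mpps": 148.813590, "hops_per_packet": 3.075,
            "bottleneck_mpps": 262.2723},
    "HWS": {"rx_mpps": 107.498115, "hops_per_packet": 4.5503,
            "bottleneck_mpps": 182.5796},
}

summary = {}
for mode, v in neohost.items():
    summary[mode] = {
        # Steering hops actually executed per second, in millions.
        "mhops_per_s": v["rx_mpps"] * v["hops_per_packet"],
        # Ratio of the steering limit predicted by Neohost to the offered load.
        "headroom": v["bottleneck_mpps"] / OFFERED_MPPS,
    }

for mode, s in summary.items():
    print(f"{mode}: {s['mhops_per_s']:.1f} Mhops/s, headroom x{s['headroom']:.2f}")
```

With these inputs the HWS headroom is about 1.23, so by Neohost's own estimate the longer steering path should still sustain the offered rate. HWS also executes more hops per second in total than SWS.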

mlnx_perf -i enp33s0f0np0 -t 1:

       rx_steer_missed_packets: 108,743,272
      rx_vport_unicast_packets: 108,743,424
        rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps      
                tx_packets_phy: 7,537
                rx_packets_phy: 150,538,251
                  tx_bytes_phy: 482,368 Bps          = 3.85 Mbps           
                  rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps      
            tx_mac_control_phy: 7,536
             tx_pause_ctrl_phy: 7,536
               rx_discards_phy: 41,794,740
               rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps       
    rx_buffer_passed_thres_phy: 202
                rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps      
              rx_prio0_packets: 108,744,322
             rx_prio0_discards: 41,795,050
               tx_global_pause: 7,537
      tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

    The number of received packets dropped due to lack of buffers on a
    physical port. If this counter is increasing, it implies that the adapter
    is congested and cannot absorb the traffic coming from the network.

However, the adapter certainly *is* able to process 148 Mpps,
since it does so with SWS and can deliver this much to software (with MPRQ).
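
The mlnx_perf counters above can be cross-checked with a few lines of Python. This is a sketch that assumes the values are per-second deltas, as the "-t 1" interval suggests; the variable names simply mirror the counter names.

```python
# Counters from the mlnx_perf sample above (assumed to be per-second deltas).
rx_packets_phy = 150_538_251
rx_discards_phy = 41_794_740
rx_vport_unicast_packets = 108_743_424
rx_vport_unicast_bytes = 6_959_579_136

# Packets that survived the physical-port buffer.
passed = rx_packets_phy - rx_discards_phy
# Share of received packets dropped for lack of buffers.
discard_fraction = rx_discards_phy / rx_packets_phy
# Sanity checks on the vport counters.
bytes_per_packet = rx_vport_unicast_bytes / rx_vport_unicast_packets
vport_mbps = rx_vport_unicast_bytes * 8 / 1e6

print(f"passed the port buffer: {passed:,} pps")
print(f"reached the vport:      {rx_vport_unicast_packets:,} pps")
print(f"discarded at the port:  {discard_fraction:.1%}")
print(f"vport: {bytes_per_packet:.0f} B/packet, {vport_mbps:,.2f} Mbps")
```

The packets that pass the port buffer match the vport count to within about a hundred packets, so in this sample the whole gap is accounted for by rx_discards_phy.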

[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst

[-- Attachment #2: neohost-cx6dx-jump-hws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 55.039        [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3851      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3851      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9925        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 523           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 548           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 107,498,116     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA0 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA1 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #3: neohost-cx6dx-jump-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1926       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.385       [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.385       [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9919        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 522           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 541           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,813,592     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA0 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 158,843         ||
|| RX_PSA1 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA1 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================
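
The eSwitch analysis values in both attachments appear to follow from the raw counters. The sketch below rests on assumptions inferred from the numbers rather than taken from Neohost documentation: "Chip Frequency" is in MHz, each steering pipe handles one hop per clock cycle, and the RX_PSA0 pipe counters count hops over the same interval as "RX Steering Packets". All names are illustrative.

```python
# Raw Neohost counters from the two attachments (one sampling interval each).
# Assumptions: "Chip Frequency" is in MHz, and the RX_PSA0 steering pipe
# counters count hops taken by the packets in "RX Steering Packets".
samples = {
    "HWS": {"freq_mhz": 429.9925, "packets": 107_498_116,
            "pipe0": 253_168_409, "pipe1": 235_977_321},
    "SWS": {"freq_mhz": 429.9919, "packets": 148_813_592,
            "pipe0": 243_977_877, "pipe1": 213_617_683},
}


def derive(s):
    """Recompute the eSwitch analysis values from raw counters."""
    hops = (s["pipe0"] + s["pipe1"]) / s["packets"]
    busiest = max(s["pipe0"], s["pipe1"]) / s["packets"]
    return {
        "hops_per_packet": hops,
        # Limit if the busiest pipe performs one hop per clock cycle.
        "bottleneck_mpps": s["freq_mhz"] / busiest,
        # Limit if the hops were split evenly across the two pipes.
        "optimal_bottleneck_mpps": s["freq_mhz"] / (hops / 2),
    }


derived = {mode: derive(s) for mode, s in samples.items()}
for mode, d in derived.items():
    print(mode, {k: round(v, 4) for k, v in d.items()})
```

With the counters above, these expressions come out very close to the reported "RX Hops Per Packet" and "RX Packet Rate Bottleneck" values, which suggests (but does not prove) that Neohost derives them this way.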
