DPDK usage discussions
* [net/mlx5] Performance drop with HWS compared to SWS
@ 2024-06-13  9:01 Dmitry Kozlyuk
  2024-06-13 15:06 ` Dariusz Sosnowski
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Kozlyuk @ 2024-06-13  9:01 UTC (permalink / raw)
  To: users

[-- Attachment #1: Type: text/plain, Size: 4882 bytes --]

Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B packets
apparently caused by any rule that jumps out of ingress group 0
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?

NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b
TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.

The examples below perform only the jump, so all packets miss in group 1,
but the same result is observed when dropping all packets in group 1.

Software steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1

flow create 0 ingress group 0 pattern end actions jump group 1 / end
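
For reference, this single testpmd line corresponds to one blocking rte_flow
call. A minimal C sketch (error handling omitted; not the exact code from our
application):

/* SWS: match everything in ingress group 0 and jump to group 1.
 * One synchronous call; the PMD applies the rule before returning.
 */
#include <rte_flow.h>

static struct rte_flow *
create_jump_rule_sync(uint16_t port_id)
{
	struct rte_flow_error error;
	const struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
	const struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	const struct rte_flow_action_jump jump = { .group = 1 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}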

Neohost (from OFED 5.7):

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||

(Full Neohost output is attached.)

Hardware steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
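
The same setup through the async flow API looks roughly as follows (a
condensed, untested sketch; fields not shown keep their zero defaults, error
handling omitted):

/* HWS: configure flow queues, create templates and a table in group 0,
 * then enqueue one "jump to group 1" rule and pull its completion.
 */
#include <stdint.h>
#include <rte_flow.h>

static void
setup_jump_rule_async(uint16_t port_id)
{
	struct rte_flow_error error;

	/* "flow configure": must happen while the port is stopped. */
	const struct rte_flow_port_attr port_attr = { .nb_counters = 16 };
	const struct rte_flow_queue_attr queue_attr = { .size = 128 };
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);

	/* Pattern template: empty ingress match. */
	const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
	const struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_pattern_template *pt =
		rte_flow_pattern_template_create(port_id, &pt_attr, pattern,
						 &error);

	/* Actions template: jump to group 1, with the group fully masked. */
	const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
	const struct rte_flow_action_jump jump = { .group = 1 };
	const struct rte_flow_action_jump jump_mask = { .group = UINT32_MAX };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_actions_template *at =
		rte_flow_actions_template_create(port_id, &at_attr,
						 actions, masks, &error);

	/* Template table: ingress group 0, room for a single rule. */
	const struct rte_flow_template_table_attr tbl_attr = {
		.flow_attr = { .group = 0, .ingress = 1 },
		.nb_flows = 1,
	};
	struct rte_flow_template_table *tbl =
		rte_flow_template_table_create(port_id, &tbl_attr,
					       &pt, 1, &at, 1, &error);

	/* Enqueue the rule on flow queue 0, then pull the completion. */
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	rte_flow_async_create(port_id, 0, &op_attr, tbl, pattern, 0,
			      actions, 0, NULL, &error);
	struct rte_flow_op_result res[1];
	rte_flow_pull(port_id, 0, res, 1, &error);
}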

Neohost:

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||

AFAIU, performance is not constrained by the complexity of the rules.

mlnx_perf -i enp33s0f0np0 -t 1:

       rx_steer_missed_packets: 108,743,272
      rx_vport_unicast_packets: 108,743,424
        rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps      
                tx_packets_phy: 7,537
                rx_packets_phy: 150,538,251
                  tx_bytes_phy: 482,368 Bps          = 3.85 Mbps           
                  rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps      
            tx_mac_control_phy: 7,536
             tx_pause_ctrl_phy: 7,536
               rx_discards_phy: 41,794,740
               rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps       
    rx_buffer_passed_thres_phy: 202
                rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps      
              rx_prio0_packets: 108,744,322
             rx_prio0_discards: 41,795,050
               tx_global_pause: 7,537
      tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

    The number of received packets dropped due to lack of buffers on a
    physical port. If this counter is increasing, it implies that the adapter
    is congested and cannot absorb the traffic coming from the network.

However, the adapter certainly *is* able to process 148 Mpps,
since it does so with SWS and it can deliver this much to SW (with MPRQ).

[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst

[-- Attachment #2: neohost-cx6dx-jump-hws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 55.039        [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3851      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3851      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9925        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 523           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 548           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 107,498,116     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA0 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA1 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #3: neohost-cx6dx-jump-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1926       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.385       [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.385       [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9919        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 522           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 541           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,813,592     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA0 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 158,843         ||
|| RX_PSA1 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA1 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================


* RE: [net/mlx5] Performance drop with HWS compared to SWS
  2024-06-13  9:01 [net/mlx5] Performance drop with HWS compared to SWS Dmitry Kozlyuk
@ 2024-06-13 15:06 ` Dariusz Sosnowski
  2024-06-13 20:14   ` Dmitry Kozlyuk
  0 siblings, 1 reply; 6+ messages in thread
From: Dariusz Sosnowski @ 2024-06-13 15:06 UTC (permalink / raw)
  To: Dmitry Kozlyuk, users

Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Sent: Thursday, June 13, 2024 11:02
> To: users@dpdk.org
> Subject: [net/mlx5] Performance drop with HWS compared to SWS
> 
> Hello,
> 
> We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B
> packets apparently caused by any rule that jumps out of ingress group 0 when
> using HWS (async API) instead of SWS (sync API).
> Is this a known issue or a temporary limitation?

This is not expected behavior; performance is expected to be the same.
Thank you for reporting this and for the Neohost dumps.

I have a few questions:

- Could you share mlnx_perf stats for the SWS case as well?
- If group 1 had a flow rule with an empty match and an RSS action, is the performance difference the same?
  (This would help to understand whether the problem is with the miss behavior or with the jump between group 0 and group 1.)
- Would you be able to run the test with a miss in an empty group 1, with Ethernet Flow Control disabled?

> NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0
> x16;
> FW: 22.40.1000
> OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
> DPDK: v24.03-23-g76cef1af8b
> TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.
> 
> The examples below perform only the jump, so all packets miss in group 1, but the
> same result is observed when dropping all packets in group 1.
> 
> Software steering:
> 
> /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> 
> Neohost (from OFED 5.7):
> 
> ||===========================================================================
> |||                               Packet Rate                               ||
> ||---------------------------------------------------------------------------
> ||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
> ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
> ||===========================================================================
> |||                                 eSwitch                                 ||
> ||---------------------------------------------------------------------------
> ||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
> ||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
> ||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
> ||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||
> 
> (Full Neohost output is attached.)
> 
> Hardware steering:
> 
> /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1
> 
> port stop 0
> flow configure 0 queues_number 1 queues_size 128 counters_number 16
> port start 0
> flow pattern_template 0 create pattern_template_id 1 ingress template end
> flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> flow pull 0 queue 0
> 
> Neohost:
> 
> ||===========================================================================
> |||                               Packet Rate                               ||
> ||---------------------------------------------------------------------------
> ||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
> ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
> ||===========================================================================
> |||                                 eSwitch                                 ||
> ||---------------------------------------------------------------------------
> ||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
> ||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
> ||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
> ||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||
> 
> AFAIU, performance is not constrained by the complexity of the rules.
> 
> mlnx_perf -i enp33s0f0np0 -t 1:
> 
>        rx_steer_missed_packets: 108,743,272
>       rx_vport_unicast_packets: 108,743,424
>         rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps
>                 tx_packets_phy: 7,537
>                 rx_packets_phy: 150,538,251
>                   tx_bytes_phy: 482,368 Bps          = 3.85 Mbps
>                   rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps
>             tx_mac_control_phy: 7,536
>              tx_pause_ctrl_phy: 7,536
>                rx_discards_phy: 41,794,740
>                rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps
>     rx_buffer_passed_thres_phy: 202
>                 rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps
>               rx_prio0_packets: 108,744,322
>              rx_prio0_discards: 41,795,050
>                tx_global_pause: 7,537
>       tx_global_pause_duration: 1,011,592
> 
> "rx_discards_phy" is described as follows [1]:
> 
>     The number of received packets dropped due to lack of buffers on a
>     physical port. If this counter is increasing, it implies that the adapter
>     is congested and cannot absorb the traffic coming from the network.
> 
> However, the adapter certainly *is* able to process 148 Mpps, since it does so
> with SWS and it can deliver this much to SW (with MPRQ).
> 
> [1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst

Best regards,
Dariusz Sosnowski


* Re: [net/mlx5] Performance drop with HWS compared to SWS
  2024-06-13 15:06 ` Dariusz Sosnowski
@ 2024-06-13 20:14   ` Dmitry Kozlyuk
  2024-06-19 19:15     ` Dariusz Sosnowski
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Kozlyuk @ 2024-06-13 20:14 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: users

[-- Attachment #1: Type: text/plain, Size: 11247 bytes --]

Hi Dariusz,

Thank you for looking into the issue; please find full details below.

Summary:

Case       SWS (Mpps)  HWS (Mpps)
--------   ----------  ----------
baseline       148          -
jump_rss        37        148
jump_miss      148        107
jump_drop      148        107

From "baseline" vs "jump_rss", the problem is not in jump.
From "jump_miss" vs "jump_drop", the problem is not only in miss.
This is a lab so I can try anything else you need for diagnostic.

Disabling flow control only restores the number of packets received by the PHY,
not the number of packets processed by steering.

> - Could you share mlnx_perf stats for the SWS case as well?

      rx_vport_unicast_packets: 151,716,299
        rx_vport_unicast_bytes: 9,709,843,136 Bps    = 77,678.74 Mbps      
                rx_packets_phy: 151,716,517
                  rx_bytes_phy: 9,709,856,896 Bps    = 77,678.85 Mbps      
               rx_64_bytes_phy: 151,716,867 Bps      = 1,213.73 Mbps       
                rx_prio0_bytes: 9,710,051,648 Bps    = 77,680.41 Mbps      
              rx_prio0_packets: 151,719,564

> - If group 1 had a flow rule with an empty match and an RSS action, is the performance difference the same?
>   (This would help to understand whether the problem is with the miss behavior or with the jump between group 0 and group 1.)

Case "baseline"
===============
No flow rules, just to make sure the host can poll the NIC fast enough.
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

mlnx_perf -i enp33s0f0np0 -t 1

      rx_vport_unicast_packets: 151,622,123
        rx_vport_unicast_bytes: 9,703,815,872 Bps    = 77,630.52 Mbps      
                rx_packets_phy: 151,621,983
                  rx_bytes_phy: 9,703,807,872 Bps    = 77,630.46 Mbps      
               rx_64_bytes_phy: 151,621,026 Bps      = 1,212.96 Mbps       
                rx_prio0_bytes: 9,703,716,480 Bps    = 77,629.73 Mbps      
              rx_prio0_packets: 151,620,576

Attached: "neohost-cx6dx-baseline-sws.txt".

Case "jump_rss", SWS
====================
Jump to group 1, then RSS.
Result: 37 Mpps (?!)
This "37 Mpps" seems to be caused by PCIe bottleneck, which MPRQ is supposed to overcome.
Is MPRQ limited only to default RSS in SWS mode?

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
#
start
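
For completeness, the group-1 RSS rule above corresponds roughly to this
sketch (the RTE_ETH_RSS_* hash selection is my assumption; testpmd fills in
its own defaults):

/* SWS: group-1 rule spreading all packets over 32 Rx queues. */
#include <rte_ethdev.h>
#include <rte_flow.h>

static struct rte_flow *
create_rss_rule_sync(uint16_t port_id)
{
	static uint16_t queues[32];
	struct rte_flow_error error;
	const struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
	const struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};

	for (uint16_t i = 0; i < 32; i++)
		queues[i] = i;
	const struct rte_flow_action_rss rss = {
		.types = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP, /* assumed */
		.queue_num = 32,
		.queue = queues,
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}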

mlnx_perf -i enp33s0f0np0 -t 1:

      rx_vport_unicast_packets: 38,155,359
        rx_vport_unicast_bytes: 2,441,942,976 Bps    = 19,535.54 Mbps      
                tx_packets_phy: 7,586
                rx_packets_phy: 151,531,694
                  tx_bytes_phy: 485,568 Bps          = 3.88 Mbps           
                  rx_bytes_phy: 9,698,029,248 Bps    = 77,584.23 Mbps      
            tx_mac_control_phy: 7,587
             tx_pause_ctrl_phy: 7,587
               rx_discards_phy: 113,376,265
               rx_64_bytes_phy: 151,531,748 Bps      = 1,212.25 Mbps       
    rx_buffer_passed_thres_phy: 203
                rx_prio0_bytes: 9,698,066,560 Bps    = 77,584.53 Mbps      
              rx_prio0_packets: 38,155,328
             rx_prio0_discards: 113,376,963
               tx_global_pause: 7,587
      tx_global_pause_duration: 1,018,266

Attached: "neohost-cx6dx-jump_rss-sws.txt".

Case "jump_rss", HWS
====================
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
#
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
flow pull 0 queue 0
#
start

mlnx_perf -i enp33s0f0np0 -t 1:

      rx_vport_unicast_packets: 151,514,131
        rx_vport_unicast_bytes: 9,696,904,384 Bps    = 77,575.23 Mbps      
                rx_packets_phy: 151,514,275
                  rx_bytes_phy: 9,696,913,600 Bps    = 77,575.30 Mbps      
               rx_64_bytes_phy: 151,514,122 Bps      = 1,212.11 Mbps       
                rx_prio0_bytes: 9,696,814,528 Bps    = 77,574.51 Mbps      
              rx_prio0_packets: 151,512,717

Attached: "neohost-cx6dx-jump_rss-hws.txt".

> - Would you be able to run the test with a miss in an empty group 1, with Ethernet Flow Control disabled?

$ ethtool -A enp33s0f0np0 rx off tx off

$ ethtool -a enp33s0f0np0
Pause parameters for enp33s0f0np0:
Autonegotiate:	off
RX:		off
TX:		off

testpmd> show port 0 flow_ctrl 

********************* Flow control infos for port 0  *********************
FC mode:
   Rx pause: off
   Tx pause: off
Autoneg: off
Pause time: 0x0
High waterline: 0x0
Low waterline: 0x0
Send XON: off
Forward MAC control frames: off


Case "jump_miss", SWS
=====================
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
start

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 151,526,489
        rx_vport_unicast_bytes: 9,697,695,296 Bps    = 77,581.56 Mbps      
                rx_packets_phy: 151,526,193
                  rx_bytes_phy: 9,697,676,672 Bps    = 77,581.41 Mbps      
               rx_64_bytes_phy: 151,525,423 Bps      = 1,212.20 Mbps       
                rx_prio0_bytes: 9,697,488,256 Bps    = 77,579.90 Mbps      
              rx_prio0_packets: 151,523,240

Attached: "neohost-cx6dx-jump_miss-sws.txt".


Case "jump_miss", HWS
=====================
Result: 107 Mpps
Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

mlnx_perf -i enp33s0f0np0

       rx_steer_missed_packets: 109,463,466
      rx_vport_unicast_packets: 109,463,450
        rx_vport_unicast_bytes: 7,005,660,800 Bps    = 56,045.28 Mbps      
                rx_packets_phy: 151,518,062
                  rx_bytes_phy: 9,697,155,840 Bps    = 77,577.24 Mbps      
               rx_64_bytes_phy: 151,516,201 Bps      = 1,212.12 Mbps       
                rx_prio0_bytes: 9,697,137,280 Bps    = 77,577.9 Mbps       
              rx_prio0_packets: 151,517,782
          rx_prio0_buf_discard: 42,055,156

Attached: "neohost-cx6dx-jump_miss-hws.txt".

Case "jump_drop", SWS
=====================
Result: 148 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions drop / end

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 151,705,269
        rx_vport_unicast_bytes: 9,709,137,216 Bps    = 77,673.9 Mbps       
                rx_packets_phy: 151,701,498
                  rx_bytes_phy: 9,708,896,128 Bps    = 77,671.16 Mbps      
               rx_64_bytes_phy: 151,693,532 Bps      = 1,213.54 Mbps       
                rx_prio0_bytes: 9,707,005,888 Bps    = 77,656.4 Mbps       
              rx_prio0_packets: 151,671,959

Attached: "neohost-cx6dx-jump_drop-sws.txt".


Case "jump_drop", HWS
=====================
Result: 107 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.
I've also run this test with a counter attached to the dropping table,
and it showed that indeed only 107 Mpps hit the rule.
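
A minimal sketch of such a hit-count check, assuming the rule carries a COUNT
action (not necessarily the exact code I used):

/* Read the hit counter of a rule created with a COUNT action.
 * Whether rte_flow_query() applies to template-API rules here is my
 * assumption; the testpmd equivalent is "flow query ... count".
 */
#include <inttypes.h>
#include <stdio.h>
#include <rte_flow.h>

static void
print_rule_hits(uint16_t port_id, struct rte_flow *flow)
{
	struct rte_flow_error error;
	struct rte_flow_query_count stats = { .reset = 0 };
	const struct rte_flow_action count_action = {
		.type = RTE_FLOW_ACTION_TYPE_COUNT,
	};

	if (rte_flow_query(port_id, flow, &count_action, &stats, &error) == 0)
		printf("hits: %" PRIu64 "\n", stats.hits);
}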

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
flow pull 0 queue 0

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 109,500,637
        rx_vport_unicast_bytes: 7,008,040,768 Bps    = 56,064.32 Mbps      
                rx_packets_phy: 151,568,915
                  rx_bytes_phy: 9,700,410,560 Bps    = 77,603.28 Mbps      
               rx_64_bytes_phy: 151,569,146 Bps      = 1,212.55 Mbps       
                rx_prio0_bytes: 9,699,889,216 Bps    = 77,599.11 Mbps      
              rx_prio0_packets: 151,560,756
          rx_prio0_buf_discard: 42,065,705

Attached: "neohost-cx6dx-jump_drop-hws.txt".

[-- Attachment #2: neohost-cx6dx-baseline-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 150,044,655     |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 5,878,330       ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1929       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 157,018,528     ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 151,533         |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 49              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 1               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3869      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 13.15         [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 33.0575       [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3869      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 46.5101       [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 116.9203      [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.994         |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 512           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 1,127         [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 409           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 150,595,604     ||===========================================================================
|| Receive WQE Cache Hit                                     || 147,113,254     |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 1,701,085       ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 1,796,240       ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,814,339     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 127,741,993     ||| RX Packet Rate                      || 148,814,339   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 18,529,127      ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.5787        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.7894        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 240.3007      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 235.3318      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 271,910,888     ||
|| RX_PSA0 Steering Pipe 1                                   || 260,654,182     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 233,741,717     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 224,534,200     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 233,741,717     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 224,534,200     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 271,910,888     ||
|| RX_PSA1 Steering Pipe 1                                   || 260,654,182     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 233,741,717     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 224,534,200     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 233,741,717     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 224,534,200     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #3: neohost-cx6dx-jump_drop-hws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1902       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3869      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3869      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9939        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 520           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 541           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 513           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,478     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 107,498,480     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,808,991   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 2.9417        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.4709        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 292.3339      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 289.3756      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 221,120,803     ||
|| RX_PSA0 Steering Pipe 1                                   || 216,636,571     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 178,508,554     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 174,743,229     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 178,508,554     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 174,743,229     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 221,120,803     ||
|| RX_PSA1 Steering Pipe 1                                   || 216,636,571     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 178,508,554     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 174,743,229     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 178,508,554     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 174,743,229     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================
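
A note on reading the eSwitch analysis in these captures: in every capture,
"RX Optimal Packet Rate Bottleneck" works out to "Chip Frequency" divided by
"RX Optimal Hops Per Packet Per Pipe", so each additional steering hop per
packet directly lowers the achievable packet rate. Below is a minimal sketch
of that arithmetic in Python, using the values from the HWS capture above;
the relation is an observation from these numbers, not documented Neohost
semantics:

    # Values copied from neohost-cx6dx-jump_drop-hws.txt above.
    chip_freq_mhz = 429.9939       # "Chip Frequency" [MHz]
    rx_hops_per_packet = 2.9417    # "RX Hops Per Packet"
    n_pipes = 2                    # steering pipes 0 and 1 (RX_PSA0/RX_PSA1)

    hops_per_pipe = rx_hops_per_packet / n_pipes     # ~1.4709 [Hops/Packet]
    bottleneck_mpps = chip_freq_mhz / hops_per_pipe  # ~292.34 [MPPS]
    # Matches the reported 292.3339 [MPPS] up to rounding of the hop count.
    print(f"RX optimal packet rate bottleneck: {bottleneck_mpps:.4f} MPPS")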

[-- Attachment #4: neohost-cx6dx-jump_drop-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1898       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3872      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3872      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9954        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 38,998        [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 154,927       [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,808,157     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,808,153     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,808,157   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.5859        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.7929        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 239.8323      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 235.505       [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 271,700,490     ||
|| RX_PSA0 Steering Pipe 1                                   || 261,909,206     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 233,609,413     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 225,682,781     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 232,689,324     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 224,797,255     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 187,761         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 186,626         ||
|| RX_PSA1 Steering Pipe 0                                   || 271,700,490     ||
|| RX_PSA1 Steering Pipe 1                                   || 261,909,206     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 233,609,413     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 225,682,781     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 232,689,324     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 224,797,255     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 187,761         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #5: neohost-cx6dx-jump_miss-hws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1916       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3882      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3882      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9966        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 658           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 865           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,499,162     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 107,499,162     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,811,652   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.6303        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.8151        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 236.8997      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 234.6136      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 272,739,987     ||
|| RX_PSA0 Steering Pipe 1                                   || 267,489,148     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 231,381,170     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 226,856,935     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 231,381,170     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 226,856,935     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 272,739,987     ||
|| RX_PSA1 Steering Pipe 1                                   || 267,489,148     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 231,381,170     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 226,856,935     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 231,381,170     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 226,856,935     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #6: neohost-cx6dx-jump_miss-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1932       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 41              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3884      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3884      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9958        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 528           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 562           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 511           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,814,890     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,814,893     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                      || 148,814,893   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 3.0752        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 1.5376        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 279.6539      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 262.2343      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 244,017,616     ||
|| RX_PSA0 Steering Pipe 1                                   || 213,614,158     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 203,543,108     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 177,919,501     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 202,761,457     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 177,158,737     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 161,549         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 158,487         ||
|| RX_PSA1 Steering Pipe 0                                   || 244,017,616     ||
|| RX_PSA1 Steering Pipe 1                                   || 213,614,158     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 203,543,108     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 177,919,501     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 202,761,457     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 177,158,737     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 161,549         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #7: neohost-cx6dx-jump_rss-hws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 150,042,800     |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 5,878,322       ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 76.1916       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 157,010,285     ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 155,820         |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3871      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 13.1151       [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 32.9696       [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3871      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 46.509        [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 116.9176      [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.995         |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 512           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 1,046         [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 404           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 150,599,990     ||===========================================================================
|| Receive WQE Cache Hit                                     || 147,103,511     |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 1,708,147       ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 2,065,499       ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 148,811,662     ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 127,739,686     ||| RX Packet Rate                      || 148,811,663   [Packets/Seconds]  ||
|| Compressed CQEs                                           || 18,528,797      ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 4.0594        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 2.0297        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 211.8515      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 206.0378      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 310,565,746     ||
|| RX_PSA0 Steering Pipe 1                                   || 293,523,665     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 271,712,400     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 256,959,447     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 271,712,400     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 256,959,447     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 310,565,746     ||
|| RX_PSA1 Steering Pipe 1                                   || 293,523,665     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 271,712,400     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 256,959,447     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 271,712,400     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 256,959,447     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

[-- Attachment #8: neohost-cx6dx-jump_rss-sws.txt --]
[-- Type: text/plain, Size: 10763 bytes --]

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                || Analysis Value [Units]           ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 39,520,797      |||                                Bandwidth                                ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                        || 19.1855       [Gb/s]             ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                        || 0             [Gb/s]             ||
|| Level 0 MPT Cache Hit                                     || 39,580,636      ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               |||                                 Memory                                  ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              |||                             PCIe Bandwidth                              ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW           || 251.3864      [Gb/s]             ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization         || 3.5628        [%]                ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                || 8.9564        [Gb/s]             ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW          || 251.3864      [Gb/s]             ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization        || 11.7151       [%]                ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW               || 29.4501       [Gb/s]             ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9931        |||                              PCIe Latency                               ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                    || 474           [NS]               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                    || 800           [NS]               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                    || 379           [NS]               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 38,976,069      ||===========================================================================
|| Receive WQE Cache Hit                                     || 35,987,886      |||                       PCIe Unit Internal Latency                        ||
|| Receive WQE Cache Miss                                    || 1,483,680       ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency           || 4             [NS]               ||
|| RX Steering Packets                                       || 37,471,578      ||| PCIe Internal Max Latency           || 4             [NS]               ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency           || 4             [NS]               ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               |||                               Packet Rate                               ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 32,164,319      ||| RX Packet Rate                      || 37,471,584    [Packets/Seconds]  ||
|| Compressed CQEs                                           || 4,665,303       ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               |||                                 eSwitch                                 ||
|| Compression Session Closed due to Mismatch                || 791             ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                  || 4.1094        [Hops/Packet]      ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe || 2.0547        [Hops/Packet]      ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck   || 209.2729      [MPPS]             ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck           || 202.9695      [MPPS]             ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 79,383,952      ||
|| RX_PSA0 Steering Pipe 1                                   || 74,602,673      ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 68,634,045      ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 64,387,570      ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 67,590,325      ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 63,403,800      ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 180,882         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 179,564         ||
|| RX_PSA1 Steering Pipe 0                                   || 79,383,952      ||
|| RX_PSA1 Steering Pipe 1                                   || 74,602,673      ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 68,634,045      ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 64,387,570      ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 67,590,325      ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 63,403,800      ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 180,882         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [net/mlx5] Performance drop with HWS compared to SWS
  2024-06-13 20:14   ` Dmitry Kozlyuk
@ 2024-06-19 19:15     ` Dariusz Sosnowski
  2024-06-20 13:05       ` Dmitry Kozlyuk
  2024-09-27 11:33       ` Dmitry Kozlyuk
  0 siblings, 2 replies; 6+ messages in thread
From: Dariusz Sosnowski @ 2024-06-19 19:15 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

Hi,

Thank you for running all the tests and for all the data. Really appreciated.

> -----Original Message-----
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Sent: Thursday, June 13, 2024 22:15
> To: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Cc: users@dpdk.org
> Subject: Re: [net/mlx5] Performance drop with HWS compared to SWS
> 
> Hi Dariusz,
> 
> Thank you for looking into the issue, please find full details below.
> 
> Summary:
> 
> Case       SWS (Mpps)  HWS (Mpps)
> --------   ----------  ----------
> baseline       148          -
> jump_rss        37        148
> jump_miss      148        107
> jump_drop      148        107
> 
> From "baseline" vs "jump_rss", the problem is not in jump.
> From "jump_miss" vs "jump_drop", the problem is not only in miss.
> This is a lab so I can try anything else you need for diagnostic.
> 
> Disabling flow control only fixes the number of packets received by PHY, but not
> the number of packets processed by steering.
> 
> > - Could you share mlnx_perf stats for SWS case as well?
> 
>       rx_vport_unicast_packets: 151,716,299
>         rx_vport_unicast_bytes: 9,709,843,136 Bps    = 77,678.74 Mbps
>                 rx_packets_phy: 151,716,517
>                   rx_bytes_phy: 9,709,856,896 Bps    = 77,678.85 Mbps
>                rx_64_bytes_phy: 151,716,867 Bps      = 1,213.73 Mbps
>                 rx_prio0_bytes: 9,710,051,648 Bps    = 77,680.41 Mbps
>               rx_prio0_packets: 151,719,564
> 
> > - If group 1 had a flow rule with empty match and RSS action, is the
> >   performance difference the same?
> >   (This would help to understand if the problem is with miss behavior or with
> >   jump between group 0 and group 1).
> 
> Case "baseline"
> ===============
> No flow rules, just to make sure the host can poll the NIC fast enough.
> Result: 148 Mpps
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> mlnx_perf -i enp33s0f0np0 -t 1
> 
>       rx_vport_unicast_packets: 151,622,123
>         rx_vport_unicast_bytes: 9,703,815,872 Bps    = 77,630.52 Mbps
>                 rx_packets_phy: 151,621,983
>                   rx_bytes_phy: 9,703,807,872 Bps    = 77,630.46 Mbps
>                rx_64_bytes_phy: 151,621,026 Bps      = 1,212.96 Mbps
>                 rx_prio0_bytes: 9,703,716,480 Bps    = 77,629.73 Mbps
>               rx_prio0_packets: 151,620,576
> 
> Attached: "neohost-cx6dx-baseline-sws.txt".
> 
> Case "jump_rss", SWS
> ====================
> Jump to group 1, then RSS.
> Result: 37 Mpps (?!)
> This "37 Mpps" seems to be caused by PCIe bottleneck, which MPRQ is supposed
> to overcome.
> Is MPRQ limited only to default RSS in SWS mode?
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
> #
> start
> 
> mlnx_perf -i enp33s0f0np0 -t 1:
> 
>       rx_vport_unicast_packets: 38,155,359
>         rx_vport_unicast_bytes: 2,441,942,976 Bps    = 19,535.54 Mbps
>                 tx_packets_phy: 7,586
>                 rx_packets_phy: 151,531,694
>                   tx_bytes_phy: 485,568 Bps          = 3.88 Mbps
>                   rx_bytes_phy: 9,698,029,248 Bps    = 77,584.23 Mbps
>             tx_mac_control_phy: 7,587
>              tx_pause_ctrl_phy: 7,587
>                rx_discards_phy: 113,376,265
>                rx_64_bytes_phy: 151,531,748 Bps      = 1,212.25 Mbps
>     rx_buffer_passed_thres_phy: 203
>                 rx_prio0_bytes: 9,698,066,560 Bps    = 77,584.53 Mbps
>               rx_prio0_packets: 38,155,328
>              rx_prio0_discards: 113,376,963
>                tx_global_pause: 7,587
>       tx_global_pause_duration: 1,018,266
> 
> Attached: "neohost-cx6dx-jump_rss-sws.txt".

How are you generating the traffic? Are both IP addresses and TCP ports changing?

"jump_rss" case degradation seems to be caused by RSS configuration.
It appears that packets are not distributed across all queues.
With these flow commands in SWS all packets should go to queue 0 only.
Could you please check if that's the case on your side?

It can be alleviated this by specifying RSS hash types on RSS action:

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues <queues> end types ip tcp end / end

Could you please try that on your side?

With the HWS flow engine, if the RSS action does not have hash types specified,
the implementation defaults to hashing on IP addresses only.
If IP addresses are variable in your test traffic, that would explain the difference.
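
If you also want hashing on TCP ports with HWS, the types can presumably be
given to the rss action at rule creation time in your existing setup, e.g.
(an untested sketch, mirroring the template commands you already use):

flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end types ip tcp end / end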

> Case "jump_rss", HWS
> ====================
> Result: 148 Mpps
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> port stop 0
> flow configure 0 queues_number 1 queues_size 128 counters_number 16
> port start 0
> #
> flow pattern_template 0 create pattern_template_id 1 ingress template end
> flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> flow pull 0 queue 0
> #
> flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
> flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
> flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
> flow pull 0 queue 0
> #
> start
> 
> mlnx_perf -i enp33s0f0np0 -t 1:
> 
>       rx_vport_unicast_packets: 151,514,131
>         rx_vport_unicast_bytes: 9,696,904,384 Bps    = 77,575.23 Mbps
>                 rx_packets_phy: 151,514,275
>                   rx_bytes_phy: 9,696,913,600 Bps    = 77,575.30 Mbps
>                rx_64_bytes_phy: 151,514,122 Bps      = 1,212.11 Mbps
>                 rx_prio0_bytes: 9,696,814,528 Bps    = 77,574.51 Mbps
>               rx_prio0_packets: 151,512,717
> 
> Attached: "neohost-cx6dx-jump_rss-hws.txt".
> 
> > - Would you be able to do the test with miss in empty group 1, with Ethernet
> >   Flow Control disabled?
> 
> $ ethtool -A enp33s0f0np0 rx off tx off
> 
> $ ethtool -a enp33s0f0np0
> Pause parameters for enp33s0f0np0:
> Autonegotiate:  off
> RX:             off
> TX:             off
> 
> testpmd> show port 0 flow_ctrl
> 
> ********************* Flow control infos for port 0 *********************
> FC mode:
>    Rx pause: off
>    Tx pause: off
> Autoneg: off
> Pause time: 0x0
> High waterline: 0x0
> Low waterline: 0x0
> Send XON: off
> Forward MAC control frames: off
> 
> 
> Case "jump_miss", SWS
> =====================
> Result: 148 Mpps
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> start
> 
> mlnx_perf -i enp33s0f0np0
> 
>       rx_vport_unicast_packets: 151,526,489
>         rx_vport_unicast_bytes: 9,697,695,296 Bps    = 77,581.56 Mbps
>                 rx_packets_phy: 151,526,193
>                   rx_bytes_phy: 9,697,676,672 Bps    = 77,581.41 Mbps
>                rx_64_bytes_phy: 151,525,423 Bps      = 1,212.20 Mbps
>                 rx_prio0_bytes: 9,697,488,256 Bps    = 77,579.90 Mbps
>               rx_prio0_packets: 151,523,240
> 
> Attached: "neohost-cx6dx-jump_miss-sws.txt".
> 
> 
> Case "jump_miss", HWS
> =====================
> Result: 107 Mpps
> Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> port stop 0
> flow configure 0 queues_number 1 queues_size 128 counters_number 16
> port start 0
> flow pattern_template 0 create pattern_template_id 1 ingress template end
> flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> flow pull 0 queue 0
> 
> mlnx_perf -i enp33s0f0np0
> 
>        rx_steer_missed_packets: 109,463,466
>       rx_vport_unicast_packets: 109,463,450
>         rx_vport_unicast_bytes: 7,005,660,800 Bps    = 56,045.28 Mbps
>                 rx_packets_phy: 151,518,062
>                   rx_bytes_phy: 9,697,155,840 Bps    = 77,577.24 Mbps
>                rx_64_bytes_phy: 151,516,201 Bps      = 1,212.12 Mbps
>                 rx_prio0_bytes: 9,697,137,280 Bps    = 77,577.9 Mbps
>               rx_prio0_packets: 151,517,782
>           rx_prio0_buf_discard: 42,055,156
> 
> Attached: "neohost-cx6dx-jump_miss-hws.txt".

As you can see, HWS provides the "rx_steer_missed_packets" counter, which is not available with SWS.
It counts packets which did not hit any rule and therefore had to be dropped.
To enable it, additional HW flows are required to handle the packets which did not hit any rule.
This has a side effect: these HW flows create enough backpressure
that, at a very high packet rate, they cause Rx buffer overflow on CX6 Dx.

After some internal discussions, I learned that this is more or less expected,
because such a high number of missed packets is already an indication of a problem:
NIC resources are wasted on packets for which no destination is specified.
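
If it helps, the same counter should also be readable directly from the kernel
netdev statistics that mlnx_perf polls, for example (assuming the same
interface name as in your runs):

$ ethtool -S enp33s0f0np0 | grep rx_steer_missed_packets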

> Case "jump_drop", SWS
> =====================
> Result: 148 Mpps
> Match all in group 0, jump to group 1; match all in group 1, drop.
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> flow create 0 ingress group 1 pattern end actions drop / end
> 
> mlnx_perf -i enp33s0f0np0
> 
>       rx_vport_unicast_packets: 151,705,269
>         rx_vport_unicast_bytes: 9,709,137,216 Bps    = 77,673.9 Mbps
>                 rx_packets_phy: 151,701,498
>                   rx_bytes_phy: 9,708,896,128 Bps    = 77,671.16 Mbps
>                rx_64_bytes_phy: 151,693,532 Bps      = 1,213.54 Mbps
>                 rx_prio0_bytes: 9,707,005,888 Bps    = 77,656.4 Mbps
>               rx_prio0_packets: 151,671,959
> 
> Attached: "neohost-cx6dx-jump_drop-sws.txt".
> 
> 
> Case "jump_drop", HWS
> =====================
> Result: 107 Mpps
> Match all in group 0, jump to group 1; match all in group 1, drop.
> I've also run this test with a counter attached to the dropping table, and it
> showed that indeed only 107 Mpps hit the rule.
> 
> /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
>         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> 
> port stop 0
> flow configure 0 queues_number 1 queues_size 128 counters_number 16
> port start 0
> flow pattern_template 0 create pattern_template_id 1 ingress template end
> flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> flow pull 0 queue 0
> #
> flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
> flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
> flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
> flow pull 0 queue 0
> 
> mlnx_perf -i enp33s0f0np0
> 
>       rx_vport_unicast_packets: 109,500,637
>         rx_vport_unicast_bytes: 7,008,040,768 Bps    = 56,064.32 Mbps
>                 rx_packets_phy: 151,568,915
>                   rx_bytes_phy: 9,700,410,560 Bps    = 77,603.28 Mbps
>                rx_64_bytes_phy: 151,569,146 Bps      = 1,212.55 Mbps
>                 rx_prio0_bytes: 9,699,889,216 Bps    = 77,599.11 Mbps
>               rx_prio0_packets: 151,560,756
>           rx_prio0_buf_discard: 42,065,705
> 
> Attached: "neohost-cx6dx-jump_drop-hws.txt".

We're still looking into the "jump_drop" case.

By the way, may I ask what your target use case for HWS is?

Best regards,
Dariusz Sosnowski

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net/mlx5] Performance drop with HWS compared to SWS
  2024-06-19 19:15     ` Dariusz Sosnowski
@ 2024-06-20 13:05       ` Dmitry Kozlyuk
  2024-09-27 11:33       ` Dmitry Kozlyuk
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Kozlyuk @ 2024-06-20 13:05 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: users

2024-06-19 19:15 (UTC+0000), Dariusz Sosnowski:
[snip]
> > Case "jump_rss", SWS
> > ====================
> > Jump to group 1, then RSS.
> > Result: 37 Mpps (?!)
> > This "37 Mpps" seems to be caused by PCIe bottleneck, which MPRQ is supposed
> > to overcome.
> > Is MPRQ limited only to default RSS in SWS mode?
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
> > #
> > start
> > 
> > mlnx_perf -i enp33s0f0np0 -t 1:
> > 
> >       rx_vport_unicast_packets: 38,155,359
> >         rx_vport_unicast_bytes: 2,441,942,976 Bps    = 19,535.54 Mbps
> >                 tx_packets_phy: 7,586
> >                 rx_packets_phy: 151,531,694
> >                   tx_bytes_phy: 485,568 Bps          = 3.88 Mbps
> >                   rx_bytes_phy: 9,698,029,248 Bps    = 77,584.23 Mbps
> >             tx_mac_control_phy: 7,587
> >              tx_pause_ctrl_phy: 7,587
> >                rx_discards_phy: 113,376,265
> >                rx_64_bytes_phy: 151,531,748 Bps      = 1,212.25 Mbps
> >     rx_buffer_passed_thres_phy: 203
> >                 rx_prio0_bytes: 9,698,066,560 Bps    = 77,584.53 Mbps
> >               rx_prio0_packets: 38,155,328
> >              rx_prio0_discards: 113,376,963
> >                tx_global_pause: 7,587
> >       tx_global_pause_duration: 1,018,266
> > 
> > Attached: "neohost-cx6dx-jump_rss-sws.txt".  
> 
> How are you generating the traffic? Are both IP addresses and TCP ports changing?
> 
> "jump_rss" case degradation seems to be caused by RSS configuration.
> It appears that packets are not distributed across all queues.
> With these flow commands in SWS all packets should go to queue 0 only.
> Could you please check if that's the case on your side?
> 
> It can be alleviated this by specifying RSS hash types on RSS action:
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> flow create 0 ingress group 1 pattern end actions rss queues <queues> end types ip tcp end / end
> 
> Could you please try that on your side?
> 
> With the HWS flow engine, if the RSS action does not have hash types specified,
> the implementation defaults to hashing on IP addresses only.
> If IP addresses are variable in your test traffic, that would explain the difference.

You are right, SWS performance is 148 Mpps with "types ipv4-tcp end"!
I had missed the difference between the default RSS "types" for SWS and HWS.
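
For the record, the rewritten group-1 rule was along these lines
(reconstructed, with the same queue list as in the cases above):

flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end types ipv4-tcp end / end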

For reference, the generated traffic fields are:
* Ethernet: fixed unicast MACs
* VLAN: fixed VLAN IDs, zero TCI
* Source IPv4: random from 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
* Destination IPv4: random from a /16 subnet
* Source TCP port: random
* Destination TCP port: random
* TCP flags: SYN

[snip]
> > Case "jump_miss", SWS
> > =====================
> > Result: 148 Mpps
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > start
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 151,526,489
> >         rx_vport_unicast_bytes: 9,697,695,296 Bps    = 77,581.56 Mbps
> >                 rx_packets_phy: 151,526,193
> >                   rx_bytes_phy: 9,697,676,672 Bps    = 77,581.41 Mbps
> >                rx_64_bytes_phy: 151,525,423 Bps      = 1,212.20 Mbps
> >                 rx_prio0_bytes: 9,697,488,256 Bps    = 77,579.90 Mbps
> >               rx_prio0_packets: 151,523,240
> > 
> > Attached: "neohost-cx6dx-jump_miss-sws.txt".
> > 
> > 
> > Case "jump_miss", HWS
> > =====================
> > Result: 107 Mpps
> > Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >        rx_steer_missed_packets: 109,463,466
> >       rx_vport_unicast_packets: 109,463,450
> >         rx_vport_unicast_bytes: 7,005,660,800 Bps    = 56,045.28 Mbps
> >                 rx_packets_phy: 151,518,062
> >                   rx_bytes_phy: 9,697,155,840 Bps    = 77,577.24 Mbps
> >                rx_64_bytes_phy: 151,516,201 Bps      = 1,212.12 Mbps
> >                 rx_prio0_bytes: 9,697,137,280 Bps    = 77,577.9 Mbps
> >               rx_prio0_packets: 151,517,782
> >           rx_prio0_buf_discard: 42,055,156
> > 
> > Attached: "neohost-cx6dx-jump_miss-hws.txt".  
> 
> As you can see, HWS provides the "rx_steer_missed_packets" counter, which is not available with SWS.
> It counts packets which did not hit any rule and therefore had to be dropped.
> To enable it, additional HW flows are required to handle the packets which did not hit any rule.
> This has a side effect: these HW flows create enough backpressure
> that, at a very high packet rate, they cause Rx buffer overflow on CX6 Dx.
> 
> After some internal discussions, I learned that this is more or less expected,
> because such a high number of missed packets is already an indication of a problem:
> NIC resources are wasted on packets for which no destination is specified.

Fair enough. In production, there will be no misses and thus no issue.
This case is mostly to exhaust the search space.

> 
> > Case "jump_drop", SWS
> > =====================
> > Result: 148 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions drop / end
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 151,705,269
> >         rx_vport_unicast_bytes: 9,709,137,216 Bps    = 77,673.9 Mbps
> >                 rx_packets_phy: 151,701,498
> >                   rx_bytes_phy: 9,708,896,128 Bps    = 77,671.16 Mbps
> >                rx_64_bytes_phy: 151,693,532 Bps      = 1,213.54 Mbps
> >                 rx_prio0_bytes: 9,707,005,888 Bps    = 77,656.4 Mbps
> >               rx_prio0_packets: 151,671,959
> > 
> > Attached: "neohost-cx6dx-jump_drop-sws.txt".
> > 
> > 
> > Case "jump_drop", HWS
> > =====================
> > Result: 107 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> > I've also run this test with a counter attached to the dropping table, and it
> > showed that indeed only 107 Mpps hit the rule.
> > 
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >         -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> > 
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> > #
> > flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
> > flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
> > flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
> > flow pull 0 queue 0
> > 
> > mlnx_perf -i enp33s0f0np0
> > 
> >       rx_vport_unicast_packets: 109,500,637
> >         rx_vport_unicast_bytes: 7,008,040,768 Bps    = 56,064.32 Mbps
> >                 rx_packets_phy: 151,568,915
> >                   rx_bytes_phy: 9,700,410,560 Bps    = 77,603.28 Mbps
> >                rx_64_bytes_phy: 151,569,146 Bps      = 1,212.55 Mbps
> >                 rx_prio0_bytes: 9,699,889,216 Bps    = 77,599.11 Mbps
> >               rx_prio0_packets: 151,560,756
> >           rx_prio0_buf_discard: 42,065,705
> > 
> > Attached: "neohost-cx6dx-jump_drop-hws.txt".  
> 
> We're still looking into the "jump_drop" case.

Looking forward to your verdict.
Thank you for investigating and explaining; understanding what is going on helps a lot.

> By the way, may I ask what your target use case for HWS is?

The subject area is DDoS attack protection, which is why I'm interested
in 64B packets at wire speed, even though they are not so frequent in normal traffic.
Drop action performance is crucial, because it will be used
for blocking malicious traffic that arrives at the highest packet rates.
IPv4 is the primary target; IPv6 is nice to have.

Hopefully HWS relaxed matching will allow matching by 5-tuple at max pps.
SWS proved to be incapable of that in groups other than 0
because of extra hops, and group 0 has other limitations.
Typical flow structures expected:

- match by 5-tuple, drop on match, RSS to SW otherwise (see the sketch after this list);
- match by 5-tuple, mark and RSS to SW on match, just RSS to SW otherwise;
- match by 5-tuple, modify MAC and/or VLAN, RSS to 1-port hairpin queues;
- match by 5-tuple, meter (?), RSS to SW or 2-port hairpin queues.
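
For illustration, the first structure could look roughly like this in HWS
testpmd commands (a hypothetical sketch I have not benchmarked; the 5-tuple
values, table sizes, queue list, and the "relaxed" pattern template attribute
are illustrative, VLAN matching is omitted, and I may have some token
spelling wrong):

port stop 0
flow configure 0 queues_number 1 queues_size 1024
port start 0
# group 0: send everything to group 1
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
# group 1, priority 0: exact 5-tuple match -> drop
flow pattern_template 0 create pattern_template_id 2 relaxed yes ingress template eth / ipv4 src mask 255.255.255.255 dst mask 255.255.255.255 / tcp src mask 0xffff dst mask 0xffff / end
flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
flow template_table 0 create ingress group 1 priority 0 table_id 2 pattern_template 2 actions_template 2 rules_number 65536
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern eth / ipv4 src spec 192.0.2.1 dst spec 198.51.100.1 / tcp src spec 12345 dst spec 80 / end actions drop / end
# group 1, priority 1: everything else -> RSS to SW
flow actions_template 0 create ingress actions_template_id 3 template rss / end mask rss / end
flow template_table 0 create ingress group 1 priority 1 table_id 3 pattern_template 1 actions_template 3 rules_number 1
flow queue 0 create 0 template_table 3 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 end types ipv4-tcp end / end
flow pull 0 queue 0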

These 5-tuples come in bulk from packet inspection on the data path,
so the high insertion speed of HWS is very desirable.

Worst-case scale: millions of flow rules, ~100K insertions/second,
but HWS may prove itself useful to us even at a much smaller scale.
I've already noticed that the cost of a table miss with HWS
depends on the table size even if the table is empty
(going from 1024 to 2048 rules adds 0.5 hops),
so maybe very large tables are not realistic at max pps.
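
(That observation came from simply varying "rules_number" on an otherwise
identical, empty group-1 table and re-reading Neohost, roughly:

flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1024
# vs. the same command with rules_number 2048

with "RX Hops Per Packet" growing by ~0.5 for the larger table.)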

I'm also exploring other features and their performance in pps.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net/mlx5] Performance drop with HWS compared to SWS
  2024-06-19 19:15     ` Dariusz Sosnowski
  2024-06-20 13:05       ` Dmitry Kozlyuk
@ 2024-09-27 11:33       ` Dmitry Kozlyuk
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Kozlyuk @ 2024-09-27 11:33 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: users

2024-06-19 19:15 (UTC+0000), Dariusz Sosnowski:
[...]
> We're still looking into the "jump_drop" case.

Hi Dariusz, any update?

P.S. Thanks for the talk at the DPDK Summit; looking forward to more like it.
Debugging HW features is tricky, and shedding light on it is much appreciated.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-09-27 11:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-13  9:01 [net/mlx5] Performance drop with HWS compared to SWS Dmitry Kozlyuk
2024-06-13 15:06 ` Dariusz Sosnowski
2024-06-13 20:14   ` Dmitry Kozlyuk
2024-06-19 19:15     ` Dariusz Sosnowski
2024-06-20 13:05       ` Dmitry Kozlyuk
2024-09-27 11:33       ` Dmitry Kozlyuk
