Date: Thu, 13 Jun 2024 12:01:45 +0300
From: Dmitry Kozlyuk
To: users@dpdk.org
Subject: [net/mlx5] Performance drop with HWS compared to SWS
Message-ID: <20240613120145.057d4963@sovereign>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/d7xsp75x163t_Xr_0cZeRwg"

--MP_/d7xsp75x163t_Xr_0cZeRwg
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B packets,
apparently caused by any rule that jumps out of ingress group 0
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?

NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b

The traffic generator (TG) is custom;
traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.

The examples below do only the jump and miss all packets in group 1,
but the same is observed when dropping all the packets in group 1.

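For context, the 148 Mpps figure corresponds to the theoretical 64-byte line rate of a single 100GbE port. A minimal sketch of that arithmetic follows, assuming the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-packet gap). The function name is illustrative only and is not part of any tool mentioned in this message.

```python
def line_rate_pps(link_bps: float, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """Theoretical packets per second for back-to-back frames of one size.

    overhead_bytes covers preamble + SFD + inter-packet gap (assumed 20 bytes).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


rate = line_rate_pps(100e9, 64)
print(f"{rate / 1e6:.2f} Mpps")
```
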
Software steering:

    /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1

    flow create 0 ingress group 0 pattern end actions jump group 1 / end

Neohost (from OFED 5.7):

||===========================================================================
||| Packet Rate                                                            ||
||---------------------------------------------------------------------------
||| RX Packet Rate                       || 148,813,590 [Packets/Seconds]  ||
||| TX Packet Rate                       || 0 [Packets/Seconds]            ||
||===========================================================================
||| eSwitch                                                                ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                   || 3.075 [Hops/Packet]            ||
||| RX Optimal Hops Per Packet Per Pipe  || 1.5375 [Hops/Packet]           ||
||| RX Optimal Packet Rate Bottleneck    || 279.6695 [MPPS]                ||
||| RX Packet Rate Bottleneck            || 262.2723 [MPPS]                ||

(Full Neohost output is attached.)

Hardware steering:

    /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1

    port stop 0
    flow configure 0 queues_number 1 queues_size 128 counters_number 16
    port start 0
    flow pattern_template 0 create pattern_template_id 1 ingress template end
    flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
    flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
    flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
    flow pull 0 queue 0

Neohost:

||===========================================================================
||| Packet Rate                                                            ||
||---------------------------------------------------------------------------
||| RX Packet Rate                       || 107,498,115 [Packets/Seconds]  ||
||| TX Packet Rate                       || 0 [Packets/Seconds]            ||
||===========================================================================
||| eSwitch                                                                ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                   || 4.5503 [Hops/Packet]           ||
||| RX Optimal Hops Per Packet Per Pipe  || 2.2751 [Hops/Packet]           ||
||| RX Optimal Packet Rate Bottleneck    || 188.9994 [MPPS]                ||
||| RX Packet Rate Bottleneck            || 182.5796 [MPPS]                ||

AFAIU, performance is not constrained by the complexity of the rules.

mlnx_perf -i enp33s0f0np0 -t 1:

    rx_steer_missed_packets: 108,743,272
    rx_vport_unicast_packets: 108,743,424
    rx_vport_unicast_bytes: 6,959,579,136 Bps = 55,676.63 Mbps
    tx_packets_phy: 7,537
    rx_packets_phy: 150,538,251
    tx_bytes_phy: 482,368 Bps = 3.85 Mbps
    rx_bytes_phy: 9,634,448,128 Bps = 77,075.58 Mbps
    tx_mac_control_phy: 7,536
    tx_pause_ctrl_phy: 7,536
    rx_discards_phy: 41,794,740
    rx_64_bytes_phy: 150,538,352 Bps = 1,204.30 Mbps
    rx_buffer_passed_thres_phy: 202
    rx_prio0_bytes: 9,634,520,256 Bps = 77,076.16 Mbps
    rx_prio0_packets: 108,744,322
    rx_prio0_discards: 41,795,050
    tx_global_pause: 7,537
    tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

    The number of received packets dropped due to lack of buffers
    on a physical port. If this counter is increasing, it implies
    that the adapter is congested and cannot absorb the traffic
    coming from the network.

However, the adapter certainly *is* able to process 148 Mpps,
since it does so with SWS and it can deliver this much to SW (with MPRQ).

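The Neohost eSwitch figures can be cross-checked against the per-pipe steering counters in the attached dumps. The sketch below rests on assumptions inferred from the numbers themselves, not from Neohost documentation: that "RX Hops Per Packet" is the sum of the two RX_PSA0 steering pipe counters divided by "RX Steering Packets", that "Chip Frequency" is in MHz, and that each bottleneck figure is that frequency divided by the hops one pipe must serve per packet. The class and method names are hypothetical. Under these assumptions both dumps reproduce the reported values, and both bottleneck estimates stay above the 148 Mpps offered load. The final lines compare the mlnx_perf counters: packets seen on the wire minus rx_discards_phy come out close to rx_vport_unicast_packets.

```python
from dataclasses import dataclass


@dataclass
class Dump:
    chip_mhz: float        # "Chip Frequency" (unit assumed to be MHz)
    steering_packets: int  # "RX Steering Packets"
    pipe0: int             # "RX_PSA0 Steering Pipe 0"
    pipe1: int             # "RX_PSA0 Steering Pipe 1"

    def hops_per_packet(self) -> float:
        # Assumed definition: total pipe accesses per steered packet.
        return (self.pipe0 + self.pipe1) / self.steering_packets

    def optimal_bottleneck_mpps(self) -> float:
        # Assumes hops are balanced evenly across the two pipes.
        return self.chip_mhz / (self.hops_per_packet() / 2)

    def bottleneck_mpps(self) -> float:
        # Assumes the busier pipe limits the achievable rate.
        busiest = max(self.pipe0, self.pipe1)
        return self.chip_mhz / (busiest / self.steering_packets)


# Values copied from the attached SWS and HWS dumps.
sws = Dump(429.9919, 148_813_592, 243_977_877, 213_617_683)
hws = Dump(429.9925, 107_498_116, 253_168_409, 235_977_321)

for name, d in (("SWS", sws), ("HWS", hws)):
    print(f"{name}: {d.hops_per_packet():.4f} hops/packet, "
          f"optimal {d.optimal_bottleneck_mpps():.1f} Mpps, "
          f"busiest-pipe {d.bottleneck_mpps():.1f} Mpps")

# mlnx_perf (HWS run): wire packets minus PHY discards vs. vport packets.
rx_packets_phy = 150_538_251
rx_discards_phy = 41_794_740
rx_vport_unicast_packets = 108_743_424
delivered = rx_packets_phy - rx_discards_phy
print(delivered, rx_vport_unicast_packets)
```
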
[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst

--MP_/d7xsp75x163t_Xr_0cZeRwg
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=neohost-cx6dx-jump-hws.txt

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                 || Analysis Value [Units]         ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               ||| Bandwidth                                                              ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                         || 55.039 [Gb/s]                  ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                         || 0 [Gb/s]                       ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               ||| Memory                                                                 ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate         || 0 [Keys/Packet]                ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              ||| PCIe Bandwidth                                                         ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW            || 251.3851 [Gb/s]                ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization          || 0.0027 [%]                     ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                 || 0.0069 [Gb/s]                  ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW           || 251.3851 [Gb/s]                ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization         || 0.0025 [%]                     ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW                || 0.0062 [Gb/s]                  ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9925        ||| PCIe Latency                                                           ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                     || 523 [NS]                       ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                     || 548 [NS]                       ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                     || 511 [NS]                       ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               ||| PCIe Unit Internal Latency                                             ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency            || 4 [NS]                         ||
|| RX Steering Packets                                       || 107,498,116     ||| PCIe Internal Max Latency            || 4 [NS]                         ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency            || 4 [NS]                         ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               ||| Packet Rate                                                            ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                       || 107,498,115 [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                       || 0 [Packets/Seconds]            ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               ||| eSwitch                                                                ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                   || 4.5503 [Hops/Packet]           ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe  || 2.2751 [Hops/Packet]           ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck    || 188.9994 [MPPS]                ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck            || 182.5796 [MPPS]                ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                   || 0 [Hops/Packet]                ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe  || 0 [Hops/Packet]                ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck    || 0 [MPPS]                       ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck            || 0 [MPPS]                       ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA0 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA1 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

--MP_/d7xsp75x163t_Xr_0cZeRwg
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=neohost-cx6dx-jump-sws.txt

=============================================================================================================================================================
|| Counter Name                                              || Counter Value   ||| Performance Analysis                 || Analysis Value [Units]         ||
=============================================================================================================================================================
|| Level 0 MTT Cache Hit                                     || 0               ||| Bandwidth                                                              ||
|| Level 0 MTT Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Level 1 MTT Cache Hit                                     || 0               ||| RX BandWidth                         || 76.1926 [Gb/s]                 ||
|| Level 1 MTT Cache Miss                                    || 0               ||| TX BandWidth                         || 0 [Gb/s]                       ||
|| Level 0 MPT Cache Hit                                     || 0               ||===========================================================================
|| Level 0 MPT Cache Miss                                    || 0               ||| Memory                                                                 ||
|| Level 1 MPT Cache Hit                                     || 0               ||---------------------------------------------------------------------------
|| Level 1 MPT Cache Miss                                    || 0               ||| RX Indirect Memory Keys Rate         || 0 [Keys/Packet]                ||
|| Indirect Memory Key Access                                || 0               ||===========================================================================
|| ICM Cache Miss                                            || 38              ||| PCIe Bandwidth                                                         ||
|| PCIe Internal Back Pressure                               || 0               ||---------------------------------------------------------------------------
|| Outbound Stalled Reads                                    || 0               ||| PCIe Inbound Available BW            || 251.385 [Gb/s]                 ||
|| Outbound Stalled Writes                                   || 0               ||| PCIe Inbound BW Utilization          || 0.0027 [%]                     ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||| PCIe Inbound Used BW                 || 0.0069 [Gb/s]                  ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||| PCIe Outbound Available BW           || 251.385 [Gb/s]                 ||
|| PCIe Read Stalled due to Ordering                         || 0               ||| PCIe Outbound BW Utilization         || 0.0025 [%]                     ||
|| RX IPsec Packets                                          || 0               ||| PCIe Outbound Used BW                || 0.0062 [Gb/s]                  ||
|| Back Pressure from RXD to PSA                             || 0               ||===========================================================================
|| Chip Frequency                                            || 429.9919        ||| PCIe Latency                                                           ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PSA switch to RXT                      || 0               ||| PCIe Avg Latency                     || 522 [NS]                       ||
|| Back Pressure from PSA switch to RXB                      || 0               ||| PCIe Max Latency                     || 541 [NS]                       ||
|| Back Pressure from PSA switch to RXD                      || 0               ||| PCIe Min Latency                     || 511 [NS]                       ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590     ||===========================================================================
|| Receive WQE Cache Hit                                     || 0               ||| PCIe Unit Internal Latency                                             ||
|| Receive WQE Cache Miss                                    || 0               ||---------------------------------------------------------------------------
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||| PCIe Internal Avg Latency            || 4 [NS]                         ||
|| RX Steering Packets                                       || 148,813,592     ||| PCIe Internal Max Latency            || 4 [NS]                         ||
|| RX Steering Packets Fast Path                             || 0               ||| PCIe Internal Min Latency            || 4 [NS]                         ||
|| EQ All State Machines Busy                                || 0               ||===========================================================================
|| CQ All State Machines Busy                                || 0               ||| Packet Rate                                                            ||
|| MSI-X All State Machines Busy                             || 0               ||---------------------------------------------------------------------------
|| CQE Compression Sessions                                  || 0               ||| RX Packet Rate                       || 148,813,590 [Packets/Seconds]  ||
|| Compressed CQEs                                           || 0               ||| TX Packet Rate                       || 0 [Packets/Seconds]            ||
|| Compression Session Closed due to EQE                     || 0               ||===========================================================================
|| Compression Session Closed due to Timeout                 || 0               ||| eSwitch                                                                ||
|| Compression Session Closed due to Mismatch                || 0               ||---------------------------------------------------------------------------
|| Compression Session Closed due to PCIe Idle               || 0               ||| RX Hops Per Packet                   || 3.075 [Hops/Packet]            ||
|| Compression Session Closed due to S2CQE                   || 0               ||| RX Optimal Hops Per Packet Per Pipe  || 1.5375 [Hops/Packet]           ||
|| Compressed CQE Strides                                    || 0               ||| RX Optimal Packet Rate Bottleneck    || 279.6695 [MPPS]                ||
|| Compression Session Closed due to LRO                     || 0               ||| RX Packet Rate Bottleneck            || 262.2723 [MPPS]                ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||| TX Hops Per Packet                   || 0 [Hops/Packet]                ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||| TX Optimal Hops Per Packet Per Pipe  || 0 [Hops/Packet]                ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||| TX Optimal Packet Rate Bottleneck    || 0 [MPPS]                       ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||| TX Packet Rate Bottleneck            || 0 [MPPS]                       ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||===========================================================================
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA0 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 158,843         ||
|| RX_PSA1 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA1 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

--MP_/d7xsp75x163t_Xr_0cZeRwg--