From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [Bug 1086] Significant TX packet drops with Mellanox NIC (mlx5 PMD)
Date: Wed, 28 Sep 2022 13:41:31 +0000

https://bugs.dpdk.org/show_bug.cgi?id=1086

            Bug ID: 1086
           Summary: Significant TX packet drops with Mellanox NIC (mlx5 PMD)
           Product: DPDK
           Version: 21.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: critical
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: anton@vaa.su
  Target Milestone: ---

Created attachment 222
  --> https://bugs.dpdk.org/attachment.cgi?id=222&action=edit
testpmd-fec28ca0e3.log.txt

Given 2 servers with 25G Mellanox 2-port NICs:

# dpdk-devbind.py -s

Network devices using kernel driver
===================================
0000:3b:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f0np0 drv=mlx5_core unused=vfio-pci
0000:3b:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f1np1 drv=mlx5_core unused=vfio-pci

The servers are connected directly.

The first server is used as a packet generator, running TRex v2.99 in
stateless mode:

./t-rex-64 -c 16 -i
./trex-console
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

The second one runs dpdk-testpmd:

OS: Debian GNU/Linux 10 (buster)
uname -r: 4.19.0-21-amd64
ofed_info: MLNX_OFED_LINUX-5.7-1.0.2.0
gcc version 8.3.0 (Debian 8.3.0-6)

With DPDK v21.08 compiled and testpmd run this way:

dpdk-testpmd -l 1-17 -n 4 --log-level=debug -- --nb-ports=2 --nb-cores=16 --portmask=0x3 --rxq=8 --txq=8

it handles roughly 17 Mpps per port:

trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

TRex Port Statistics

port       | 0                 | 1                 | total
-----------+-------------------+-------------------+------------------
owner      | root              | root              |
link       | UP                | UP                |
state      | TRANSMITTING      | TRANSMITTING      |
speed      | 25 Gb/s           | 25 Gb/s           |
CPU util.  | 27.76%            | 27.76%            |
--         |                   |                   |
Tx bps L2  | 8.7 Gbps          | 8.73 Gbps         | 17.43 Gbps
Tx bps L1  | 11.42 Gbps        | 11.46 Gbps        | 22.88 Gbps
Tx pps     | 17 Mpps           | 17.05 Mpps        | 34.05 Mpps
Line Util. | 45.7 %            | 45.83 %           |
---        |                   |                   |
Rx bps     | 8.7 Gbps          | 8.73 Gbps         | 17.43 Gbps
Rx pps     | 17 Mpps           | 17.05 Mpps        | 34.05 Mpps
----       |                   |                   |
opackets   | 290928398         | 291050836         | 581979234
ipackets   | 290885740         | 291093159         | 581978899
obytes     | 18619417472       | 18627254464       | 37246671936
ibytes     | 18616688080       | 18629962836       | 37246650916
tx-pkts    | 290.93 Mpkts      | 291.05 Mpkts      | 581.98 Mpkts
rx-pkts    | 290.89 Mpkts      | 291.09 Mpkts      | 581.98 Mpkts
tx-bytes   | 18.62 GB          | 18.63 GB          | 37.25 GB
rx-bytes   | 18.62 GB          | 18.63 GB          | 37.25 GB
-----      |                   |                   |
oerrors    | 0                 | 0                 | 0
ierrors    | 0                 | 0                 | 0

But if we switch to DPDK v21.11, it becomes much worse:

TRex Port Statistics

port       | 0                 | 1                 | total
-----------+-------------------+-------------------+------------------
owner      | root              | root              |
link       | UP                | UP                |
state      | TRANSMITTING      | TRANSMITTING      |
speed      | 25 Gb/s           | 25 Gb/s           |
CPU util.  | 26.06%            | 26.06%            |
--         |                   |                   |
Tx bps L2  | 8.7 Gbps          | 8.72 Gbps         | 17.42 Gbps
Tx bps L1  | 11.42 Gbps        | 11.45 Gbps        | 22.86 Gbps
Tx pps     | 16.99 Mpps        | 17.04 Mpps        | 34.02 Mpps
Line Util. | 45.66 %           | 45.79 %           |
---        |                   |                   |
Rx bps     | 3.75 Gbps         | 3.76 Gbps         | 7.5 Gbps
Rx pps     | 7.32 Mpps         | 7.34 Mpps         | 14.66 Mpps
----       |                   |                   |
opackets   | 190538147         | 190707494         | 381245641
ipackets   | 82174700          | 82260152          | 164434852
obytes     | 12194441408       | 12205280936       | 24399722344
ibytes     | 5259181520        | 5264649728        | 10523831248
tx-pkts    | 190.54 Mpkts      | 190.71 Mpkts      | 381.25 Mpkts
rx-pkts    | 82.17 Mpkts       | 82.26 Mpkts       | 164.43 Mpkts
tx-bytes   | 12.19 GB          | 12.21 GB          | 24.4 GB
rx-bytes   | 5.26 GB           | 5.26 GB           | 10.52 GB
-----      |                   |                   |
oerrors    | 0                 | 0                 | 0
ierrors    | 0                 | 0                 | 0

It handles only ~7 Mpps per port instead of ~17 Mpps! testpmd reports a
huge number of TX drops:

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1101378001     RX-dropped: 0             RX-total: 1101378001
  TX-packets: 1016776861     TX-dropped: 84576754      TX-total: 1101353615
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1101353615     RX-dropped: 0             RX-total: 1101353615
  TX-packets: 1016804108     TX-dropped: 84573893      TX-total: 1101378001
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 2202731616     RX-dropped: 0             RX-total: 2202731616
  TX-packets: 2033580969     TX-dropped: 169150647     TX-total: 2202731616
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Using git bisect, I found the commit (between 21.08 and 21.11) that caused
this regression:

https://github.com/DPDK/dpdk/commit/fec28ca0e3a93143829f3b41a28a8da933f28499

I have also profiled it with Intel VTune 2021.3.0 (-collect hotspots and
-collect memory-access), comparing two revisions:

1. 690b2a88c2 (GOOD)
2. fec28ca0e3 (BAD)

I can try to share the corresponding profiling results if that helps.
Unfortunately, I cannot attach them here (the VTune stats data is too big).
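
For anyone cross-checking the numbers in this report, here is a minimal Python sketch that recomputes the frame size, the L1 rate, and the testpmd TX drop ratio from the counters quoted above. The function names are illustrative only (they are not part of DPDK or TRex). The 64-byte frame size is inferred from obytes / opackets rather than stated in the report, and the 20-byte per-frame L1 overhead (preamble, start-of-frame delimiter, inter-frame gap) is the usual Ethernet figure, assumed here because it reproduces the reported "Tx bps L1" value.

```python
# Cross-check of figures quoted in the report. All inputs are copied from the
# TRex and testpmd output; helper names are hypothetical.

# Assumption: standard Ethernet L1 overhead per frame
# (7 B preamble + 1 B SFD + 12 B inter-frame gap).
L1_OVERHEAD_BYTES = 20


def frame_bytes(total_bytes: int, total_packets: int) -> float:
    """Average L2 frame size implied by byte and packet counters."""
    return total_bytes / total_packets


def l1_gbps(pps: float, frame_len: float) -> float:
    """L1 bit rate in Gbps for a given packet rate and L2 frame size."""
    return pps * (frame_len + L1_OVERHEAD_BYTES) * 8 / 1e9


def drop_ratio(dropped: int, total: int) -> float:
    """Fraction of packets dropped out of the total handed to TX."""
    return dropped / total


# TRex port 0 on v21.08: obytes / opackets
frame_len = frame_bytes(18619417472, 290928398)  # 64.0

# L1 rate and line utilisation at 17 Mpps on a 25 Gb/s link
rate = l1_gbps(17e6, frame_len)
line_util = rate / 25 * 100

# testpmd accumulated stats on v21.11: TX-dropped / TX-total
tx_drop = drop_ratio(169150647, 2202731616)

# TRex port 0 on v21.11: share of transmitted packets received back
rx_share = 7.32 / 16.99

print(f"frame size : {frame_len:.1f} bytes")
print(f"L1 rate    : {rate:.3f} Gbps ({line_util:.2f} % of 25G)")
print(f"TX dropped : {tx_drop:.2%} of TX-total")
print(f"Rx/Tx pps  : {rx_share:.2%}")
```

On these inputs, testpmd counts roughly 7.7% of forwarded packets as TX-dropped, while TRex on port 0 receives back about 43% of the rate it transmits (7.32 of 16.99 Mpps).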