* [Bug 1086] Significant TX packet drops with Mellanox NIC (mlx5 PMD)
From: bugzilla @ 2022-09-28 13:41 UTC
To: dev
https://bugs.dpdk.org/show_bug.cgi?id=1086
Bug ID: 1086
Summary: Significant TX packet drops with Mellanox NIC (mlx5 PMD)
Product: DPDK
Version: 21.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: anton@vaa.su
Target Milestone: ---
Created attachment 222
--> https://bugs.dpdk.org/attachment.cgi?id=222&action=edit
testpmd-fec28ca0e3.log.txt
Given two servers, each with a dual-port 25G Mellanox NIC:
# dpdk-devbind.py -s
Network devices using kernel driver
===================================
0000:3b:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f0np0 drv=mlx5_core unused=vfio-pci
0000:3b:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f1np1 drv=mlx5_core unused=vfio-pci
Servers are connected directly.
The first server is used as a packet generator, running TRex v2.99 in stateless
mode:
./t-rex-64 -c 16 -i
./trex-console
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps
The second one runs dpdk-testpmd:
OS: Debian GNU/Linux 10 (buster)
uname -r: 4.19.0-21-amd64
ofed_info: MLNX_OFED_LINUX-5.7-1.0.2.0
gcc version 8.3.0 (Debian 8.3.0-6)
With DPDK v21.08 compiled and testpmd started as follows:
dpdk-testpmd -l 1-17 -n 4 --log-level=debug -- --nb-ports=2 --nb-cores=16 --portmask=0x3 --rxq=8 --txq=8
it handles roughly 17 Mpps per port:
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps
TRex Port Statistics
port       | 0                 | 1                 | total
-----------+-------------------+-------------------+------------------
owner      | root              | root              |
link       | UP                | UP                |
state      | TRANSMITTING      | TRANSMITTING      |
speed      | 25 Gb/s           | 25 Gb/s           |
CPU util.  | 27.76%            | 27.76%            |
--         |                   |                   |
Tx bps L2  | 8.7 Gbps          | 8.73 Gbps         | 17.43 Gbps
Tx bps L1  | 11.42 Gbps        | 11.46 Gbps        | 22.88 Gbps
Tx pps     | 17 Mpps           | 17.05 Mpps        | 34.05 Mpps
Line Util. | 45.7 %            | 45.83 %           |
---        |                   |                   |
Rx bps     | 8.7 Gbps          | 8.73 Gbps         | 17.43 Gbps
Rx pps     | 17 Mpps           | 17.05 Mpps        | 34.05 Mpps
----       |                   |                   |
opackets   | 290928398         | 291050836         | 581979234
ipackets   | 290885740         | 291093159         | 581978899
obytes     | 18619417472       | 18627254464       | 37246671936
ibytes     | 18616688080       | 18629962836       | 37246650916
tx-pkts    | 290.93 Mpkts      | 291.05 Mpkts      | 581.98 Mpkts
rx-pkts    | 290.89 Mpkts      | 291.09 Mpkts      | 581.98 Mpkts
tx-bytes   | 18.62 GB          | 18.63 GB          | 37.25 GB
rx-bytes   | 18.62 GB          | 18.63 GB          | 37.25 GB
-----      |                   |                   |
oerrors    | 0                 | 0                 | 0
ierrors    | 0                 | 0                 | 0
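As a sanity check on the table above: the byte and packet counters work out to 64 bytes per packet, and the L1 rate matches the L2 rate plus 20 bytes of per-frame overhead. The minimal Python sketch below reproduces those figures. The 20-byte overhead (preamble, start-of-frame delimiter, inter-frame gap) is an assumption that happens to match the reported numbers, and the helper names are illustrative only.

```python
# Assumed per-frame L1 overhead in bytes: 7 preamble + 1 SFD + 12 inter-frame gap.
L1_OVERHEAD_BYTES = 20
LINK_BPS = 25e9  # 25 Gb/s link speed, from the "speed" row


def l2_bps(pps, frame_bytes):
    """L2 bit rate: frame bytes only."""
    return pps * frame_bytes * 8


def l1_bps(pps, frame_bytes):
    """L1 bit rate: frame bytes plus the assumed per-frame overhead."""
    return pps * (frame_bytes + L1_OVERHEAD_BYTES) * 8


# Port 0 counters from the DPDK v21.08 run.
opackets, obytes = 290928398, 18619417472
frame_bytes = obytes // opackets  # average bytes per transmitted packet

pps = 17e6  # "Tx pps" for port 0
print(f"frame size: {frame_bytes} B")
print(f"Tx bps L2:  {l2_bps(pps, frame_bytes) / 1e9:.2f} Gbps")
print(f"Tx bps L1:  {l1_bps(pps, frame_bytes) / 1e9:.2f} Gbps")
print(f"Line util:  {l1_bps(pps, frame_bytes) / LINK_BPS:.2%}")
```

Under these assumptions, 17 Mpps of 64-byte packets uses a little under half of a 25 Gb/s link, which agrees with the "Line Util." row.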
But after switching to DPDK v21.11, throughput becomes much worse:
TRex Port Statistics
port       | 0                 | 1                 | total
-----------+-------------------+-------------------+------------------
owner      | root              | root              |
link       | UP                | UP                |
state      | TRANSMITTING      | TRANSMITTING      |
speed      | 25 Gb/s           | 25 Gb/s           |
CPU util.  | 26.06%            | 26.06%            |
--         |                   |                   |
Tx bps L2  | 8.7 Gbps          | 8.72 Gbps         | 17.42 Gbps
Tx bps L1  | 11.42 Gbps        | 11.45 Gbps        | 22.86 Gbps
Tx pps     | 16.99 Mpps        | 17.04 Mpps        | 34.02 Mpps
Line Util. | 45.66 %           | 45.79 %           |
---        |                   |                   |
Rx bps     | 3.75 Gbps         | 3.76 Gbps         | 7.5 Gbps
Rx pps     | 7.32 Mpps         | 7.34 Mpps         | 14.66 Mpps
----       |                   |                   |
opackets   | 190538147         | 190707494         | 381245641
ipackets   | 82174700          | 82260152          | 164434852
obytes     | 12194441408       | 12205280936       | 24399722344
ibytes     | 5259181520        | 5264649728        | 10523831248
tx-pkts    | 190.54 Mpkts      | 190.71 Mpkts      | 381.25 Mpkts
rx-pkts    | 82.17 Mpkts       | 82.26 Mpkts       | 164.43 Mpkts
tx-bytes   | 12.19 GB          | 12.21 GB          | 24.4 GB
rx-bytes   | 5.26 GB           | 5.26 GB           | 10.52 GB
-----      |                   |                   |
oerrors    | 0                 | 0                 | 0
ierrors    | 0                 | 0                 | 0
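To put a number on the difference between the two runs, the share of generated packets that come back to TRex can be computed from the opackets and ipackets totals of the two tables. A minimal Python sketch (the function name is illustrative, not part of TRex):

```python
def delivery_ratio(opackets, ipackets):
    """Fraction of packets sent by the generator that were received back."""
    return ipackets / opackets


# "total" column (opackets, ipackets) of the TRex tables for each DPDK version.
runs = {
    "v21.08": (581979234, 581978899),
    "v21.11": (381245641, 164434852),
}

for version, (sent, received) in runs.items():
    ratio = delivery_ratio(sent, received)
    print(f"{version}: {ratio:.2%} returned, {1 - ratio:.2%} lost")
```

With these counters, v21.08 returns essentially every packet, while v21.11 returns well under half of them.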
It handles only ~7 Mpps per port instead of ~17 Mpps, and testpmd reports a
large number of TX drops:
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 1101378001 RX-dropped: 0 RX-total: 1101378001
TX-packets: 1016776861 TX-dropped: 84576754 TX-total: 1101353615
----------------------------------------------------------------------------
---------------------- Forward statistics for port 1 ----------------------
RX-packets: 1101353615 RX-dropped: 0 RX-total: 1101353615
TX-packets: 1016804108 TX-dropped: 84573893 TX-total: 1101378001
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 2202731616 RX-dropped: 0 RX-total: 2202731616
TX-packets: 2033580969 TX-dropped: 169150647 TX-total: 2202731616
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
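The TX drop rate implied by these counters can be derived directly from the testpmd output. The sketch below parses the port 0 lines with a regular expression and divides TX-dropped by TX-total. The parsing helper is hypothetical, written only for this illustration, and it assumes the "NAME: value" layout shown above.

```python
import re

# Port 0 forward statistics, copied from the testpmd output above.
PORT0_STATS = """\
RX-packets: 1101378001 RX-dropped: 0 RX-total: 1101378001
TX-packets: 1016776861 TX-dropped: 84576754 TX-total: 1101353615
"""


def parse_counters(text):
    """Return a dict such as {'TX-dropped': 84576754, ...} from testpmd stats text."""
    return {name: int(value) for name, value in re.findall(r"([RT]X-\w+):\s*(\d+)", text)}


counters = parse_counters(PORT0_STATS)

# TX-total should equal forwarded plus dropped packets.
assert counters["TX-packets"] + counters["TX-dropped"] == counters["TX-total"]

drop_rate = counters["TX-dropped"] / counters["TX-total"]
print(f"port 0 TX drop rate: {drop_rate:.2%}")
```

These counters appear to be cumulative, so the result is an average over the whole forwarding run and need not match the instantaneous rates in the TRex snapshot.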
Using git bisect, I found the commit (between 21.08 and 21.11) that introduced
this regression:
https://github.com/DPDK/dpdk/commit/fec28ca0e3a93143829f3b41a28a8da933f28499
I have also profiled it with Intel VTune 2021.3.0 (-collect hotspots and
-collect memory-access), comparing two revisions:
1. 690b2a88c2 (GOOD)
2. fec28ca0e3 (BAD)
I can try to share the corresponding profiling results if that helps.
Unfortunately, I cannot attach them here (the VTune data is too big).