DPDK patches and discussions
* [Bug 1086] Significant TX packet drops with Mellanox NIC (mlx5 PMD)
@ 2022-09-28 13:41 bugzilla
From: bugzilla @ 2022-09-28 13:41 UTC (permalink / raw)
  To: dev

https://bugs.dpdk.org/show_bug.cgi?id=1086

            Bug ID: 1086
           Summary: Significant TX packet drops with Mellanox NIC (mlx5
                    PMD)
           Product: DPDK
           Version: 21.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: critical
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: anton@vaa.su
  Target Milestone: ---

Created attachment 222
  --> https://bugs.dpdk.org/attachment.cgi?id=222&action=edit
testpmd-fec28ca0e3.log.txt

Given 2 servers with 25G Mellanox 2-port NICs:

# dpdk-devbind.py -s
Network devices using kernel driver
===================================
0000:3b:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f0np0 drv=mlx5_core unused=vfio-pci
0000:3b:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f1np1 drv=mlx5_core unused=vfio-pci

Servers are connected directly.


The first server is used as a packet generator, running TRex v2.99 in stateless
mode:
./t-rex-64 -c 16 -i
./trex-console
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps


The second one runs dpdk-testpmd:
OS: Debian GNU/Linux 10 (buster)
uname -r: 4.19.0-21-amd64
ofed_info: MLNX_OFED_LINUX-5.7-1.0.2.0
gcc version 8.3.0 (Debian 8.3.0-6)

With DPDK v21.08 compiled and testpmd run as follows:

dpdk-testpmd -l 1-17 -n 4 --log-level=debug -- --nb-ports=2 --nb-cores=16 --portmask=0x3 --rxq=8 --txq=8

it handles roughly 17 Mpps per port:

trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

TRex Port Statistics
   port    |         0         |         1         |       total       
-----------+-------------------+-------------------+------------------
owner      |              root |              root |                   
link       |                UP |                UP |                   
state      |      TRANSMITTING |      TRANSMITTING |                   
speed      |           25 Gb/s |           25 Gb/s |                   
CPU util.  |            27.76% |            27.76% |                   
--         |                   |                   |                   
Tx bps L2  |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps 
Tx bps L1  |        11.42 Gbps |        11.46 Gbps |        22.88 Gbps 
Tx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps 
Line Util. |            45.7 % |           45.83 % |                   
---        |                   |                   |                   
Rx bps     |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps 
Rx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps 
----       |                   |                   |                   
opackets   |         290928398 |         291050836 |         581979234 
ipackets   |         290885740 |         291093159 |         581978899 
obytes     |       18619417472 |       18627254464 |       37246671936 
ibytes     |       18616688080 |       18629962836 |       37246650916 
tx-pkts    |      290.93 Mpkts |      291.05 Mpkts |      581.98 Mpkts 
rx-pkts    |      290.89 Mpkts |      291.09 Mpkts |      581.98 Mpkts 
tx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB 
rx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB 
-----      |                   |                   |                   
oerrors    |                 0 |                 0 |                 0 
ierrors    |                 0 |                 0 |                 0
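
As a sanity check on the table above, the L2 rate, L1 rate, and line utilization can be reproduced from the packet rate. This is a minimal sketch, not part of the original measurements: the 64-byte frame size is inferred from obytes / opackets, and the 20 bytes of per-frame Layer 1 overhead (preamble, start-of-frame delimiter, inter-frame gap) is an assumption about how TRex derives "Tx bps L1".

```python
# Port 0 figures from the DPDK v21.08 run; frame size and overhead are assumptions.
TX_PPS = 17e6            # "Tx pps" column
FRAME_BYTES = 64         # assumed: obytes / opackets = 18619417472 / 290928398
L1_OVERHEAD_BYTES = 20   # assumed: 8 B preamble + SFD, 12 B inter-frame gap
LINK_BPS = 25e9          # "speed" column


def l2_bps(pps, frame_bytes):
    """Bits per second counted at Layer 2 (frame bytes only)."""
    return pps * frame_bytes * 8


def l1_bps(pps, frame_bytes, overhead_bytes=L1_OVERHEAD_BYTES):
    """Bits per second on the wire, including per-frame Layer 1 overhead."""
    return pps * (frame_bytes + overhead_bytes) * 8


l2 = l2_bps(TX_PPS, FRAME_BYTES)
l1 = l1_bps(TX_PPS, FRAME_BYTES)
line_util = l1 / LINK_BPS

print(f"L2: {l2 / 1e9:.2f} Gbps, L1: {l1 / 1e9:.2f} Gbps, util: {line_util:.1%}")
```

Under these assumptions the results agree with the "Tx bps L2", "Tx bps L1", and "Line Util." rows for port 0, so 17 Mpps uses a little under half of each 25 Gb/s link.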


But after switching to DPDK v21.11, throughput drops sharply:

TRex Port Statistics
   port    |         0         |         1         |       total       
-----------+-------------------+-------------------+------------------
owner      |              root |              root |                   
link       |                UP |                UP |                   
state      |      TRANSMITTING |      TRANSMITTING |                   
speed      |           25 Gb/s |           25 Gb/s |                   
CPU util.  |            26.06% |            26.06% |                   
--         |                   |                   |                   
Tx bps L2  |          8.7 Gbps |         8.72 Gbps |        17.42 Gbps 
Tx bps L1  |        11.42 Gbps |        11.45 Gbps |        22.86 Gbps 
Tx pps     |        16.99 Mpps |        17.04 Mpps |        34.02 Mpps 
Line Util. |           45.66 % |           45.79 % |                   
---        |                   |                   |                   
Rx bps     |         3.75 Gbps |         3.76 Gbps |          7.5 Gbps 
Rx pps     |         7.32 Mpps |         7.34 Mpps |        14.66 Mpps 
----       |                   |                   |                   
opackets   |         190538147 |         190707494 |         381245641 
ipackets   |          82174700 |          82260152 |         164434852 
obytes     |       12194441408 |       12205280936 |       24399722344 
ibytes     |        5259181520 |        5264649728 |       10523831248 
tx-pkts    |      190.54 Mpkts |      190.71 Mpkts |      381.25 Mpkts 
rx-pkts    |       82.17 Mpkts |       82.26 Mpkts |      164.43 Mpkts 
tx-bytes   |          12.19 GB |          12.21 GB |           24.4 GB 
rx-bytes   |           5.26 GB |           5.26 GB |          10.52 GB 
-----      |                   |                   |                   
oerrors    |                 0 |                 0 |                 0 
ierrors    |                 0 |                 0 |                 0
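
To put a number on the regression, the fraction of generated packets that came back to TRex can be computed from the "total" column of both tables. This is only a rough sketch: the counters are cumulative over each run, and the helper name `returned_fraction` is made up for illustration.

```python
# Cumulative TRex counters, "total" column of each table.
runs = {
    "v21.08": {"opackets": 581979234, "ipackets": 581978899},
    "v21.11": {"opackets": 381245641, "ipackets": 164434852},
}


def returned_fraction(counters):
    """Share of transmitted packets that TRex received back."""
    return counters["ipackets"] / counters["opackets"]


fractions = {name: returned_fraction(c) for name, c in runs.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.4%} of generated packets returned")
```

With v21.08 practically every packet returns, while with v21.11 fewer than half do, which is in line with the Rx pps falling from ~17 Mpps to ~7 Mpps per port.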

It now handles only ~7 Mpps per port instead of ~17 Mpps, and testpmd reports a
large number of TX drops:
  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1101378001     RX-dropped: 0             RX-total: 1101378001
  TX-packets: 1016776861     TX-dropped: 84576754      TX-total: 1101353615
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1101353615     RX-dropped: 0             RX-total: 1101353615
  TX-packets: 1016804108     TX-dropped: 84573893      TX-total: 1101378001
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 2202731616     RX-dropped: 0             RX-total: 2202731616
  TX-packets: 2033580969     TX-dropped: 169150647     TX-total: 2202731616
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
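
The testpmd counters above can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual paired forwarding between the two ports (what is received on port 0 is sent on port 1 and vice versa); the dictionary layout is purely illustrative.

```python
# Forward statistics copied from the testpmd output above.
ports = {
    0: {"rx": 1101378001, "tx": 1016776861, "tx_dropped": 84576754},
    1: {"rx": 1101353615, "tx": 1016804108, "tx_dropped": 84573893},
}


def tx_total(stats):
    """Packets handed to the TX path: sent plus dropped."""
    return stats["tx"] + stats["tx_dropped"]


def tx_drop_fraction(stats):
    """Share of packets handed to the TX path that were dropped."""
    return stats["tx_dropped"] / tx_total(stats)


# Assumed pairing: each port transmits what the other port received.
paired_ok = (tx_total(ports[0]) == ports[1]["rx"]
             and tx_total(ports[1]) == ports[0]["rx"])

total_dropped = sum(p["tx_dropped"] for p in ports.values())
drop_fractions = {pid: tx_drop_fraction(p) for pid, p in ports.items()}

print("paired totals consistent:", paired_ok)
print("total TX-dropped:", total_dropped)
for pid, frac in drop_fractions.items():
    print(f"port {pid}: {frac:.2%} of TX attempts dropped")
```

The totals are self-consistent: every packet received on one port is accounted for as either sent or dropped on the other, so the loss shows up entirely on the TX side, with RX-dropped at 0.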


Using git bisect, I found the commit (between 21.08 and 21.11) that introduced
this regression:
https://github.com/DPDK/dpdk/commit/fec28ca0e3a93143829f3b41a28a8da933f28499

I have also profiled it with Intel VTune 2021.3.0 (-collect hotspots and
-collect memory-access), comparing two revisions:
1. 690b2a88c2 (GOOD)
2. fec28ca0e3 (BAD)
I can try to share the corresponding profiling results if that helps.
Unfortunately, I cannot attach them here (the VTune data is too big).

-- 
You are receiving this mail because:
You are the assignee for the bug.
