Date: Mon, 23 Jul 2018 13:14:02 +0200 (CEST)
From: tom.barbette@uliege.be
To: Shahaf Shuler
Cc: users@dpdk.org, katsikas@kth.se, Erez Ferber
Subject: Re: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters

Hi Shahaf,

Thank you for the help! I did not notice that ethtool showed more stats; indeed, it would be great to have them in DPDK. As you suggested, rx_discards_phy is increasing, so packets are dropped there.

However, it is not due to a lack of buffers (if you meant queue/ring buffers as opposed to some Mellanox internals), as the CPU is starving for work on all queues. We also made sure the CPU was not the problem by 1) using more CPU cores and 2) deliberately introducing extra instructions and cache misses on the CPU; neither led to any performance loss.

1) Both cards on both machines sit in a PCIe Gen 3 x16 slot, and both lspci and the mlx5 driver report it as such.

2) Disabling/enabling scatter mode in ethtool does not change performance, but I don't think we are using it anyway (we do nothing special in DPDK for this; packets always arrive as a single segment; see the configuration sketch below, inline with your point 2).

3) We followed the performance tuning guide(s), among other things, with the exception of CQE_COMPRESSION, as we did not find any "mst" reference.

We noticed that when using only one side of a port, that is, one machine only doing TX and the other only doing RX (discarding packets, but still rewriting them), we do send/receive 100G (whereas the numbers discussed before point to a ~80G "bouncing" throughput cap). This is still true with ConnectX-4 or ConnectX-5, and with different (Intel) machines with different motherboards. The mlx5 may perform slightly better (bouncing at 84G), but there is still this cap, and it may be due to other parameters.
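Coming back to the counters themselves: for reference, this is roughly how we read them from inside the application. It is a trimmed-down sketch using the generic rte_eth_xstats_* API (our real code does more; the counter names are simply the ones mlx5 exposes in xstats):

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rte_ethdev.h>

/* Compare what the physical port saw with what the PMD actually
 * delivered, using the extended statistics of the given port. */
static void print_phy_gap(uint16_t port)
{
    int n = rte_eth_xstats_get_names(port, NULL, 0);
    if (n <= 0)
        return;

    struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
    struct rte_eth_xstat *values = calloc(n, sizeof(*values));
    if (names == NULL || values == NULL)
        return;
    rte_eth_xstats_get_names(port, names, n);
    int nv = rte_eth_xstats_get(port, values, n);

    uint64_t good = 0, phy = 0;
    for (int i = 0; i < nv; i++) {
        const char *name = names[values[i].id].name;
        if (strcmp(name, "rx_good_packets") == 0)
            good = values[i].value;
        else if (strcmp(name, "rx_packets_phy") == 0)
            phy = values[i].value;
    }

    /* E.g. in the dump quoted below: 36243270 - 31986429 = 4256841
     * packets seen on the wire but never handed to the application. */
    printf("rx_packets_phy=%" PRIu64 " rx_good_packets=%" PRIu64
           " gap=%" PRIu64 "\n", phy, good, phy - good);

    free(names);
    free(values);
}

On the receiver this consistently shows rx_packets_phy running ahead of rx_good_packets, in line with the rx_discards_phy increase seen in ethtool.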
Interestingly, we also found that this ~80G cap somehow depends on the card and not the port: if we use the two ports of the same PCIe card, forwarding from A to B and from B to A at full speed, the throughput goes down to ~40G per port (so 80G of total forwarding throughput), but if we use two different PCI Express cards, it is back to ~80G per side, so ~160G of total forwarding rate (which also supports the conclusion that our problem is not CPU-bound, since with more PCIe cards we get better performance).

Thanks,
Tom

----- Original Message -----
> From: "Shahaf Shuler"
> To: "tom barbette", users@dpdk.org
> Cc: katsikas@kth.se, "Erez Ferber"
> Sent: Sunday, 22 July 2018 07:14:05
> Subject: RE: Mlx4/5 : Packets lost between phy and good counters
>
> Hi Tom,
>
> Wednesday, July 18, 2018 6:41 PM, tom.barbette@uliege.be:
>> Cc: katsikas@kth.se
>> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
>>
>> Hi all,
>>
>> During a simple forwarding experiment using mlx4 (but we observed the same
>> with mlx5) 100G NICs, we have a sender reporting more TX throughput than
>> what the receiver is receiving, but the receiver does not report any packet
>> loss... They are connected by a simple QSFP28 direct attach cable. So where
>> did the packets disappear?
>>
>> The only thing we could find is that rx_good_packets in xstats is lower than
>> rx_packets_phy. rx_packets_phy is in line with what the sender is reporting,
>> so I guess some of the "phy" packets are not "good". But no error counter
>> (missed, mbuf_alloc, ...) gives us a clue as to why those packets are not
>> "good".
>>
>> We tried with real traces and crafted UDP packets of various sizes; same
>> problem.
>>
>> Any idea?
>
> Yes, what you are experiencing is packet drop due to backpressure from the
> device.
>
> The rx_good_packets are the good packets (w/o errors) received by the port
> (can be either PF or VF).
> The rx_packets_phy are the packets received by the physical port (this is
> the aggregation of the PF and all of the VFs).
> A gap between those means some packets have been lost or, as you said,
> received w/ errors. We are indeed missing one counter here, rx_discards_phy,
> which counts the number of received packets dropped due to lack of buffers
> on a physical port. This work is in progress.
>
> There is another way to query this counter (and many others) for Mellanox
> devices by using Linux ethtool: "ethtool -S <ifname>" (Mellanox devices keep
> their kernel module).
> The statistics in DPDK are a shadow of the ethtool ones. You can read more
> about those counters in the community doc [1].
> W/ the ethtool statistics, look for the discard counter and check whether it
> is increasing.
>
> Assuming it does, we need to understand why you have such backpressure.
> Things to check:
> 1. Is the PCI slot for your mlx5 device indeed x16?
> 2. Are you using scatter mode w/ a large max_rx_pkt_len?
> 3. Have you followed the mlx5 performance tuning guide [2]?
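(On your point 2, to make it concrete: we really do nothing special on the RX path. Below is a trimmed sketch of how the port is brought up on our side, using the DPDK 18.05-era ethdev API; the 8 queues match the dump further down, while the descriptor count and mempool are illustrative. No scatter or jumbo-frame offload is requested, and packets indeed always arrive as a single segment.)

#include <string.h>
#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_mempool.h>

/* Minimal port bring-up: 8 RX / 8 TX queues, standard 1518-byte frames,
 * RSS to spread flows over the queues, and no DEV_RX_OFFLOAD_SCATTER or
 * DEV_RX_OFFLOAD_JUMBO_FRAME requested. */
static int port_init(uint16_t port, struct rte_mempool *pool)
{
    struct rte_eth_conf conf;

    memset(&conf, 0, sizeof(conf));
    conf.rxmode.max_rx_pkt_len = ETHER_MAX_LEN;
    conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
    conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP;

    if (rte_eth_dev_configure(port, 8, 8, &conf) < 0)
        return -1;

    for (uint16_t q = 0; q < 8; q++) {
        if (rte_eth_rx_queue_setup(port, q, 512,
                rte_eth_dev_socket_id(port), NULL, pool) < 0)
            return -1;
        if (rte_eth_tx_queue_setup(port, q, 512,
                rte_eth_dev_socket_id(port), NULL) < 0)
            return -1;
    }
    return rte_eth_dev_start(port);
}

(RSS over the 8 queues is what produces the per-queue rx_qNpackets counters in the dump below, which are roughly even across queues.)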
>
>> Below the detail stats of the receiver (which is a forwarder, but it is not of
>> importance in this context):
>>
>> stats.count: 31986429
>> stats.missed: 0
>> stats.error: 0
>> fd0.xstats:
>> rx_good_packets[0] = 31986429
>> tx_good_packets[1] = 31986429
>> rx_good_bytes[2] = 47979639204
>> tx_good_bytes[3] = 47851693488
>> rx_missed_errors[4] = 0
>> rx_errors[5] = 0
>> tx_errors[6] = 0
>> rx_mbuf_allocation_errors[7] = 0
>> rx_q0packets[8] = 4000025
>> rx_q0bytes[9] = 6000036068
>> rx_q0errors[10] = 0
>> rx_q1packets[11] = 4002151
>> rx_q1bytes[12] = 6003226500
>> rx_q1errors[13] = 0
>> rx_q2packets[14] = 3996758
>> rx_q2bytes[15] = 5995137000
>> rx_q2errors[16] = 0
>> rx_q3packets[17] = 3993614
>> rx_q3bytes[18] = 5990421000
>> rx_q3errors[19] = 0
>> rx_q4packets[20] = 3995758
>> rx_q4bytes[21] = 5993637000
>> rx_q4errors[22] = 0
>> rx_q5packets[23] = 3992126
>> rx_q5bytes[24] = 5988189000
>> rx_q5errors[25] = 0
>> rx_q6packets[26] = 4007488
>> rx_q6bytes[27] = 6011230568
>> rx_q6errors[28] = 0
>> rx_q7packets[29] = 3998509
>> rx_q7bytes[30] = 5997762068
>> rx_q7errors[31] = 0
>> tx_q0packets[32] = 4000025
>> tx_q0bytes[33] = 5984035968
>> tx_q1packets[34] = 4002151
>> tx_q1bytes[35] = 5987217896
>> tx_q2packets[36] = 3996758
>> tx_q2bytes[37] = 5979149968
>> tx_q3packets[38] = 3993614
>> tx_q3bytes[39] = 5974446544
>> tx_q4packets[40] = 3995758
>> tx_q4bytes[41] = 5977653968
>> tx_q5packets[42] = 3992126
>> tx_q5bytes[43] = 5972220496
>> tx_q6packets[44] = 4007488
>> tx_q6bytes[45] = 5995200616
>> tx_q7packets[46] = 3998509
>> tx_q7bytes[47] = 5981768032
>> rx_port_unicast_bytes[48] = 47851693488
>> rx_port_multicast_bytes[49] = 0
>> rx_port_broadcast_bytes[50] = 0
>> rx_port_unicast_packets[51] = 31986429
>> rx_port_multicast_packets[52] = 0
>> rx_port_broadcast_packets[53] = 0
>> tx_port_unicast_bytes[54] = 47851693488
>> tx_port_multicast_bytes[55] = 0
>> tx_port_broadcast_bytes[56] = 0
>> tx_port_unicast_packets[57] = 31986429
>> tx_port_multicast_packets[58] = 0
>> tx_port_broadcast_packets[59] = 0
>> rx_wqe_err[60] = 0
>> rx_crc_errors_phy[61] = 0
>> rx_in_range_len_errors_phy[62] = 0
>> rx_symbol_err_phy[63] = 0
>> tx_errors_phy[64] = 0
>> rx_out_of_buffer[65] = 0
>> tx_packets_phy[66] = 31986429
>> rx_packets_phy[67] = 36243270
>> tx_bytes_phy[68] = 47979639204
>> rx_bytes_phy[69] = 54364900704
>>
>> Thanks,
>> Tom
>
> [1] https://community.mellanox.com/docs/DOC-2532
> [2] https://doc.dpdk.org/guides/nics/mlx5.html