DPDK usage discussions
From: tom.barbette@uliege.be
To: Shahaf Shuler <shahafs@mellanox.com>
Cc: users@dpdk.org, katsikas@kth.se, Erez Ferber <erezf@mellanox.com>
Subject: Re: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
Date: Mon, 23 Jul 2018 13:14:02 +0200 (CEST)
Message-ID: <1387619701.57645808.1532344442688.JavaMail.zimbra@uliege.be>
In-Reply-To: <DB7PR05MB4426755FDED765CDDD7F98AFC3570@DB7PR05MB4426.eurprd05.prod.outlook.com>

Hi Shahaf,

Thank you for the help!

I had not noticed that ethtool exposes more counters; it would indeed be great to have them in DPDK. As you suggested, rx_discards_phy is increasing, so packets are being dropped there.
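
For reference, here is roughly how we watch that gap from inside DPDK; a minimal sketch (print_rx_gap is our own helper name, error handling trimmed), using the generic xstats API with the counter names mlx5 reports. Since rx_discards_phy is not in xstats yet, the phy/good gap is the only hint of the drop visible from DPDK:

  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <rte_ethdev.h>

  /* Print the gap between the physical-port RX counter and the "good"
   * RX counter of a started port, looked up by name in the xstats. */
  static void print_rx_gap(uint16_t port_id)
  {
      int n = rte_eth_xstats_get(port_id, NULL, 0); /* query the count */
      if (n <= 0)
          return;
      struct rte_eth_xstat *vals = calloc(n, sizeof(*vals));
      struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
      if (vals && names &&
          rte_eth_xstats_get_names(port_id, names, n) == n &&
          rte_eth_xstats_get(port_id, vals, n) == n) {
          uint64_t phy = 0, good = 0;
          for (int i = 0; i < n; i++) {
              if (strcmp(names[i].name, "rx_packets_phy") == 0)
                  phy = vals[i].value;
              else if (strcmp(names[i].name, "rx_good_packets") == 0)
                  good = vals[i].value;
          }
          printf("rx phy=%" PRIu64 " good=%" PRIu64 " gap=%" PRIu64 "\n",
                 phy, good, phy - good);
      }
      free(vals);
      free(names);
  }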

However, it is not due to a lack of buffers (if you meant the queues/ring buffers, as opposed to some Mellanox internals), as the CPU is starving for work on every queue. We also made sure the CPU was not the problem by 1) using more CPU cores, and 2) deliberately adding extra instructions and cache misses per packet (see the sketch below), neither of which led to any performance loss.
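
For the curious, the artificial per-packet work was along these lines; illustrative only, not our exact code:

  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical per-packet busy work: touch a buffer much larger than
   * the LLC at a pseudo-random stride to force cache misses. If the CPU
   * were the bottleneck, adding this to the forwarding path would lower
   * the forwarding rate; in our tests it did not. */
  #define WASTE_SIZE (256u * 1024 * 1024)
  static uint8_t *waste; /* = malloc(WASTE_SIZE), done once at startup */

  static inline void burn_cycles(uint64_t seed, int rounds)
  {
      for (int i = 0; i < rounds; i++) {
          seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
          waste[seed % WASTE_SIZE] += 1; /* likely an LLC miss */
      }
  }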

1) Both cards in both machines sit in PCIe Gen 3 x16 slots, and both lspci and the mlx5 driver confirm the link is trained at Gen 3 x16.
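
(Back-of-the-envelope, in case it is useful: PCIe Gen 3 runs at 8 GT/s per lane with 128b/130b encoding, so x16 gives about 8 * 16 * 128/130 ≈ 126 Gbit/s of raw bandwidth per direction, before TLP and descriptor overhead. A single 100G port should therefore fit, if my numbers are right.)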
2) Disabling/enabling scatter mode in ethtool does not change performance, but I don't think we are using it anyway: we do nothing special in DPDK to request it, and packets always arrive as a single segment.
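
To be explicit about 2), this is the kind of configuration one would have to pass at rte_eth_dev_configure() time to actually get scattered RX, and we pass nothing of the sort. A sketch against the 18.05-era API (port_id and the queue counts are placeholders):

  struct rte_eth_conf conf = { 0 };
  /* Scattered RX only happens if it is explicitly requested: */
  conf.rxmode.offloads = DEV_RX_OFFLOAD_SCATTER |
                         DEV_RX_OFFLOAD_JUMBO_FRAME;
  /* ...and only matters when frames exceed one mbuf's data room: */
  conf.rxmode.max_rx_pkt_len = 9000;
  rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues, &conf);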
3) We followed the performance tuning guide(s), among other things, with the exception of CQE_COMPRESSION, as we could not find any "mst" reference (if I understand correctly, that knob is set with mlxconfig from Mellanox's MFT package, which provides mst, and we do not have it installed).

We noticed that when using only one direction, that is, one machine only doing TX and the other only RX (discarding packets, but still rewriting them), we do send/receive the full 100G, whereas the bidirectional numbers discussed before hit a ~80G "bouncing" throughput cap.

This holds with both ConnectX-4 and ConnectX-5, and with different (Intel) machines and motherboards. The ConnectX-5 may perform slightly better (bouncing 84G), but the cap is still there, and the difference may be due to other parameters.

Interestingly, we found that this cap depends on the card, not the port. If we use the two ports of the same PCIe card, forwarding from A to B and from B to A at full speed, the throughput goes down to ~40G per port (so 80G total forwarding throughput). But if we use two different PCIe cards, it is back to ~80G per side, so ~160G total forwarding rate. This also confirms the problem is not CPU-bound, as with more PCIe cards we get better performance.
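
(If I am counting right, the PCIe load per card is the same in both setups: with two ports on one card, each port moves ~40G in and ~40G out, so the card does ~80G of DMA towards the host and ~80G from it; with one port per card, each card also does ~80G each way. So the ceiling looks like a per-card limit of roughly 80G per DMA direction under bidirectional load, even though a single direction alone reaches 100G. Pure speculation on our side, of course.)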

Thanks,


Tom

----- Original Message -----
> From: "Shahaf Shuler" <shahafs@mellanox.com>
> To: "tom barbette" <tom.barbette@uliege.be>, users@dpdk.org
> Cc: katsikas@kth.se, "Erez Ferber" <erezf@mellanox.com>
> Sent: Sunday, July 22, 2018 07:14:05
> Subject: RE: Mlx4/5 : Packets lost between phy and good counters

> Hi Tom,
> 
> Wednesday, July 18, 2018 6:41 PM, tom.barbette@uliege.be:
>> Cc: katsikas@kth.se
>> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
>> 
>> Hi all,
>> 
>> During a simple forwarding experiment using mlx4 100G NICs (we observed the
>> same with mlx5), the sender reports more TX throughput than the receiver is
>> receiving, but the receiver does not report any packet loss... The machines
>> are connected by a simple QSFP28 direct-attach cable, so where did the
>> packets disappear?
>> 
>> The only thing we could find is that rx_good_packets in xstats is lower than
>> rx_packets_phy. rx_packets_phy is in line with what the sender reports, so I
>> guess some of the "phy" packets are not "good". But no error counter (missed,
>> mbuf_alloc, ...) gives us a clue as to why those packets are not "good".
>> 
>> We tried with real traces and with crafted UDP packets of various sizes;
>> same problem.
>> 
>> Any idea?
> 
> Yes, what you are experiencing is a packet drop due to backpressure from the
> device.
> 
> The rx_good_packets are the good packets (without errors) received by the port
> (which can be either a PF or a VF).
> The rx_packets_phy are the packets received by the physical port (this is the
> aggregation of the PF and all of its VFs).
> A gap between the two means some packets have been lost or, as you said,
> received with errors. We are indeed missing one counter here, rx_discards_phy,
> which counts the number of received packets dropped due to lack of buffers on
> the physical port. This work is in progress.
> 
> There is another way to query this counter (and many others) on Mellanox
> devices, using Linux ethtool: "ethtool -S <ifname>" (Mellanox devices keep
> their kernel module loaded).
> The statistics in DPDK are a shadow of the ethtool ones. You can read more
> about these counters in the community doc [1].
> In the ethtool statistics, look for the discard counter and check whether it
> is increasing.
> 
> Assuming it does, we need to understand why you have such backpressure.
> Things to check:
> 1. Is the PCIe slot for your mlx5 device indeed x16?
> 2. Are you using scatter mode with a large max_rx_pkt_len?
> 3. Have you followed the mlx5 performance tuning guide [2]?
> 
> 
>> 
>> Below are the detailed stats of the receiver (which is also a forwarder, but
>> that is not important in this context):
>> 
>> stats.count:
>> 31986429
>> stats.missed:
>> 0
>> stats.error:
>> 0
>> fd0.xstats:
>> rx_good_packets[0] = 31986429
>> tx_good_packets[1] = 31986429
>> rx_good_bytes[2] = 47979639204
>> tx_good_bytes[3] = 47851693488
>> rx_missed_errors[4] = 0
>> rx_errors[5] = 0
>> tx_errors[6] = 0
>> rx_mbuf_allocation_errors[7] = 0
>> rx_q0packets[8] = 4000025
>> rx_q0bytes[9] = 6000036068
>> rx_q0errors[10] = 0
>> rx_q1packets[11] = 4002151
>> rx_q1bytes[12] = 6003226500
>> rx_q1errors[13] = 0
>> rx_q2packets[14] = 3996758
>> rx_q2bytes[15] = 5995137000
>> rx_q2errors[16] = 0
>> rx_q3packets[17] = 3993614
>> rx_q3bytes[18] = 5990421000
>> rx_q3errors[19] = 0
>> rx_q4packets[20] = 3995758
>> rx_q4bytes[21] = 5993637000
>> rx_q4errors[22] = 0
>> rx_q5packets[23] = 3992126
>> rx_q5bytes[24] = 5988189000
>> rx_q5errors[25] = 0
>> rx_q6packets[26] = 4007488
>> rx_q6bytes[27] = 6011230568
>> rx_q6errors[28] = 0
>> rx_q7packets[29] = 3998509
>> rx_q7bytes[30] = 5997762068
>> rx_q7errors[31] = 0
>> tx_q0packets[32] = 4000025
>> tx_q0bytes[33] = 5984035968
>> tx_q1packets[34] = 4002151
>> tx_q1bytes[35] = 5987217896
>> tx_q2packets[36] = 3996758
>> tx_q2bytes[37] = 5979149968
>> tx_q3packets[38] = 3993614
>> tx_q3bytes[39] = 5974446544
>> tx_q4packets[40] = 3995758
>> tx_q4bytes[41] = 5977653968
>> tx_q5packets[42] = 3992126
>> tx_q5bytes[43] = 5972220496
>> tx_q6packets[44] = 4007488
>> tx_q6bytes[45] = 5995200616
>> tx_q7packets[46] = 3998509
>> tx_q7bytes[47] = 5981768032
>> rx_port_unicast_bytes[48] = 47851693488
>> rx_port_multicast_bytes[49] = 0
>> rx_port_broadcast_bytes[50] = 0
>> rx_port_unicast_packets[51] = 31986429
>> rx_port_multicast_packets[52] = 0
>> rx_port_broadcast_packets[53] = 0
>> tx_port_unicast_bytes[54] = 47851693488
>> tx_port_multicast_bytes[55] = 0
>> tx_port_broadcast_bytes[56] = 0
>> tx_port_unicast_packets[57] = 31986429
>> tx_port_multicast_packets[58] = 0
>> tx_port_broadcast_packets[59] = 0
>> rx_wqe_err[60] = 0
>> rx_crc_errors_phy[61] = 0
>> rx_in_range_len_errors_phy[62] = 0
>> rx_symbol_err_phy[63] = 0
>> tx_errors_phy[64] = 0
>> rx_out_of_buffer[65] = 0
>> tx_packets_phy[66] = 31986429
>> rx_packets_phy[67] = 36243270
>> tx_bytes_phy[68] = 47979639204
>> rx_bytes_phy[69] = 54364900704
>> 
>> 
>> Thanks,
>> Tom
> 
> [1] https://community.mellanox.com/docs/DOC-2532
> [2] https://doc.dpdk.org/guides/nics/mlx5.html
