DPDK usage discussions
From: Shahaf Shuler <shahafs@mellanox.com>
To: "tom.barbette@uliege.be" <tom.barbette@uliege.be>
Cc: "users@dpdk.org" <users@dpdk.org>,
	"katsikas@kth.se" <katsikas@kth.se>,
	Erez Ferber <erezf@mellanox.com>
Subject: Re: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
Date: Tue, 24 Jul 2018 07:33:29 +0000	[thread overview]
Message-ID: <DB7PR05MB44262FDC181E55B6B7B6984AC3550@DB7PR05MB4426.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <1387619701.57645808.1532344442688.JavaMail.zimbra@uliege.be>

Monday, July 23, 2018 2:14 PM, tom.barbette@uliege.be:
> Subject: Re: Mlx4/5 : Packets lost between phy and good counters
> 
> Hi Shahaf,
> 
> Thank you for the help !
> 
> I had not noticed that ethtool shows more stats; it would indeed be great to
> have them in DPDK. As you suggested, rx_discards_phy is increasing, so
> packets are being dropped there.
> 
> However, it is not due to a lack of buffers (if you meant queues/ring buffers
> as opposed to some Mellanox internals), as the CPU is starving for work on all
> queues. We also ruled out the CPU by 1) using more CPU cores, and 2)
> deliberately introducing extra instructions and cache misses on the
> CPU; neither led to any performance loss.

I didn't say the backpressure comes from the CPU; it is probably triggered by the NIC for some reason (the PCI and scatter checks I requested from you were simple sanity checks for possible causes).

> 
> 1) Both cards on both machines are on a PCIe Gen 3 x 16 and acknowledged
> both by lspci and Mlx5 driver as it.
> 2) Disabling/enabling scatter mode in ethtool does not change performance,

Not through ethtool; scatter mode is controlled through the DPDK APIs.
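For illustration, keeping scatter off at the DPDK level looks roughly like this. This is only a sketch against the DPDK 18.05-era offload API; the function name and queue counts are made up for the example and are not from any application in this thread:

```c
/* Sketch: configure a port without scattered Rx, assuming the DPDK
 * 18.05-era offload API. configure_single_segment_rx() and the queue
 * counts are illustrative placeholders. */
#include <string.h>
#include <rte_ethdev.h>

static int configure_single_segment_rx(uint16_t port_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	/* Leaving DEV_RX_OFFLOAD_SCATTER out of the offload mask means the
	 * PMD must fit every packet into a single mbuf segment. */
	conf.rxmode.offloads = 0;
	/* A max_rx_pkt_len larger than the mbuf data room is what normally
	 * forces scattered Rx, so keep it at the standard Ethernet maximum. */
	conf.rxmode.max_rx_pkt_len = ETHER_MAX_LEN;

	return rte_eth_dev_configure(port_id, 8 /* Rx queues */,
				     8 /* Tx queues */, &conf);
}
```

If the offload mask never includes DEV_RX_OFFLOAD_SCATTER and packets always fit in one mbuf, scatter is indeed not in play.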

> but I don't think we're using it anyway (we do nothing special in DPDK for
> this; packets are always one segment)
> 3) We followed the performance guide(s) among other things, with the
> exception of CQE_COMPRESSION as we didn't find any "mst" reference.
> 
> We noticed that when using only one direction of a port, that is, one machine
> only doing TX and the other doing RX (discarding packets, but still rewriting
> them), we do send/receive 100G (the numbers discussed before lead to a
> ~80G "bouncing" throughput cap).
> 
> This is still true with ConnectX-4 or ConnectX-5, and with different (Intel)
> machines with different motherboards. The mlx5 may perform slightly better
> (bouncing 84G), but there is still this cap, and it may be due to other
> parameters.
> 
> Interestingly, we found that this cap somehow depends on the card and not
> the port: if we use the two ports of the same PCIe card, forwarding from A
> to B and B to A at full speed, the throughput drops to ~40G per port (so
> 80G total forwarding throughput), but if we use two different PCI Express
> cards, it is back to ~80G per side, so ~160G total forwarding rate (also
> supporting the conclusion that our problem is not CPU-bound, since more
> PCIe cards give us better performance).

It looks like the bottleneck is on the PCI bus, and the CQE_COMPRESSION configuration can be the reason for that (it is a feature that saves PCI bandwidth and is critical for reaching 100G with small frames).
As this looks like a NIC/system configuration issue, I suggest opening a ticket with Mellanox Support so they can look at your system and advise.
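For what it's worth, CQE compression is queried and set through the Mellanox firmware tools (MFT), which provide the "mst" command mentioned in the tuning guide. A sketch; the device path is a placeholder that `mst status` reports on your system:

```shell
# Load the Mellanox Software Tools service so the config device appears
mst start
mst status                     # lists e.g. /dev/mst/mt4115_pciconf0

# Query the current CQE compression mode (placeholder device path)
mlxconfig -d /dev/mst/mt4115_pciconf0 query | grep CQE_COMPRESSION

# 1 = aggressive compression; a firmware reset (or reboot) applies it
mlxconfig -d /dev/mst/mt4115_pciconf0 set CQE_COMPRESSION=1
mlxfwreset -d /dev/mst/mt4115_pciconf0 reset
```

These commands touch firmware configuration, so they need root and should be verified against the MFT documentation for your card.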

> 
> Thanks,
> 
> 
> Tom
> 
> ----- Mail original -----
> > De: "Shahaf Shuler" <shahafs@mellanox.com>
> > À: "tom barbette" <tom.barbette@uliege.be>, users@dpdk.org
> > Cc: katsikas@kth.se, "Erez Ferber" <erezf@mellanox.com>
> > Envoyé: Dimanche 22 Juillet 2018 07:14:05
> > Objet: RE: Mlx4/5 : Packets lost between phy and good counters
> 
> > Hi Tom,
> >
> > Wednesday, July 18, 2018 6:41 PM, tom.barbette@uliege.be:
> >> Cc: katsikas@kth.se
> >> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good
> >> counters
> >>
> >> Hi all,
> >>
> >> During a simple forwarding experiment using mlx4 (but we observed the
> >> same with mlx5) 100G NICs, we have a sender reporting more TX
> >> throughput than what the receiver is receiving, but the receiver does
> >> not report any packet loss... They are connected by a simple QSFP28
> >> direct attach cable. So where did the packet disappear?
> >>
> >> The only thing we could find is that rx_good_packets in xstats is
> >> lower than rx_packets_phy. rx_packets_phy is in line with what the
> >> sender is reporting, so I guess some of the "phy" are not "good". But
> >> no error counter, missed, mbuf_alloc, ... is giving as a clue why those
> packets are not "good".
> >>
> >> We tried with real traces and UDP crafted packets of various size,
> >> same problem.
> >>
> >> Any idea ?
> >
> > Yes, what you are experiencing is a packet drop due to backpressure
> > from the device.
> >
> > The rx_good_packets are the good packets (w/o errors) received by the
> > port (can be either PF or VF).
> > The rx_packets_phy are the packets received by the physical port (this
> > is the aggregation of the PF and all of the VFs).
> > A gap between those means some packets have been lost or, as you said,
> > received w/ errors. We are indeed missing one counter here, which is
> > rx_discards_phy: it counts the number of received packets dropped due
> > to lack of buffers on a physical port. This work is in progress.
> >
> > There is another way to query this counter (and many others) for
> > Mellanox devices by using linux ethtool: "ethtool -S <ifname>"
> > (Mellanox devices keep their kernel module).
> > The statistics in DPDK are a shadow of the ethtool ones. You can read
> > more about those counters in the community doc[1].
> > In the ethtool statistics, look for the discard counter and check
> > whether it is increasing.
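The ethtool query above can be scripted like this; "enp1s0f0" is a placeholder for your mlx5 interface name:

```shell
# Dump all NIC counters and keep only the drop/discard ones
ethtool -S enp1s0f0 | grep -iE 'discard|drop'

# Sample twice, one second apart, to see whether the counter is still rising
ethtool -S enp1s0f0 | grep rx_discards_phy; sleep 1
ethtool -S enp1s0f0 | grep rx_discards_phy
```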
> >
> > Assuming it does, we need to understand why you have such
> backpressure.
> > Things to check:
> > 1. is the PCI slot for your mlx5 device indeed x16?
> > 2. are you using scatter mode w/ a large max_rx_pkt_len?
> > 3. have you followed the mlx5 performance tuning guide[2]?
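The PCI check in item 1 can be done from the shell; the bus address below is a placeholder, taken from lspci's device listing:

```shell
# Find the Mellanox device's bus address
lspci | grep -i mellanox

# Check the negotiated link state; expect "Speed 8GT/s, Width x16" for
# PCIe Gen3 x16 ("01:00.0" is a placeholder address; needs root for LnkSta)
sudo lspci -s 01:00.0 -vv | grep LnkSta
```

A link that trained at x8 or Gen2 would halve the available PCI bandwidth and produce exactly this kind of backpressure.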
> >
> >
> >>
> >> Below the detail stats of the receiver (which is a forwarder but it
> >> is not of importance in this context) :
> >>
> >> stats.count:
> >> 31986429
> >> stats.missed:
> >> 0
> >> stats.error:
> >> 0
> >> fd0.xstats:
> >> rx_good_packets[0] = 31986429
> >> tx_good_packets[1] = 31986429
> >> rx_good_bytes[2] = 47979639204
> >> tx_good_bytes[3] = 47851693488
> >> rx_missed_errors[4] = 0
> >> rx_errors[5] = 0
> >> tx_errors[6] = 0
> >> rx_mbuf_allocation_errors[7] = 0
> >> rx_q0packets[8] = 4000025
> >> rx_q0bytes[9] = 6000036068
> >> rx_q0errors[10] = 0
> >> rx_q1packets[11] = 4002151
> >> rx_q1bytes[12] = 6003226500
> >> rx_q1errors[13] = 0
> >> rx_q2packets[14] = 3996758
> >> rx_q2bytes[15] = 5995137000
> >> rx_q2errors[16] = 0
> >> rx_q3packets[17] = 3993614
> >> rx_q3bytes[18] = 5990421000
> >> rx_q3errors[19] = 0
> >> rx_q4packets[20] = 3995758
> >> rx_q4bytes[21] = 5993637000
> >> rx_q4errors[22] = 0
> >> rx_q5packets[23] = 3992126
> >> rx_q5bytes[24] = 5988189000
> >> rx_q5errors[25] = 0
> >> rx_q6packets[26] = 4007488
> >> rx_q6bytes[27] = 6011230568
> >> rx_q6errors[28] = 0
> >> rx_q7packets[29] = 3998509
> >> rx_q7bytes[30] = 5997762068
> >> rx_q7errors[31] = 0
> >> tx_q0packets[32] = 4000025
> >> tx_q0bytes[33] = 5984035968
> >> tx_q1packets[34] = 4002151
> >> tx_q1bytes[35] = 5987217896
> >> tx_q2packets[36] = 3996758
> >> tx_q2bytes[37] = 5979149968
> >> tx_q3packets[38] = 3993614
> >> tx_q3bytes[39] = 5974446544
> >> tx_q4packets[40] = 3995758
> >> tx_q4bytes[41] = 5977653968
> >> tx_q5packets[42] = 3992126
> >> tx_q5bytes[43] = 5972220496
> >> tx_q6packets[44] = 4007488
> >> tx_q6bytes[45] = 5995200616
> >> tx_q7packets[46] = 3998509
> >> tx_q7bytes[47] = 5981768032
> >> rx_port_unicast_bytes[48] = 47851693488
> >> rx_port_multicast_bytes[49] = 0
> >> rx_port_broadcast_bytes[50] = 0
> >> rx_port_unicast_packets[51] = 31986429
> >> rx_port_multicast_packets[52] = 0
> >> rx_port_broadcast_packets[53] = 0
> >> tx_port_unicast_bytes[54] = 47851693488
> >> tx_port_multicast_bytes[55] = 0
> >> tx_port_broadcast_bytes[56] = 0
> >> tx_port_unicast_packets[57] = 31986429
> >> tx_port_multicast_packets[58] = 0
> >> tx_port_broadcast_packets[59] = 0
> >> rx_wqe_err[60] = 0
> >> rx_crc_errors_phy[61] = 0
> >> rx_in_range_len_errors_phy[62] = 0
> >> rx_symbol_err_phy[63] = 0
> >> tx_errors_phy[64] = 0
> >> rx_out_of_buffer[65] = 0
> >> tx_packets_phy[66] = 31986429
> >> rx_packets_phy[67] = 36243270
> >> tx_bytes_phy[68] = 47979639204
> >> rx_bytes_phy[69] = 54364900704
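The gap between rx_packets_phy and rx_good_packets in the dump above can be computed directly; it is the number of frames the port received on the wire but never delivered:

```shell
# rx_packets_phy (36243270) minus rx_good_packets (31986429)
echo $((36243270 - 31986429))   # prints 4256841
```

So about 4.26M packets were dropped at the physical port, with no error counter incremented.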
> >>
> >>
> >> Thanks,
> >> Tom
> >
> > [1] https://community.mellanox.com/docs/DOC-2532
> > [2] https://doc.dpdk.org/guides/nics/mlx5.html


Thread overview: 4 messages
2018-07-18 15:41 tom.barbette
2018-07-22  5:14 ` Shahaf Shuler
2018-07-23 11:14   ` tom.barbette
2018-07-24  7:33     ` Shahaf Shuler [this message]
