From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: "Engelhardt, Maximilian" <maximilian.engelhardt@iis.fraunhofer.de>,
"users@dpdk.org" <users@dpdk.org>
Cc: Maayan Kashani <mkashani@nvidia.com>,
Carsten Andrich <carsten.andrich@tu-ilmenau.de>,
Asaf Penso <asafp@nvidia.com>
Subject: RE: [mlx5] Loss of packet pacing precision under high Tx loads
Date: Wed, 17 Jan 2024 13:43:16 +0000 [thread overview]
Message-ID: <IA1PR12MB8078C59A762D28AA07722DDDDF722@IA1PR12MB8078.namprd12.prod.outlook.com> (raw)
In-Reply-To: <9811b4c617da4e95a78f0e34431fe770@iis.fraunhofer.de>
Hi, Max
> effect: from time to time all threads are blocked simultaneously for over 100µs,
> whether they are interacting with the NIC or not.
From our experience, I would recommend checking:
- whether some process with a higher priority preempts your application
- that the NUMA balancer is disabled. This special kernel feature periodically unmaps
the whole process memory and checks, in the resulting page faults, that the memory belongs to the correct NUMA node.
This can cause hiccups.
- SMI - System Management Interrupt: all CPU caches are flushed, all cores are stalled, and the CPU goes
into a special mode to handle HW events. SMI statistics can be checked with the turbostat utility.
> How can I enable "DMA to LLC"? If I see correctly, "Direct Cache Access" is an
> Intel-exclusive feature not available on the AMD EPYC CPUs we are using.
Does your EPYC really have no DDIO-like feature? ☹
With best regards,
Slava
>
> -----Original Message-----
> From: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>
> Sent: Tuesday, January 16, 2024 3:57 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; users@dpdk.org
> Cc: Maayan Kashani <mkashani@nvidia.com>; Carsten Andrich
> <carsten.andrich@tu-ilmenau.de>
> Subject: AW: [mlx5] Loss of packet pacing precision under high Tx loads
>
> Hi Slava,
>
> I'm using a 100 Gbit link and want to transfer 10 GByte (80 Gbit) per second. I
> did test it with different numbers of queues (1, 2, 4, 8) without any change to the
> result. In our application, the other end (FPGA) does not support L2 flow
> control.
>
> As you assume, the problem does not seem to be in the actual NIC timestamping,
> as I guessed first, but in the interaction of host and NIC: I have inserted another
> thread into my application that does nothing but repeatedly call
> rte_delay_us_block(1) and measure the elapsed time. This shows the same
> effect: from time to time all threads are blocked simultaneously for over 100 µs,
> whether they are interacting with the NIC or not.
>
> I seem to have the same problem as described here:
> https://www.mail-archive.com/users@dpdk.org/msg07437.html
>
> Investigating further, I discovered strange behavior: in my main application (not
> the MWE posted here), the problem also occurs when receiving data, when
> the packet load changes (start and end of the data stream). Normally, the
> received data is copied into a large buffer - if I comment out this memcpy, i.e.
> *reduce* the workload, these stalls occur *more* often. It also seems to
> depend on the software environment: on Debian, stalls are less frequent than
> when using NixOS (same hardware and the same isolation features).
>
> How can I enable "DMA to LLC"? If I see correctly, "Direct Cache Access" is an
> Intel-exclusive feature not available on the AMD EPYC CPUs we are using.
>
> I would be grateful for any advice on how I could solve the problem.
>
> Thank you and best regards,
> Max
>
> >-----Original Message-----
> >From: Slava Ovsiienko <viacheslavo@nvidia.com>
> >Sent: Sunday, 14 January 2024 12:09
> >To: users@dpdk.org
> >Cc: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>;
> >Maayan Kashani <mkashani@nvidia.com>
> >Subject: RE: [mlx5] Loss of packet pacing precision under high Tx loads
> >
> >Hi, Max
> >
> >As far as I understand, some packets are delayed.
> >What is the data rate? 10 gigabytes (not 10 Gbit) per second?
> >What is the connection rate? 100 Gbps?
> >It is not trivial to guarantee correct packet delivery for high-load
> >(>50% of line rate) connections; a lot of aspects are involved.
> >Sometimes the traffic schedules of neighboring queues simply overlap.
> >
> >I have some extra questions:
> >How many Tx queues do you use? (8 is optimal; more than 32 on CX6 might
> >induce a performance penalty.)
> >Does your traffic contain VLAN headers?
> >Did you disable L2 flow control?
> >A high wander value rather indicates an issue with an overloaded
> >PCIe bus or host memory.
> >Did you enable the "DMA to LLC (last-level cache)" option on the host?
> >
> >With best regards,
> >Slava
> >
> >>
> >>
> >>From: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>
> >>Sent: Wednesday, 8 November 2023 17:41
> >>To: users@dpdk.org
> >>Cc: Andrich, Carsten <carsten.andrich@iis.fraunhofer.de>
> >>Subject: [mlx5] Loss of packet pacing precision under high Tx loads
> >>
> >>Hi
> >>I am currently working on a system in which a high-rate data stream is
> >>to be transmitted to an FPGA. As this only has small buffers available, I am
> >>using the packet pacing function of the NIC Mellanox ConnectX-6
> >>MCX623106AN to send the packets at uniform intervals. This works if I
> >>only transfer 5 GB/s, but as soon as I step up to 10 GB/s, after a
> >>few seconds errors begin to occur:
> >>The tx_pp_wander value increases significantly (>80000 ns) and there are
> >>large gaps in the packet stream (>100 µs; the affected packets are not
> >>lost, but arrive later).
> >>To demonstrate this, I connected my host to another computer with the
> >>same type of NIC via a DAC cable, enabling Rx hardware timestamping on the
> >>second device and observing the timing difference between adjacent
> >>packets. The code for this minimal working example is attached to this
> >>message. It includes an assertion to ensure that every packet is
> >>enqueued well before its Tx time comes, so software timing should not
> >>influence the issue.
> >>I tested different packet pacing granularity settings (tx_pp) in the
> >>range of 500 ns-4 µs, which did not change the outcome. Also, enabling Tx
> >>timestamping only for every 16th packet did not have the desired
> >>effect. Distributing the workload over multiple threads and Tx queues
> >>also has no effect. The NIC is connected via PCIe 4.0 x16 and has
> >>firmware version 22.38.1002; the DPDK version is 22.11.3-2.
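For anyone reproducing this setup: the tx_pp granularity mentioned above is an mlx5 device argument (value in nanoseconds), passed via EAL. A sketch of the invocation (the PCI address and the testpmd flags are placeholders for the actual setup):

```
dpdk-testpmd -l 0-3 -n 4 -a 0000:03:00.0,tx_pp=500 -- --txq=8 --rxq=8
```

With tx_pp enabled, the per-packet send time goes into the rte_mbuf dynamic timestamp field (RTE_MBUF_DYNFIELD_TIMESTAMP_NAME) with the Tx timestamp dynamic flag set before rte_eth_tx_burst().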
> >>To be able to use packet pacing, the configuration
> >>REAL_TIME_CLOCK_ENABLE=1 must be set for this NIC. Is it possible that
> >>the large gaps are caused by the NIC and host clock synchronization
> >>mechanism not working correctly under the high packet load? In my
> >>specific application I do not need a real-time NIC clock - the
> >>synchronization between the devices is done via feedback from the
> >>FPGA. Is there any way to eliminate these jumps in the NIC clock?
> >>Thank you and best regards
> >>Max
Thread overview: 5+ messages
2023-11-08 15:41 Engelhardt, Maximilian
[not found] ` <IA1PR12MB7544EEFCA046A44D8D9EBF9EB2642@IA1PR12MB7544.namprd12.prod.outlook.com>
2024-01-14 11:08 ` Slava Ovsiienko
2024-01-16 13:56 ` AW: " Engelhardt, Maximilian
2024-01-17 13:43 ` Slava Ovsiienko [this message]
2023-11-13 8:27 Engelhardt, Maximilian