From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: "Engelhardt, Maximilian" <maximilian.engelhardt@iis.fraunhofer.de>,
"users@dpdk.org" <users@dpdk.org>
Cc: Maayan Kashani <mkashani@nvidia.com>,
Carsten Andrich <carsten.andrich@tu-ilmenau.de>,
Asaf Penso <asafp@nvidia.com>
Subject: RE: [mlx5] Loss of packet pacing precision under high Tx loads
Date: Wed, 17 Jan 2024 13:43:16 +0000 [thread overview]
Message-ID: <IA1PR12MB8078C59A762D28AA07722DDDDF722@IA1PR12MB8078.namprd12.prod.outlook.com> (raw)
In-Reply-To: <9811b4c617da4e95a78f0e34431fe770@iis.fraunhofer.de>
Hi, Max
> effect: from time to time all threads are blocked simultaneously for over 100µs,
> whether they are interacting with the NIC or not.
From our experience, I would recommend checking:
- whether some process with a higher priority preempts your application
- that the NUMA balancer is disabled. This special kernel feature periodically unmaps
the whole process memory and checks, in the resulting page faults, that the memory belongs to the correct NUMA node.
This can cause hiccups.
- SMI - System Management Interrupt: all CPU caches are flushed, all cores are stalled, and the CPU goes
into a special mode to handle HW events. SMI statistics can be checked with the turbostat utility.
> How can I enable "DMA to LLC"? If I see correctly, "Direct Cache Access" is an
> Intel-exclusive feature not available on the AMD EPYC CPUs we are using.
Does your EPYC really have no DDIO-like feature? ☹
With best regards,
Slava
>
> -----Original Message-----
> From: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>
> Sent: Tuesday, January 16, 2024 3:57 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; users@dpdk.org
> Cc: Maayan Kashani <mkashani@nvidia.com>; Carsten Andrich
> <carsten.andrich@tu-ilmenau.de>
> Subject: AW: [mlx5] Loss of packet pacing precision under high Tx loads
>
> Hi Slava,
>
> I'm using a 100 Gbit link and want to transfer 10 GByte (80 Gbit) per second. I
> did test it with different numbers of queues (1, 2, 4, 8) without any change to the
> result. In our application, the other end (FPGA) does not support L2 flow
> control.
>
> As you assume, the problem does not seem to be in the actual NIC timestamping,
> as I guessed first, but in the interaction of host and NIC: I have inserted another
> thread into my application that does nothing but repeatedly call
> rte_delay_us_block(1) and measure the elapsed time. This shows the same
> effect: from time to time all threads are blocked simultaneously for over 100 µs,
> whether they are interacting with the NIC or not.
>
> I seem to have the same problem as described here:
> https://www.mail-archive.com/users@dpdk.org/msg07437.html
>
> Investigating further, I discovered strange behavior: in my main application (not
> the MWE posted here), the problem also occurs when receiving data, when
> the packet load changes (start and end of the data stream). Normally, the
> received data is copied into a large buffer - if I comment out this memcpy, i.e.
> *reduce* the workload, these stalls occur *more* often. It also seems to
> depend on the software environment: on Debian, stalls are less frequent than
> when using NixOS (same hardware and the same isolation features).
>
> How can I enable "DMA to LLC"? If I see correctly, "Direct Cache Access" is an
> Intel-exclusive feature not available on the AMD EPYC CPUs we are using.
>
> I would be grateful for any advice on how I could solve the problem.
>
> Thank you and best regards,
> Max
>
> >-----Original Message-----
> >From: Slava Ovsiienko <viacheslavo@nvidia.com>
> >Sent: Sunday, 14 January 2024 12:09
> >To: users@dpdk.org
> >Cc: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>;
> >Maayan Kashani <mkashani@nvidia.com>
> >Subject: RE: [mlx5] Loss of packet pacing precision under high Tx loads
> >
> >Hi, Max
> >
> >As far as I understand, some packets are delayed.
> >What is the data rate? 10 gigabytes (not 10 Gbit) per second?
> >What is the connection rate? 100 Gbps?
> >It is not trivial to guarantee correct packet delivery for high-load
> >(>50% of line rate) connections; a lot of aspects are involved.
> >Sometimes the traffic schedules of neighboring queues simply overlap.
> >
> >I have some extra questions:
> >How many Tx queues do you use? (8 is optimal; more than 32 on CX6 might
> >induce a performance penalty.)
> >Does your traffic contain VLAN headers?
> >Did you disable L2 flow control?
> >A high wander value rather indicates an issue with an overloaded
> >PCIe bus or host memory.
> >Did you enable the "DMA to LLC (last-level cache)" option on the host?
> >
> >With best regards,
> >Slava
> >
> >>
> >>
> >>From: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>
> >>Sent: Wednesday, 8 November 2023 17:41
> >>To: users@dpdk.org
> >>Cc: Andrich, Carsten <carsten.andrich@iis.fraunhofer.de>
> >>Subject: [mlx5] Loss of packet pacing precision under high Tx loads
> >>
> >>Hi
> >>I am currently working on a system in which a high-rate data stream is
> >>to be transmitted to an FPGA. As this only has small buffers available, I am
> >>using the packet pacing function of the NIC Mellanox ConnectX-6
> >>MCX623106AN to send the packets at uniform intervals. This works if I
> >>only transfer 5 GB/s, but as soon as I step up to 10 GB/s, after a
> >>few seconds errors begin to occur:
> >>The tx_pp_wander value increases significantly (>80000 ns) and there are
> >>large gaps in the packet stream (>100 µs; the affected packets are not
> >>lost, but arrive later).
> >>To demonstrate this, I connected my host to another computer with the
> >>same type of NIC via a DAC cable, enabling Rx hardware timestamping on the
> >>second device and observing the timing difference between adjacent
> >>packets. The code for this minimal working example is attached to this
> >>message. It includes an assertion to ensure that every packet is
> >>enqueued well before its Tx time comes, so software timing should not
> >>influence the issue.
> >>I tested different packet pacing granularity settings (tx_pp) in the
> >>range of 500 ns-4 µs, which did not change the outcome. Also, enabling Tx
> >>timestamping only for every 16th packet did not have the desired
> >>effect. Distributing the workload over multiple threads and Tx queues
> >>also has no effect. The NIC is connected via PCIe 4.0 x16 and has
> >>firmware version 22.38.1002; the DPDK version is 22.11.3-2.
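For anyone reproducing this setup: the tx_pp granularity mentioned above is an mlx5 device argument (value in nanoseconds), passed via EAL. A sketch of the invocation (the PCI address and the testpmd flags are placeholders for the actual setup):

```
dpdk-testpmd -l 0-3 -n 4 -a 0000:03:00.0,tx_pp=500 -- --txq=8 --rxq=8
```

With tx_pp enabled, the per-packet send time goes into the rte_mbuf dynamic timestamp field (RTE_MBUF_DYNFIELD_TIMESTAMP_NAME) with the Tx timestamp dynamic flag set before rte_eth_tx_burst().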
> >>To be able to use packet pacing, the configuration
> >>REAL_TIME_CLOCK_ENABLE=1 must be set for this NIC. Is it possible that
> >>the large gaps are caused by the NIC and host clock synchronization
> >>mechanism not working correctly under the high packet load? In my
> >>specific application I do not need a real-time NIC clock - the
> >>synchronization between the devices is done via feedback from the
> >>FPGA. Is there any way to eliminate these jumps in the NIC clock?
> >>Thank you and best regards
> >>Max
Thread overview: 5+ messages
2023-11-08 15:41 Engelhardt, Maximilian
[not found] ` <IA1PR12MB7544EEFCA046A44D8D9EBF9EB2642@IA1PR12MB7544.namprd12.prod.outlook.com>
2024-01-14 11:08 ` Slava Ovsiienko
2024-01-16 13:56 ` AW: " Engelhardt, Maximilian
2024-01-17 13:43 ` Slava Ovsiienko [this message]
2023-11-13 8:27 Engelhardt, Maximilian