DPDK usage discussions
From: "Engelhardt, Maximilian" <maximilian.engelhardt@iis.fraunhofer.de>
To: Slava Ovsiienko <viacheslavo@nvidia.com>,
	"users@dpdk.org" <users@dpdk.org>
Cc: Maayan Kashani <mkashani@nvidia.com>,
	Carsten Andrich <carsten.andrich@tu-ilmenau.de>
Subject: AW: [mlx5] Loss of packet pacing precision under high Tx loads
Date: Tue, 16 Jan 2024 13:56:43 +0000
Message-ID: <9811b4c617da4e95a78f0e34431fe770@iis.fraunhofer.de>
In-Reply-To: <IA1PR12MB8078A0BC6AEE161ABD747B44DF6D2@IA1PR12MB8078.namprd12.prod.outlook.com>

Hi Slava,

I'm using a 100 Gbit link and want to transfer 10 GByte (80 Gbit) per second. I tested with different numbers of queues (1, 2, 4, 8) without any change in the result. In our application, the other end (an FPGA) does not support L2 flow control.
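
For reference, explicitly disabling pause frames on the DPDK side looks roughly like the sketch below (illustrative only, not our actual application code; defaults vary by NIC and firmware):

/*
 * Sketch: explicitly disable L2 (Ethernet) flow control on a port.
 * Illustrative only; whether the application needs this call is an assumption.
 */
#include <rte_ethdev.h>

static int
disable_l2_fc(uint16_t port_id)
{
	struct rte_eth_fc_conf fc;

	/* read the current settings first, then switch pause frames off */
	int ret = rte_eth_dev_flow_ctrl_get(port_id, &fc);
	if (ret != 0)
		return ret;
	fc.mode = RTE_ETH_FC_NONE;
	return rte_eth_dev_flow_ctrl_set(port_id, &fc);
}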

As you assume, the problem does not seem to lie in the actual NIC timestamping, as I first guessed, but in the interaction between host and NIC: I have added another thread to my application that does nothing but repeatedly call rte_delay_us_block(1) and measure the elapsed time. This shows the same effect: from time to time all threads are blocked simultaneously for over 100 µs, whether they are interacting with the NIC or not.
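
A minimal sketch of that monitor thread (illustrative; the 100 µs reporting threshold and the lcore handling are assumptions, not the exact code I run):

/*
 * Stall monitor: block for 1 µs in a loop and report whenever the call
 * takes far longer than expected.
 */
#include <stdio.h>
#include <stdint.h>
#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_lcore.h>

static int
stall_monitor(__rte_unused void *arg)
{
	const uint64_t hz = rte_get_timer_hz();
	const uint64_t threshold = hz / 10000;	/* 100 µs in timer cycles */

	for (;;) {
		uint64_t start = rte_get_timer_cycles();
		rte_delay_us_block(1);		/* should return after ~1 µs */
		uint64_t elapsed = rte_get_timer_cycles() - start;

		if (elapsed > threshold)
			printf("stall on lcore %u: blocked for %.1f us\n",
			       rte_lcore_id(), (double)elapsed * 1e6 / hz);
	}
	return 0;
}

/* launched on a spare worker core from main(), e.g.:
 *	rte_eal_remote_launch(stall_monitor, NULL, spare_lcore_id);
 */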

I seem to have the same problem as described here: https://www.mail-archive.com/users@dpdk.org/msg07437.html

Investigating further, I discovered strange behavior: in my main application (not the MWE posted here), the problem also occurs on the receive side when the packet load changes (at the start and end of the data stream). Normally, the received data is copied into a large buffer; if I comment out this memcpy, i.e. *reduce* the workload, these stalls occur *more* often. It also seems to depend on the software environment: on Debian, stalls are less frequent than on NixOS (same hardware and the same isolation features).
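
The Rx path in question boils down to the following pattern (simplified sketch; the burst size, buffer handling and names are placeholders, not the actual application code):

/*
 * Simplified Rx loop: receive a burst and copy each payload into one large
 * contiguous buffer. Commenting out the memcpy is what made the stalls
 * *more* frequent.
 */
#include <string.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 64

static void
rx_loop(uint16_t port, uint16_t queue, uint8_t *buf, size_t buf_size)
{
	struct rte_mbuf *pkts[BURST];
	size_t off = 0;

	for (;;) {
		uint16_t n = rte_eth_rx_burst(port, queue, pkts, BURST);

		for (uint16_t i = 0; i < n; i++) {
			uint16_t len = rte_pktmbuf_data_len(pkts[i]);

			if (off + len <= buf_size) {
				memcpy(buf + off,
				       rte_pktmbuf_mtod(pkts[i], void *), len);
				off += len;
			}
			rte_pktmbuf_free(pkts[i]);
		}
	}
}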

How can I enable "DMA to LLC"? As far as I can tell, "Direct Cache Access" is an Intel-exclusive feature that is not available on the AMD EPYC CPUs we are using.

I would be grateful for any advice on how I could solve the problem.

Thank you and best regards,
Max

>-----Original Message-----
>From: Slava Ovsiienko <viacheslavo@nvidia.com>
>Sent: Sunday, 14 January 2024 12:09
>To: users@dpdk.org
>Cc: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>;
>Maayan Kashani <mkashani@nvidia.com>
>Subject: RE: [mlx5] Loss of packet pacing precision under high Tx loads
>
>Hi, Max
>
>As far as I understand, some packets are delayed.
>What is the data rate? 10 gigabytes per second (not 10 gigabits)?
>What is the connection rate? 100 Gbps?
>It is not trivial to guarantee correct packet delivery for high-load (>50% of line rate)
>connections; a lot of aspects are involved.
>Sometimes the traffic schedules of neighboring queues simply overlap.
>
>I have some extra questions:
>How many Tx queues do you use? (8 is optimal; more than 32 on CX6 might induce
>a performance penalty.)
>Did your traffic contain VLAN headers?
>Did you disable L2 flow control?
>A high wander value rather indicates an issue with an overloaded PCIe
>bus or host memory.
>Did you enable the "DMA to LLC (last level cache)" option on the host?
>
>With best regards,
>Slava
>
>>
>>
>>From: Engelhardt, Maximilian <maximilian.engelhardt@iis.fraunhofer.de>
>>Sent: Wednesday, 8 November 2023 17:41
>>To: users@dpdk.org
>>Cc: Andrich, Carsten <carsten.andrich@iis.fraunhofer.de>
>>Subject: [mlx5] Loss of packet pacing precision under high Tx loads
>>
>>Hi
>>I am currently working on a system in which a high-rate data stream is to be
>>transmitted to an FPGA. As this only has small buffers available, I am using the
>>packet pacing function of the Mellanox ConnectX-6 MCX623106AN NIC to send
>>the packets at uniform intervals. This works if I only transfer 5 GB/s, but as
>>soon as I step up to 10 GB/s, errors begin to occur after a few seconds:
>>the tx_pp_wander value increases significantly (>80000 ns) and there are large
>>gaps in the packet stream (>100 µs; the affected packets are not lost, but arrive
>>later).
>>To demonstrate this, I connected my host to another computer with the same
>>type of NIC via a DAC cable, enabled Rx hardware timestamping on the second
>>device and observed the timing difference between adjacent packets. The
>>code for this minimal working example is attached to this message. It
>>includes an assertion to ensure that every packet is enqueued well before its
>>Tx time comes, so software timing should not influence the issue.
>>I tested different packet pacing granularity settings (tx_pp) in the range of
>>500 ns to 4 µs, which did not change the outcome. Also, enabling Tx timestamping
>>only for every 16th packet did not have the desired effect. Distributing the
>>workload over multiple threads and Tx queues also had no effect. The NIC is
>>connected via PCIe 4.0 x16 and has firmware version 22.38.1002; the DPDK
>>version is 22.11.3-2.
>>To be able to use packet pacing, the configuration
>>REAL_TIME_CLOCK_ENABLE=1 must be set for this NIC. Is it possible that the
>>large gaps are caused by the NIC-to-host clock synchronization mechanism not
>>working correctly under the high packet load? In my specific application I do
>>not need a real-time NIC clock - the synchronization between the devices is
>>done via feedback from the FPGA. Is there any way to eliminate these jumps in
>>the NIC clock?
>>Thank you and best regards
>>Max
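
For anyone reproducing the setup, a minimal sketch of how mlx5 packet pacing with per-packet Tx timestamps is typically driven in DPDK 22.11 (not the attached MWE; INTERVAL_NS and the helper names are placeholders):

/*
 * Illustrative sketch: schedule packets on the NIC with absolute transmit
 * timestamps. Assumes the port was probed with the mlx5 devarg
 * tx_pp=<granularity_ns> and configured with the
 * RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP Tx offload.
 */
#include <stdint.h>
#include <rte_bitops.h>
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

#define INTERVAL_NS 1000ULL	/* placeholder inter-packet gap */

static int ts_off = -1;		/* offset of the dynamic timestamp field */
static uint64_t ts_flag;	/* ol_flags bit: "send at the given timestamp" */

static int
pacing_init(void)
{
	ts_off = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
	int bit = rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);

	if (ts_off < 0 || bit < 0)
		return -1;	/* timestamp field/flag not registered */
	ts_flag = RTE_BIT64(bit);
	return 0;
}

/*
 * Tag one mbuf with an absolute NIC-clock transmit time (nanoseconds) and
 * return the next slot. The base time would typically be taken from
 * rte_eth_read_clock() and offset into the future.
 */
static uint64_t
schedule_tx(struct rte_mbuf *m, uint64_t when_ns)
{
	*RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *) = when_ns;
	m->ol_flags |= ts_flag;
	return when_ns + INTERVAL_NS;
}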


Thread overview: 4+ messages
2023-11-08 15:41 Engelhardt, Maximilian
     [not found] ` <IA1PR12MB7544EEFCA046A44D8D9EBF9EB2642@IA1PR12MB7544.namprd12.prod.outlook.com>
2024-01-14 11:08   ` Slava Ovsiienko
2024-01-16 13:56     ` Engelhardt, Maximilian [this message]
2024-01-17 13:43       ` Slava Ovsiienko
