Hi
I am currently working on a system in which a high-rate data stream has to be transmitted to an FPGA. Since the FPGA only has small buffers available, I am using the packet pacing feature of a Mellanox ConnectX-6 NIC (MCX623106AN) to send the packets at uniform intervals. This works as long as I transfer only 5 GB/s, but as soon as I step up to 10 GB/s, errors begin to occur after a few seconds: the tx_pp_wander value increases significantly (>80000 ns) and large gaps appear in the packet stream (>100 µs; the affected packets are not lost, but arrive late).
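For context, the sending side of my setup boils down to something like the following (a simplified sketch, not the attached code verbatim; the PCI address in the devarg, port/queue IDs and the interval are placeholders):

#include <stdint.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

/* The port is probed with the mlx5 pacing devarg, e.g.
 *   -a 0000:xx:00.0,tx_pp=500
 * and configured with RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP. */

static int ts_off;        /* offset of the Tx timestamp dynamic mbuf field */
static uint64_t ts_flag;  /* ol_flags bit that marks the timestamp as set  */

static void tx_pacing_setup(void)
{
    /* Obtain (and register if needed) the Tx scheduling timestamp field
     * and the matching ol_flags bit. */
    if (rte_mbuf_dyn_tx_timestamp_register(&ts_off, &ts_flag) < 0)
        rte_exit(EXIT_FAILURE, "cannot register Tx timestamp field\n");
}

/* Schedule a burst so that consecutive packets leave interval_ns apart,
 * starting at first_ns (nanoseconds on the NIC clock). */
static uint16_t send_paced(uint16_t port, uint16_t queue,
                           struct rte_mbuf **pkts, uint16_t n,
                           uint64_t first_ns, uint64_t interval_ns)
{
    for (uint16_t i = 0; i < n; i++) {
        *RTE_MBUF_DYNFIELD(pkts[i], ts_off, uint64_t *) =
            first_ns + (uint64_t)i * interval_ns;
        pkts[i]->ol_flags |= ts_flag;
    }
    return rte_eth_tx_burst(port, queue, pkts, n);
}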

To demonstrate this, I connected my host to another computer with the same type of NIC via a DAC cable, enabled Rx hardware timestamping on the second machine, and observed the time difference between adjacent packets. The code for this minimal working example is attached to this message. It includes an assertion to ensure that every packet is enqueued well before its scheduled Tx time, so software timing should not influence the result.
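The receiving side is essentially the following (again a simplified sketch; the port/queue and the 100 µs threshold are placeholders, and the hardware timestamps are treated as nanoseconds, which should hold with the real-time clock enabled):

#include <inttypes.h>
#include <stdio.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

#define GAP_NS (100 * 1000)  /* report inter-packet gaps above 100 us */
#define BURST  64

/* The port is configured with RTE_ETH_RX_OFFLOAD_TIMESTAMP so the NIC
 * stamps every received packet with its hardware clock. */
static void rx_gap_monitor(uint16_t port, uint16_t queue)
{
    struct rte_mbuf *pkts[BURST];
    int ts_off;
    uint64_t ts_flag, prev = 0;

    if (rte_mbuf_dyn_rx_timestamp_register(&ts_off, &ts_flag) < 0)
        rte_exit(EXIT_FAILURE, "cannot register Rx timestamp field\n");

    for (;;) {
        uint16_t n = rte_eth_rx_burst(port, queue, pkts, BURST);

        for (uint16_t i = 0; i < n; i++) {
            if (pkts[i]->ol_flags & ts_flag) {
                uint64_t ts = *RTE_MBUF_DYNFIELD(pkts[i], ts_off, uint64_t *);

                if (prev != 0 && ts - prev > GAP_NS)
                    printf("gap of %" PRIu64 " ns\n", ts - prev);
                prev = ts;
            }
            rte_pktmbuf_free(pkts[i]);
        }
    }
}

At 5 GB/s the reported deltas stay close to the nominal packet interval; at 10 GB/s the >100 µs gaps described above show up after a few seconds.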

I tested different packet pacing granularity settings (tx_pp) in the range of 500 ns to 4 µs, which did not change the outcome. Enabling Tx timestamping for only every 16th packet did not have the desired effect either, and neither did distributing the workload over multiple threads and Tx queues. The NIC is connected via PCIe 4.0 x16 and runs firmware version 22.38.1002; the DPDK version is 22.11.3-2.
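For completeness, the granularity is selected via the mlx5 tx_pp devarg at EAL initialization, roughly like this (sketch; the PCI address and core list are placeholders):

#include <stdlib.h>
#include <rte_common.h>
#include <rte_debug.h>
#include <rte_eal.h>

int main(int argc, char **argv)
{
    /* I varied tx_pp between 500 and 4000 (nanoseconds) with the same
     * outcome. */
    char *eal_argv[] = {
        argv[0],
        "-l", "0-3",
        "-a", "0000:3b:00.0,tx_pp=500",
    };

    if (rte_eal_init((int)RTE_DIM(eal_argv), eal_argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");
    (void)argc;

    /* ... port setup and paced Tx loop as in the attached MWE ... */
    return 0;
}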

To be able to use packet pacing, the firmware option REAL_TIME_CLOCK_ENABLE=1 must be set on this NIC. Could the large gaps be caused by the mechanism that synchronizes the NIC clock with the host clock not working correctly under the high packet load? In my specific application I do not need a real-time NIC clock - the synchronization between the devices is done via feedback from the FPGA. Is there any way to eliminate these jumps in the NIC clock?
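For what it's worth, this is roughly how I would make such jumps visible: compare the NIC clock against the host clock and poll the tx_pp_wander xstat once per second (a rough sketch, not part of the attached MWE; it assumes rte_eth_read_clock() returns nanoseconds, which should be the case with the real-time clock enabled, and that tx_pp_wander is the xstat name as reported on my setup):

#include <inttypes.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <rte_ethdev.h>

static void clock_watch(uint16_t port)
{
    uint64_t wander_id = 0, wander = 0;
    int have_wander =
        rte_eth_xstats_get_id_by_name(port, "tx_pp_wander", &wander_id) == 0;

    for (;;) {
        struct timespec host;
        uint64_t nic_ns = 0;

        clock_gettime(CLOCK_REALTIME, &host);
        if (rte_eth_read_clock(port, &nic_ns) == 0) {
            int64_t host_ns =
                (int64_t)host.tv_sec * 1000000000LL + host.tv_nsec;
            printf("nic - host offset: %" PRId64 " ns\n",
                   (int64_t)nic_ns - host_ns);
        }
        if (have_wander &&
            rte_eth_xstats_get_by_id(port, &wander_id, &wander, 1) == 1)
            printf("tx_pp_wander: %" PRIu64 "\n", wander);
        sleep(1);
    }
}

A sudden step in the reported offset at the moment the gaps appear would support the suspicion that the NIC/host clock synchronization is the cause.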

Thank you and best regards
Max