The mlx5 PMD provides the send scheduling (send on time) capability.
Extended statistics counters are provided to check the operating
status of this feature. This patch adds the counter descriptions
and explains how to interpret the counter values at runtime.

Signed-off-by: Viacheslav Ovsiienko
---
 doc/guides/nics/mlx5.rst | 59 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b5522d50c5..5db4aeda1b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2662,3 +2662,62 @@ Destroy GENEVE TLV parser for specific port::
 
 This command doesn't destroy the global list,
 For releasing options, ``flush`` command should be used.
+
+
+Extended Statistics Counters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Send Scheduling Extended Statistics Counters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides a comprehensive set of counters designed for debugging
+and diagnostics related to packet scheduling during transmission. These counters
+are applicable only if the port was configured with the ``tx_pp`` devarg and
+reflect the status of the PMD scheduling infrastructure based on Clock and
+Rearm Queues, used as a workaround on ConnectX-6 Dx NICs.
+
+- ``tx_pp_missed_interrupt_errors`` - indicates that the Rearm Queue interrupt
+  was not serviced on time. The EAL manages interrupts in a dedicated thread,
+  and other time-consuming actions may have been in progress
+  at the same time.
+
+- ``tx_pp_rearm_queue_errors`` - signifies hardware errors that occurred
+  on the Rearm Queue, typically caused by delays in servicing interrupts.
+
+- ``tx_pp_clock_queue_errors`` - reflects hardware errors on the Clock Queue,
+  which usually indicate configuration issues or problems with the internal NIC
+  hardware or firmware.
+
+- ``tx_pp_timestamp_past_errors`` - counts attempts by the application to send
+  packets with timestamps set in the past. It is useful for debugging application
+  code and does not indicate a malfunction of the PMD.
+
+- ``tx_pp_timestamp_future_errors`` - records attempts by the application to send
+  packets with timestamps set too far into the future, exceeding the hardware’s
+  scheduling capabilities. Like the previous counter, it aids in application
+  debugging without suggesting a PMD malfunction.
+
+- ``tx_pp_jitter`` - reports the estimated jitter of the internal NIC real-time
+  clock between two consecutive Clock Queue completions, expressed in nanoseconds.
+  Significant jitter may signal clock synchronization issues,
+  possibly due to inappropriate adjustments made by a system PTP
+  (Precision Time Protocol) agent.
+
+- ``tx_pp_wander`` - indicates the long-term stability of the internal NIC
+  real-time clock over 2^24 completions, measured in nanoseconds. Significant
+  wander may also suggest clock synchronization problems.
+
+- ``tx_pp_sync_lost`` - a general operational indicator; a non-zero value
+  indicates that the driver has lost synchronization with the Clock Queue,
+  resulting in improper scheduling operations. To restore correct scheduling
+  functionality, the port must be restarted.
+
+The following counter is particularly valuable for verifying and debugging
+application code. It does not indicate driver or hardware malfunctions and
+is applicable to newer hardware with direct on-time scheduling capabilities
+(such as ConnectX-7 and above):
+
+- ``tx_pp_timestamp_order_errors`` - indicates attempts by the application
+  to send packets with timestamps that are not in strictly ascending order.
+  Since the PMD does not reorder packets within hardware queues, violations
+  of timestamp order can lead to packets being sent at incorrect times.
-- 
2.34.1
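
As an illustration of how an application might act on the counter descriptions in this patch, the following is a minimal sketch in C. It is not part of the patch or of the PMD. The enum, the table and both function names are hypothetical and introduced only for this example. In a real application the name and value pairs would typically be retrieved with the ethdev calls ``rte_eth_xstats_get_names()`` and ``rte_eth_xstats_get()``; to stay self-contained, the sketch takes them as plain arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical categories for this sketch; they are not part of the PMD. */
enum tx_pp_class {
	TX_PP_NOT_SCHED,   /* name does not start with "tx_pp_" */
	TX_PP_INFRA_ERROR, /* interrupt, Rearm Queue or Clock Queue problem */
	TX_PP_APP_ERROR,   /* application supplied an unsuitable timestamp */
	TX_PP_CLOCK_GAUGE, /* jitter or wander, in nanoseconds */
	TX_PP_SYNC_LOST,   /* non-zero value: port restart required */
	TX_PP_UNKNOWN      /* "tx_pp_" counter not described in the patch */
};

struct tx_pp_desc {
	const char *name;
	enum tx_pp_class cls;
};

/* Counter names exactly as listed in the counter descriptions. */
static const struct tx_pp_desc tx_pp_table[] = {
	{ "tx_pp_missed_interrupt_errors", TX_PP_INFRA_ERROR },
	{ "tx_pp_rearm_queue_errors",      TX_PP_INFRA_ERROR },
	{ "tx_pp_clock_queue_errors",      TX_PP_INFRA_ERROR },
	{ "tx_pp_timestamp_past_errors",   TX_PP_APP_ERROR },
	{ "tx_pp_timestamp_future_errors", TX_PP_APP_ERROR },
	{ "tx_pp_timestamp_order_errors",  TX_PP_APP_ERROR },
	{ "tx_pp_jitter",                  TX_PP_CLOCK_GAUGE },
	{ "tx_pp_wander",                  TX_PP_CLOCK_GAUGE },
	{ "tx_pp_sync_lost",               TX_PP_SYNC_LOST },
};

/* Map an extended statistics counter name to one of the categories above. */
enum tx_pp_class
tx_pp_classify(const char *name)
{
	size_t i;

	assert(name != NULL);
	if (strncmp(name, "tx_pp_", 6) != 0)
		return TX_PP_NOT_SCHED;
	for (i = 0; i < sizeof(tx_pp_table) / sizeof(tx_pp_table[0]); i++) {
		if (strcmp(name, tx_pp_table[i].name) == 0)
			return tx_pp_table[i].cls;
	}
	return TX_PP_UNKNOWN;
}

/* Return 1 when the counter value means the port must be restarted. */
int
tx_pp_needs_restart(const char *name, uint64_t value)
{
	return tx_pp_classify(name) == TX_PP_SYNC_LOST && value != 0;
}
```

A monitoring loop could call ``tx_pp_classify()`` on each counter name and, for example, log ``TX_PP_APP_ERROR`` counters as application bugs while treating a positive ``tx_pp_needs_restart()`` result as a signal to stop and start the port. Exact name matching is used so that any ``tx_pp_`` counter not described above is reported as unknown rather than silently assigned to a category.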