DPDK patches and discussions
* [PATCH 1/1] doc: add mlx5 xstats send scheduling counters description
@ 2024-10-28 14:27 Viacheslav Ovsiienko
  2024-10-28 15:57 ` Stephen Hemminger
  0 siblings, 1 reply; 3+ messages in thread
From: Viacheslav Ovsiienko @ 2024-10-28 14:27 UTC (permalink / raw)
  To: dev; +Cc: rasland, matan, suanmingm

The mlx5 provides the scheduling send on time capability.
The check the operating status of this feature the xstats
counters are provided. This patch adds the counter descriptions
and provides some meaningful information how to interpret
the counter values in runtime.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 48 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f82e2d75de..8d1a1311d4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2655,3 +2655,51 @@ Destroy GENEVE TLV parser for specific port::
 
 This command doesn't destroy the global list,
 For releasing options, ``flush`` command should be used.
+
+
+Extended statistics counters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Send scheduling related xstats counters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides the set of tx_pp feature related counters to provide debug and diagnostics
+on send packet scheduling. These counters are applicable only if port was probed with ``tx_pp``
+devarg and reflect the status of PMD scheduling infrastructure based on Clock and Rearm Queues.
+This infrastructure provedies the Send Scheduling capability on CX6DX NICs as temporary workaround
+and should not be engaged on the newer hardware.
+
+- ``tx_pp_missed_interrupt_errors`` - the Rearm Queue interrupt was not serviced in time. EAL handles
+  interrupts in dedicated thread and, possible, there were another time-consuming actions were taken.
+
+- ``tx_pp_rearm_queue_errors`` - hardware errors occurred on Rearm Queue, usually it is caused by not
+  servicing interrupts in time
+
+- ``tx_pp_clock_queue_errors`` - hardware errors occurred on Clock Queue, usually it indicates some
+  configuration or internal NIC hardware or firmware issues
+
+- ``tx_pp_timestamp_past_errors`` - application tried to send packet(s) with specifying timestamp in the past.
+  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
+
+- ``tx_pp_timestamp_future_errors`` - application tried to send packet(s) with specifying timestamp
+  in the too distant future, beyond the hardware capabilities to schedule the sending
+  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
+
+- ``tx_pp_jitter`` - this counter exposes the internal NIC realtime clock jitter estimation between two
+  neighbour Clock Queue completions in nanoseconds. Significant jitter might alert about clock
+  synchronization issues (say, some system PTP agent might adjust NIC clock in inappropriate way)
+
+- ``tx_pp_wander`` - the counter exposes the longterm internal NUC realtime clock stability - tx_pp_wander
+  for 2^24 completions, in nanoseconds. Significant wander might indicate clock synchronization issues.
+
+- ``tx_pp_sync_lost`` - the general operating indicator, the non-zero value says the driver lost
+  the Clock Queue synchronization and scheduling does not operate correctly. The port must be restarted
+  to restore the correct scheduling functioning.
+
+The following counters are extremely useful for application code check and debug, these ones do not
+indicate driver or hardware mulfunctions, and are also applicable for the newer hardware (with direct
+on time scheduling capabilities - ConnectX-7 and above):
+
+- ``tx_pp_timestamp_order_errors`` - application tried to send packet(s) with timestamps in not
+  strictly ascending order. Because of PMD does not reorder packets in the hardware queues, scheduling
+  timestamps order violation causes sending packets in wrong moments of time.
-- 
2.34.1



* Re: [PATCH 1/1] doc: add mlx5 xstats send scheduling counters description
  2024-10-28 14:27 [PATCH 1/1] doc: add mlx5 xstats send scheduling counters description Viacheslav Ovsiienko
@ 2024-10-28 15:57 ` Stephen Hemminger
  2024-10-31  8:04   ` [PATCH v2] " Viacheslav Ovsiienko
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Hemminger @ 2024-10-28 15:57 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, rasland, matan, suanmingm

On Mon, 28 Oct 2024 16:27:41 +0200
Viacheslav Ovsiienko <viacheslavo@nvidia.com> wrote:

> The mlx5 provides the scheduling send on time capability.
> The check the operating status of this feature the xstats
> counters are provided. This patch adds the counter descriptions
> and provides some meaningful information how to interpret
> the counter values in runtime.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/nics/mlx5.rst | 48 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index f82e2d75de..8d1a1311d4 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -2655,3 +2655,51 @@ Destroy GENEVE TLV parser for specific port::
>  
>  This command doesn't destroy the global list,
>  For releasing options, ``flush`` command should be used.
> +
> +
> +Extended statistics counters
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Send scheduling related xstats counters
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The mlx5 PMD provides the set of tx_pp feature related counters to provide debug and diagnostics
> +on send packet scheduling. These counters are applicable only if port was probed with ``tx_pp``
> +devarg and reflect the status of PMD scheduling infrastructure based on Clock and Rearm Queues.
> +This infrastructure provedies the Send Scheduling capability on CX6DX NICs as temporary workaround
> +and should not be engaged on the newer hardware.
> +
> +- ``tx_pp_missed_interrupt_errors`` - the Rearm Queue interrupt was not serviced in time. EAL handles
> +  interrupts in dedicated thread and, possible, there were another time-consuming actions were taken.
> +
> +- ``tx_pp_rearm_queue_errors`` - hardware errors occurred on Rearm Queue, usually it is caused by not
> +  servicing interrupts in time
> +
> +- ``tx_pp_clock_queue_errors`` - hardware errors occurred on Clock Queue, usually it indicates some
> +  configuration or internal NIC hardware or firmware issues
> +
> +- ``tx_pp_timestamp_past_errors`` - application tried to send packet(s) with specifying timestamp in the past.
> +  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
> +
> +- ``tx_pp_timestamp_future_errors`` - application tried to send packet(s) with specifying timestamp
> +  in the too distant future, beyond the hardware capabilities to schedule the sending
> +  This counter is useful to check and debug the application code, it does not indicate PMD malfunction.
> +
> +- ``tx_pp_jitter`` - this counter exposes the internal NIC realtime clock jitter estimation between two
> +  neighbour Clock Queue completions in nanoseconds. Significant jitter might alert about clock
> +  synchronization issues (say, some system PTP agent might adjust NIC clock in inappropriate way)
> +
> +- ``tx_pp_wander`` - the counter exposes the longterm internal NUC realtime clock stability - tx_pp_wander
> +  for 2^24 completions, in nanoseconds. Significant wander might indicate clock synchronization issues.
> +
> +- ``tx_pp_sync_lost`` - the general operating indicator, the non-zero value says the driver lost
> +  the Clock Queue synchronization and scheduling does not operate correctly. The port must be restarted
> +  to restore the correct scheduling functioning.
> +
> +The following counters are extremely useful for application code check and debug, these ones do not
> +indicate driver or hardware mulfunctions, and are also applicable for the newer hardware (with direct
> +on time scheduling capabilities - ConnectX-7 and above):
> +
> +- ``tx_pp_timestamp_order_errors`` - application tried to send packet(s) with timestamps in not
> +  strictly ascending order. Because of PMD does not reorder packets in the hardware queues, scheduling
> +  timestamps order violation causes sending packets in wrong moments of time.

Lots of grammar and spelling errors and overly wordy.
Please spend some time cleaning up the wording, find a writer or AI tool to help.


* [PATCH v2] doc: add mlx5 xstats send scheduling counters description
  2024-10-28 15:57 ` Stephen Hemminger
@ 2024-10-31  8:04   ` Viacheslav Ovsiienko
  0 siblings, 0 replies; 3+ messages in thread
From: Viacheslav Ovsiienko @ 2024-10-31  8:04 UTC (permalink / raw)
  To: dev; +Cc: rasland, matan, suanmingm, stephen


The mlx5 PMD provides a send-on-time scheduling capability.
To check the operating status of this feature, extended statistics
counters are provided. This patch adds the counter descriptions
and explains how to interpret the counter values at runtime.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 59 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b5522d50c5..5db4aeda1b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2662,3 +2662,62 @@ Destroy GENEVE TLV parser for specific port::
 
 This command doesn't destroy the global list,
 For releasing options, ``flush`` command should be used.
+
+
+Extended Statistics Counters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Send Scheduling Extended Statistics Counters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides a comprehensive set of counters designed for debugging
+and diagnostics related to packet scheduling during transmission. These counters
+are applicable only if the port was configured with the ``tx_pp`` devarg and
+reflect the status of the PMD scheduling infrastructure based on Clock and
+Rearm Queues, used as a workaround on ConnectX-6 Dx NICs.
+
+- ``tx_pp_missed_interrupt_errors`` - indicates that the Rearm Queue interrupt
+  was not serviced on time. The EAL manages interrupts in a dedicated thread,
+  and it is possible that other time-consuming actions were being processed
+  concurrently.
+
+- ``tx_pp_rearm_queue_errors`` - signifies hardware errors that occurred
+  on the Rearm Queue, typically caused by delays in servicing interrupts.
+
+- ``tx_pp_clock_queue_errors`` - reflects hardware errors on the Clock Queue,
+  which usually indicate configuration issues or problems with the internal NIC
+  hardware or firmware.
+
+- ``tx_pp_timestamp_past_errors`` - tracks attempts by the application to send
+  packets with timestamps set in the past. It is useful for debugging application
+  code and does not indicate a malfunction of the PMD.
+
+- ``tx_pp_timestamp_future_errors`` - records attempts by the application to send
+  packets with timestamps set too far into the future, exceeding the hardware’s
+  scheduling capabilities. Like the previous counter, it aids in application
+  debugging without suggesting a PMD malfunction.
+
+- ``tx_pp_jitter`` - reports the estimated jitter of the internal NIC real-time
+  clock between two consecutive Clock Queue completions, in nanoseconds.
+  Significant jitter may signal potential clock synchronization issues,
+  possibly due to inappropriate adjustments made by a system PTP
+  (Precision Time Protocol) agent.
+
+- ``tx_pp_wander`` - indicates the long-term stability of the internal NIC
+  real-time clock over 2^24 completions, measured in nanoseconds. Significant
+  wander may also suggest clock synchronization problems.
+
+- ``tx_pp_sync_lost`` - a general operational indicator; a non-zero value
+  indicates that the driver has lost synchronization with the Clock Queue,
+  resulting in improper scheduling operations. To restore correct scheduling
+  functionality, it is necessary to restart the port.
+
+The following counters are particularly valuable for verifying and debugging
+application code. They do not indicate driver or hardware malfunctions and
+are also applicable to newer hardware with direct on-time scheduling
+capabilities (such as ConnectX-7 and above):
+
+- ``tx_pp_timestamp_order_errors`` - indicates attempts by the application
+  to send packets with timestamps that are not in strictly ascending order.
+  Since the PMD does not reorder packets within hardware queues, violations
+  of timestamp order can lead to packets being sent at incorrect times.
-- 
2.34.1
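
The counter descriptions in the v2 patch fall into three groups: infrastructure errors on the Clock and Rearm Queues, application timestamp errors, and clock quality values. The sketch below shows one way an application might triage a snapshot of these counters. It is not part of the patch or of the PMD. The enum and the three helper functions are hypothetical names introduced only for illustration, and the snapshot is assumed to be already available as parallel arrays of names and values (in a real application these would typically be filled from rte_eth_xstats_get_names() and rte_eth_xstats_get()). The snippet avoids DPDK headers so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical triage classes; these are not DPDK or mlx5 PMD definitions. */
enum tx_pp_class {
	TX_PP_OTHER = 0, /* not one of the counters described in the patch */
	TX_PP_INFRA,     /* Clock/Rearm Queue errors and synchronization loss */
	TX_PP_APP,       /* application supplied a bad timestamp */
	TX_PP_CLOCK      /* clock quality values in nanoseconds, not error counts */
};

/* Map a counter name from the patch to a triage class. */
static enum tx_pp_class
tx_pp_classify(const char *name)
{
	static const struct {
		const char *name;
		enum tx_pp_class cls;
	} map[] = {
		{ "tx_pp_missed_interrupt_errors", TX_PP_INFRA },
		{ "tx_pp_rearm_queue_errors",      TX_PP_INFRA },
		{ "tx_pp_clock_queue_errors",      TX_PP_INFRA },
		{ "tx_pp_sync_lost",               TX_PP_INFRA },
		{ "tx_pp_timestamp_past_errors",   TX_PP_APP },
		{ "tx_pp_timestamp_future_errors", TX_PP_APP },
		{ "tx_pp_timestamp_order_errors",  TX_PP_APP },
		{ "tx_pp_jitter",                  TX_PP_CLOCK },
		{ "tx_pp_wander",                  TX_PP_CLOCK },
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(name, map[i].name) == 0)
			return map[i].cls;
	return TX_PP_OTHER;
}

/*
 * Sum the values of all counters of one class in a snapshot.
 * Intended for TX_PP_INFRA and TX_PP_APP; adding jitter to wander
 * (TX_PP_CLOCK) would not produce a meaningful number.
 */
static uint64_t
tx_pp_sum(const char *const names[], const uint64_t values[], size_t n,
	  enum tx_pp_class cls)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (tx_pp_classify(names[i]) == cls)
			total += values[i];
	return total;
}

/* Per the patch text, a non-zero tx_pp_sync_lost requires a port restart. */
static int
tx_pp_needs_restart(const char *const names[], const uint64_t values[],
		    size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(names[i], "tx_pp_sync_lost") == 0 && values[i] != 0)
			return 1;
	return 0;
}
```

A monitoring loop could call tx_pp_sum() with TX_PP_APP to flag application bugs separately from TX_PP_INFRA problems, and restart the port when tx_pp_needs_restart() returns non-zero.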


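
The description of ``tx_pp_timestamp_order_errors`` in the v2 patch says that timestamps must be strictly ascending because the PMD does not reorder packets within hardware queues. As a rough illustration, an application could run a debug check like the one below on a burst before sending it. This is only a sketch under stated assumptions: the function name is hypothetical, the timestamps are assumed to be already extracted into a plain array in queue order, clock wraparound is ignored, and the way the PMD itself counts violations may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical debug helper, not a DPDK or mlx5 PMD function.
 * Counts entries whose timestamp is not strictly greater than the
 * timestamp of the entry immediately before it in the array.
 */
static size_t
count_order_violations(const uint64_t *ts, size_t n)
{
	size_t violations = 0;
	size_t i;

	assert(ts != NULL || n == 0);
	for (i = 1; i < n; i++)
		if (ts[i] <= ts[i - 1])
			violations++;
	return violations;
}
```

A non-zero result from such a check during development points to the same class of application bug that the counter reports at runtime.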
