DPDK patches and discussions
 help / color / mirror / Atom feed
* [RFC 0/5] net/mlx5: introduce Tx datapath tracing
@ 2023-04-20 10:07 Viacheslav Ovsiienko
  2023-04-20 10:07 ` [RFC 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
                   ` (14 more replies)
  0 siblings, 15 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:07 UTC (permalink / raw)
  To: dev

The mlx5 PMD provides send scheduling at a specified moment of time,
and for this kind of application it would be extremely useful to have
extra debug information - when and how packets were scheduled and when
the actual sending was completed by the NIC hardware (it helps the
application to track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback
from the driver and the feature looks to be mlx5-specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with the EAL parameters configuring the tracing
    in the mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine the related events
    (packet firing and completion) and see the data in a human-readable view

Below are the detailed "how to" instructions for the mlx5 NIC to gather
all the debug data, including the full timing information.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings
parameter should be set to TRUE, for example with the command below
(followed by a FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
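
The following FW/driver reset can be performed, for example, with the
mlxfwreset tool from the MFT package (assuming the tool is installed; the
device name is the same as for mlxconfig), or by rebooting the server:

  sudo mlxfwreset -d /dev/mst/mt4125_pciconf0 reset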


3. Run DPDK application to gather the traces

EAL parameters controlling the trace capability at runtime (a combined
command-line example is shown after the parameter list):

  --trace=pmd.net.mlx5.tx - a regular expression enabling tracepoints with
                            matching names. At least the "pmd.net.mlx5.tx"
                            tracepoints must be enabled to gather all events
                            needed to analyze the mlx5 Tx datapath and its
                            timings. By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
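
A combined command-line example (illustrative only - shown with testpmd,
but any DPDK application built with tracing enabled can be used, and the
PCI address is a placeholder):

  dpdk-testpmd -a <PCI_BDF> --trace=pmd.net.mlx5.tx --trace-dir=/var/log \
               --trace-bufsz=8M -- -i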


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-bindings
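
The configure step should be followed by the usual build and install steps
(assuming the standard autotools flow):

  make
  sudo make install
  sudo ldconfig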

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6 (or newer) and the Babeltrace2 package, and it takes
the trace data folder as its only parameter. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The output presents the list of Tx (and, in the future, Rx) bursts per port/queue.
Each list element contains the list of built WQEs with specific opcodes, and
each WQE contains the list of the encompassed packets to send.
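
An illustrative fragment of the output structure (the values in angle
brackets are placeholders, the layout follows the print formats used by
the script):

  <burst_ts>: tx(p=<port>, q=<queue>, <sent>/<requested> pkts in <duration>
    <wqe_idx>: SEND (<wait_ts>, <completion_delta>)
      <mbuf_ptr>: <pkt_len>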

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Viacheslav Ovsiienko (5):
  app/testpmd: add trace dump command
  common/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add Tx datapath tracing
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script

 app/test-pmd/cmdline.c               |   6 +-
 drivers/common/mlx5/meson.build      |   1 +
 drivers/common/mlx5/mlx5_trace.c     |  25 +++
 drivers/common/mlx5/mlx5_trace.h     |  72 +++++++
 drivers/common/mlx5/version.map      |   8 +
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |   9 +
 drivers/net/mlx5/mlx5_tx.h           |  88 ++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 12 files changed, 504 insertions(+), 30 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 1/5] app/testpmd: add trace dump command
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-04-20 10:07 ` Viacheslav Ovsiienko
  2023-04-20 10:13   ` Jerin Jacob
  2023-04-20 10:08 ` [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:07 UTC (permalink / raw)
  To: dev

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7b20bef4e9..be9e3a9ed6 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -39,6 +39,7 @@
 #include <rte_gro.h>
 #endif
 #include <rte_mbuf_dyn.h>
+#include <rte_trace.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -8367,6 +8368,8 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_lcore_dump(stdout);
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
 }
 
 static cmdline_parse_token_string_t cmd_dump_dump =
@@ -8379,7 +8382,8 @@ static cmdline_parse_token_string_t cmd_dump_dump =
 		"dump_mempool#"
 		"dump_devargs#"
 		"dump_lcores#"
-		"dump_log_types");
+		"dump_log_types#"
+		"dump_trace");
 
 static cmdline_parse_inst_t cmd_dump = {
 	.f = cmd_dump_parsed,  /* function to call */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-04-20 10:07 ` [RFC 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-04-20 10:08 ` Viacheslav Ovsiienko
  2023-04-20 10:11   ` Jerin Jacob
  2023-04-20 10:08 ` [RFC 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:08 UTC (permalink / raw)
  To: dev

There is an intention to engage the DPDK tracing capabilities
for monitoring and profiling of the mlx5 PMDs in various modes.
The patch introduces tracepoints for the Tx datapath in
the Ethernet device driver.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/meson.build  |  1 +
 drivers/common/mlx5/mlx5_trace.c | 25 +++++++++++
 drivers/common/mlx5/mlx5_trace.h | 72 ++++++++++++++++++++++++++++++++
 drivers/common/mlx5/version.map  |  8 ++++
 4 files changed, 106 insertions(+)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h

diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 9dc809f192..e074ffb140 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -19,6 +19,7 @@ sources += files(
         'mlx5_common_mp.c',
         'mlx5_common_mr.c',
         'mlx5_malloc.c',
+        'mlx5_trace.c',
         'mlx5_common_pci.c',
         'mlx5_common_devx.c',
         'mlx5_common_utils.c',
diff --git a/drivers/common/mlx5/mlx5_trace.c b/drivers/common/mlx5/mlx5_trace.c
new file mode 100644
index 0000000000..b9f14413ad
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_trace_point_register.h>
+#include <mlx5_trace.h>
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
+
diff --git a/drivers/common/mlx5/mlx5_trace.h b/drivers/common/mlx5/mlx5_trace.h
new file mode 100644
index 0000000000..57512e654f
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_PMD_MLX5_TRACE_H_
+#define RTE_PMD_MLX5_TRACE_H_
+
+/**
+ * @file
+ *
+ * API for mlx5 PMD trace support
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <mlx5_prm.h>
+#include <rte_mbuf.h>
+#include <rte_trace_point.h>
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_PMD_MLX5_TRACE_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..d0ec8571e6 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -158,5 +158,13 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	__rte_pmd_mlx5_trace_tx_entry;
+	__rte_pmd_mlx5_trace_tx_exit;
+	__rte_pmd_mlx5_trace_tx_wqe;
+	__rte_pmd_mlx5_trace_tx_wait;
+	__rte_pmd_mlx5_trace_tx_push;
+	__rte_pmd_mlx5_trace_tx_complete;
+
 	local: *;
 };
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 3/5] net/mlx5: add Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-04-20 10:07 ` [RFC 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
  2023-04-20 10:08 ` [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-04-20 10:08 ` Viacheslav Ovsiienko
  2023-04-20 10:08 ` [RFC 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:08 UTC (permalink / raw)
  To: dev

The patch adds the tracing capability to the Tx datapath.
To engage this tracing capability, the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*

The Tx datapath tracing provides information on how packets
are pushed into the hardware descriptors, timestamps for
scheduled wait and send completions, etc.

A dedicated post-processing script (added in a later patch of
this series) presents the trace results in a human-readable form.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.h   | 19 -------------------
 drivers/net/mlx5/mlx5_rxtx.h | 19 +++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c   |  9 +++++++++
 drivers/net/mlx5/mlx5_tx.h   | 25 +++++++++++++++++++++++--
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 8b87adad36..1b5f110ccc 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -376,25 +376,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..1fe9521dfc 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -232,6 +232,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..7f624de58e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -19,6 +19,8 @@
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_trace.h"
+#include "mlx5_rxtx.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +766,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1697,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1712,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1817,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1900,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2124,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2328,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2698,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2937,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2947,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +2993,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3208,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3271,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3313,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3355,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3725,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 4/5] net/mlx5: add comprehensive send completion trace
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (2 preceding siblings ...)
  2023-04-20 10:08 ` [RFC 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
@ 2023-04-20 10:08 ` Viacheslav Ovsiienko
  2023-04-20 10:08 ` [RFC 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:08 UTC (permalink / raw)
  To: dev

There is a demand to trace the send completion of
every WQE when time scheduling is enabled.

The patch extends the size of the completion queue and
requests a completion for every WQE issued to the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is under the RTE_ENABLE_TRACE_FP
conditional compilation flag and does not impact the
release code.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 67a7bec22b..f3f717f17b 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 7f624de58e..9f29df280f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -728,6 +728,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with enabled fast path tracing
+ * and send scheduling on time to provide the detailed report in trace
+ * for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -754,7 +802,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -763,8 +811,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3662,7 +3714,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 5/5] net/mlx5: add Tx datapath trace analyzing script
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (3 preceding siblings ...)
  2023-04-20 10:08 ` [RFC 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-04-20 10:08 ` Viacheslav Ovsiienko
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-04-20 10:08 UTC (permalink / raw)
  To: dev

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..c8fa63a7b9
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+'''
+Analyzing mlx5 PMD datapath traces
+'''
+import sys
+import argparse
+import pathlib
+import bt2
+
+PFX_TX     = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+tx_blst = {}                    # current Tx bursts per CPU
+tx_qlst = {}                    # active Tx queues per port/queue
+tx_wlst = {}                    # wait timestamp list per CPU
+
+class mlx5_queue(object):
+    def __init__(self):
+        self.done_burst = []    # completed bursts
+        self.wait_burst = []    # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        for txb in self.done_burst:
+            txb.log()
+
+
+class mlx5_mbuf(object):
+    def __init__(self):
+        self.wqe = 0            # wqe id
+        self.ptr = None         # first packet mbuf pointer
+        self.len = 0            # packet data length
+        self.nseg = 0           # number of segments
+
+    def log(self):
+        out = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out += " (%d segs)" % self.nseg
+        print(out)
+
+
+class mlx5_wqe(object):
+    def __init__(self):
+        self.mbuf = []          # list of mbufs in WQE
+        self.wait_ts = 0        # preceding wait/push timestamp
+        self.comp_ts = 0        # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        id = (self.opcode >> 8) & 0xFFFF
+        op = self.opcode & 0xFF
+        fl = self.opcode >> 24
+        out = "  %04X: " % id
+        if op == 0xF:
+            out += "WAIT"
+        elif op == 0x29:
+            out += "EMPW"
+        elif op == 0xE:
+            out += "TSO "
+        elif op == 0xA:
+            out += "SEND"
+        else:
+            out += "0x%02X" % op
+        if self.comp_ts != 0:
+            out += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out += " (%d)" % self.wait_ts
+        print(out)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    # return 0 if the WQE is not completed
+    def comp(self, wqe_id, ts):
+        if self.comp_ts != 0:
+            return 1
+        id = (self.opcode >> 8) & 0xFFFF
+        if id > wqe_id:
+            id -= wqe_id
+            if id <= 0x8000:
+                return 0
+        else:
+            id = wqe_id - id
+            if id >= 0x8000:
+                return 0
+        self.comp_ts = ts
+        return 1
+
+
+class mlx5_burst(object):
+    def __init__(self):
+        self.wqes = []          # issued burst WQEs
+        self.done = 0           # number of sent/recv packets
+        self.req = 0            # requested number of packets
+        self.call_ts = 0        # burst routine invocation
+        self.done_ts = 0        # burst routine done
+        self.queue = None
+
+    def log(self):
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)" %
+                  (self.call_ts, port, queue, self.done, self.req))
+        else:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts in %u" %
+                  (self.call_ts, port, queue, self.done, self.req,
+                   self.done_ts - self.call_ts))
+        for wqe in self.wqes:
+            wqe.log()
+
+    # return 0 if not all of WQEs in burst completed
+    def comp(self, wqe_id, ts):
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, ts) == 0:
+                return 0
+        return 1
+
+
+def do_tx_entry(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = mlx5_burst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = mlx5_queue();
+        queue.pq_id = pq_id
+        tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = mlx5_wqe()
+    wqe.wait_ts = tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = mlx5_mbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg):
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, ts) == 0:
+            break
+        rmv += 1
+    # move completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(burst)
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg):
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg)
+    elif name == "exit":
+        do_tx_exit(msg)
+    elif name == "wqe":
+        do_tx_wqe(msg)
+    elif name == "wait":
+        do_tx_wait(msg)
+    elif name == "push":
+        do_tx_push(msg)
+    elif name == "complete":
+        do_tx_complete(msg)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name)
+        sys.exit(1)
+
+
+def do_log(msg_it):
+    for msg in msg_it:
+        if type(msg) is not bt2._EventMessageConst:
+            continue
+        event = msg.event
+        if event.name.startswith(PFX_TX):
+            do_tx(msg)
+        # Handling of other log event categories can be added here
+
+
+def do_print():
+    for pq_id in tx_qlst:
+        queue = tx_qlst.get(pq_id)
+        queue.log()
+
+
+def main(args):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("path",
+                        nargs = 1,
+                        type = str,
+                        help = "input trace folder")
+    args = parser.parse_args()
+
+    msg_it = bt2.TraceCollectionMessageIterator(args.path)
+    do_log(msg_it)
+    do_print()
+    exit(0)
+
+if __name__ == "__main__":
+    main(sys.argv)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-04-20 10:08 ` [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-04-20 10:11   ` Jerin Jacob
  2023-06-13 15:50     ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Jerin Jacob @ 2023-04-20 10:11 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev

On Thu, Apr 20, 2023 at 3:38 PM Viacheslav Ovsiienko
<viacheslavo@nvidia.com> wrote:
>
> There is an intention to engage DPDK tracing capabilities
> for mlx5 PMDs monitoring and profiling in various modes.
> The patch introduces tracepoints for the Tx datapath in
> the ethernet device driver.
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  drivers/common/mlx5/meson.build  |  1 +
>  drivers/common/mlx5/mlx5_trace.c | 25 +++++++++++
>  drivers/common/mlx5/mlx5_trace.h | 72 ++++++++++++++++++++++++++++++++
>  drivers/common/mlx5/version.map  |  8 ++++
>  4 files changed, 106 insertions(+)
>  create mode 100644 drivers/common/mlx5/mlx5_trace.c
>  create mode 100644 drivers/common/mlx5/mlx5_trace.h
>
> diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
> index 9dc809f192..e074ffb140 100644
> --- a/drivers/common/mlx5/meson.build
> +++ b/drivers/common/mlx5/meson.build
> @@ -19,6 +19,7 @@ sources += files(
>          'mlx5_common_mp.c',
>          'mlx5_common_mr.c',
>          'mlx5_malloc.c',
> +        'mlx5_trace.c',
>          'mlx5_common_pci.c',
>          'mlx5_common_devx.c',
>          'mlx5_common_utils.c',
> diff --git a/drivers/common/mlx5/mlx5_trace.c b/drivers/common/mlx5/mlx5_trace.c
> new file mode 100644
> index 0000000000..b9f14413ad
> --- /dev/null
> +++ b/drivers/common/mlx5/mlx5_trace.c
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2022 NVIDIA Corporation & Affiliates
> + */
> +
> +#include <rte_trace_point_register.h>
> +#include <mlx5_trace.h>
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
> +       pmd.net.mlx5.tx.entry)
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
> +       pmd.net.mlx5.tx.exit)
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
> +       pmd.net.mlx5.tx.wqe)
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
> +       pmd.net.mlx5.tx.wait)
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
> +       pmd.net.mlx5.tx.push)
> +
> +RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
> +       pmd.net.mlx5.tx.complete)
> +
> diff --git a/drivers/common/mlx5/mlx5_trace.h b/drivers/common/mlx5/mlx5_trace.h
> new file mode 100644
> index 0000000000..57512e654f
> --- /dev/null
> +++ b/drivers/common/mlx5/mlx5_trace.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2022 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef RTE_PMD_MLX5_TRACE_H_
> +#define RTE_PMD_MLX5_TRACE_H_
> +
> +/**
> + * @file
> + *
> + * API for mlx5 PMD trace support
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <mlx5_prm.h>
> +#include <rte_mbuf.h>
> +#include <rte_trace_point.h>
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_entry,
> +       RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
> +       rte_trace_point_emit_u16(port_id);
> +       rte_trace_point_emit_u16(queue_id);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_exit,
> +       RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
> +       rte_trace_point_emit_u16(nb_sent);
> +       rte_trace_point_emit_u16(nb_req);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_wqe,
> +       RTE_TRACE_POINT_ARGS(uint32_t opcode),
> +       rte_trace_point_emit_u32(opcode);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_wait,
> +       RTE_TRACE_POINT_ARGS(uint64_t ts),
> +       rte_trace_point_emit_u64(ts);
> +)
> +
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_push,
> +       RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
> +       rte_trace_point_emit_ptr(mbuf);
> +       rte_trace_point_emit_u32(mbuf->pkt_len);
> +       rte_trace_point_emit_u16(mbuf->nb_segs);
> +       rte_trace_point_emit_u16(wqe_id);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +       rte_pmd_mlx5_trace_tx_complete,
> +       RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
> +                            uint16_t wqe_id, uint64_t ts),
> +       rte_trace_point_emit_u16(port_id);
> +       rte_trace_point_emit_u16(queue_id);
> +       rte_trace_point_emit_u64(ts);
> +       rte_trace_point_emit_u16(wqe_id);
> +)
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_PMD_MLX5_TRACE_H_ */
> diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
> index e05e1aa8c5..d0ec8571e6 100644
> --- a/drivers/common/mlx5/version.map
> +++ b/drivers/common/mlx5/version.map
> @@ -158,5 +158,13 @@ INTERNAL {
>
>         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
>         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> +
> +       __rte_pmd_mlx5_trace_tx_entry;
> +       __rte_pmd_mlx5_trace_tx_exit;
> +       __rte_pmd_mlx5_trace_tx_wqe;
> +       __rte_pmd_mlx5_trace_tx_wait;
> +       __rte_pmd_mlx5_trace_tx_push;
> +       __rte_pmd_mlx5_trace_tx_complete;

No need to expose these symbols. This is being removed from the rest of
DPDK. The application can do rte_trace_lookup() to get the address.


> +
>         local: *;
>  };
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 1/5] app/testpmd: add trace dump command
  2023-04-20 10:07 ` [RFC 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-04-20 10:13   ` Jerin Jacob
  0 siblings, 0 replies; 76+ messages in thread
From: Jerin Jacob @ 2023-04-20 10:13 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev

On Thu, Apr 20, 2023 at 3:39 PM Viacheslav Ovsiienko
<viacheslavo@nvidia.com> wrote:
>
> The "dump_trace" CLI command is added to trigger
> saving the trace dumps to the trace directory.
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  app/test-pmd/cmdline.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 7b20bef4e9..be9e3a9ed6 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -39,6 +39,7 @@
>  #include <rte_gro.h>
>  #endif
>  #include <rte_mbuf_dyn.h>
> +#include <rte_trace.h>
>
>  #include <cmdline_rdline.h>
>  #include <cmdline_parse.h>
> @@ -8367,6 +8368,8 @@ static void cmd_dump_parsed(void *parsed_result,
>                 rte_lcore_dump(stdout);
>         else if (!strcmp(res->dump, "dump_log_types"))
>                 rte_log_dump(stdout);
> +       else if (!strcmp(res->dump, "dump_trace"))
> +               rte_trace_save();

Isn't it saving the trace? If so, change the command to save_trace or similar.

>  }
>
>  static cmdline_parse_token_string_t cmd_dump_dump =
> @@ -8379,7 +8382,8 @@ static cmdline_parse_token_string_t cmd_dump_dump =
>                 "dump_mempool#"
>                 "dump_devargs#"
>                 "dump_lcores#"
> -               "dump_log_types");
> +               "dump_log_types#"
> +               "dump_trace");
>
>  static cmdline_parse_inst_t cmd_dump = {
>         .f = cmd_dump_parsed,  /* function to call */
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 0/5] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (4 preceding siblings ...)
  2023-04-20 10:08 ` [RFC 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-06-09 15:28 ` Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
                     ` (4 more replies)
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (8 subsequent siblings)
  14 siblings, 5 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

The mlx5 PMD provides send scheduling at a specified moment of time,
and for this kind of application it would be extremely useful to have
extra debug information - when and how packets were scheduled and when
the actual sending was completed by the NIC hardware (it helps the
application to track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback
from the driver and the feature looks to be mlx5-specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with the EAL parameters configuring the tracing
    in the mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine the related events
    (packet firing and completion) and see the data in a human-readable view

Below are the detailed "how to" instructions for the mlx5 NIC to gather
all the debug data, including the full timing information.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings
parameter should be set to TRUE, for example with the command below
(followed by a FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
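
The following FW/driver reset can be performed, for example, with the
mlxfwreset tool from the MFT package (assuming the tool is installed; the
device name is the same as for mlxconfig), or by rebooting the server:

  sudo mlxfwreset -d /dev/mst/mt4125_pciconf0 reset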


3. Run DPDK application to gather the traces

EAL parameters controlling the trace capability at runtime (a combined
command-line example is shown after the parameter list):

  --trace=pmd.net.mlx5.tx - a regular expression enabling tracepoints with
                            matching names. At least the "pmd.net.mlx5.tx"
                            tracepoints must be enabled to gather all events
                            needed to analyze the mlx5 Tx datapath and its
                            timings. By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
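
A combined command-line example (illustrative only - shown with testpmd,
but any DPDK application built with tracing enabled can be used, and the
PCI address is a placeholder):

  dpdk-testpmd -a <PCI_BDF> --trace=pmd.net.mlx5.tx --trace-dir=/var/log \
               --trace-bufsz=8M -- -i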


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-bindings
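
The configure step should be followed by the usual build and install steps
(assuming the standard autotools flow):

  make
  sudo make install
  sudo ldconfig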

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6 (or newer) and the Babeltrace2 package, and it takes
the trace data folder as its only parameter. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The output presents the list of Tx (and, in the future, Rx) bursts per port/queue.
Each list element contains the list of built WQEs with specific opcodes, and
each WQE contains the list of the encompassed packets to send.
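
An illustrative fragment of the output structure (the values in angle
brackets are placeholders, the layout follows the print formats used by
the script):

  <burst_ts>: tx(p=<port>, q=<queue>, <sent>/<requested> pkts in <duration>
    <wqe_idx>: SEND (<wait_ts>, <completion_delta>)
      <mbuf_ptr>: <pkt_len>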

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Viacheslav Ovsiienko (5):
  app/testpmd: add trace dump command
  common/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add Tx datapath tracing
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script

 app/test-pmd/cmdline.c               |   6 +-
 drivers/common/mlx5/meson.build      |   1 +
 drivers/common/mlx5/mlx5_trace.c     |  25 +++
 drivers/common/mlx5/mlx5_trace.h     |  72 +++++++
 drivers/common/mlx5/version.map      |   8 +
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |   9 +
 drivers/net/mlx5/mlx5_tx.h           |  88 ++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 12 files changed, 504 insertions(+), 30 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 1/5] app/testpmd: add trace dump command
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-09 15:28   ` Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7b20bef4e9..be9e3a9ed6 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -39,6 +39,7 @@
 #include <rte_gro.h>
 #endif
 #include <rte_mbuf_dyn.h>
+#include <rte_trace.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -8367,6 +8368,8 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_lcore_dump(stdout);
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
 }
 
 static cmdline_parse_token_string_t cmd_dump_dump =
@@ -8379,7 +8382,8 @@ static cmdline_parse_token_string_t cmd_dump_dump =
 		"dump_mempool#"
 		"dump_devargs#"
 		"dump_lcores#"
-		"dump_log_types");
+		"dump_log_types#"
+		"dump_trace");
 
 static cmdline_parse_inst_t cmd_dump = {
 	.f = cmd_dump_parsed,  /* function to call */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-06-09 15:28   ` Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

There is an intention to engage the DPDK tracing capabilities
for monitoring and profiling of the mlx5 PMDs in various modes.
The patch introduces tracepoints for the Tx datapath in
the Ethernet device driver.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/meson.build  |  1 +
 drivers/common/mlx5/mlx5_trace.c | 25 +++++++++++
 drivers/common/mlx5/mlx5_trace.h | 72 ++++++++++++++++++++++++++++++++
 drivers/common/mlx5/version.map  |  8 ++++
 4 files changed, 106 insertions(+)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h

diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 1eefc02f06..28bfbfa324 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -19,6 +19,7 @@ sources += files(
         'mlx5_common_mp.c',
         'mlx5_common_mr.c',
         'mlx5_malloc.c',
+        'mlx5_trace.c',
         'mlx5_common_pci.c',
         'mlx5_common_devx.c',
         'mlx5_common_utils.c',
diff --git a/drivers/common/mlx5/mlx5_trace.c b/drivers/common/mlx5/mlx5_trace.c
new file mode 100644
index 0000000000..b9f14413ad
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_trace_point_register.h>
+#include <mlx5_trace.h>
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
+
diff --git a/drivers/common/mlx5/mlx5_trace.h b/drivers/common/mlx5/mlx5_trace.h
new file mode 100644
index 0000000000..57512e654f
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_PMD_MLX5_TRACE_H_
+#define RTE_PMD_MLX5_TRACE_H_
+
+/**
+ * @file
+ *
+ * API for mlx5 PMD trace support
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <mlx5_prm.h>
+#include <rte_mbuf.h>
+#include <rte_trace_point.h>
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_PMD_MLX5_TRACE_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..d0ec8571e6 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -158,5 +158,13 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	__rte_pmd_mlx5_trace_tx_entry;
+	__rte_pmd_mlx5_trace_tx_exit;
+	__rte_pmd_mlx5_trace_tx_wqe;
+	__rte_pmd_mlx5_trace_tx_wait;
+	__rte_pmd_mlx5_trace_tx_push;
+	__rte_pmd_mlx5_trace_tx_complete;
+
 	local: *;
 };
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 3/5] net/mlx5: add Tx datapath tracing
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-06-09 15:28   ` Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

The patch adds tracing capability to the Tx datapath.
To engage this tracing capability the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*

The Tx datapath tracing provides information on how packets
are pushed into the hardware descriptors, the timestamps of
scheduled waits and send completions, etc.

A dedicated post-processing script is provided to present
the trace results in human-readable form.
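
For illustration only - testpmd is used here just as an example application,
the PCI address is a placeholder, and the build directory is assumed to be
already set up - a possible build and run sequence could look like:

  meson configure -Denable_trace_fp=true \
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build
  ninja -C build
  ./build/app/dpdk-testpmd -a <PCI_BDF> --trace=pmd.net.mlx5.tx \
        --trace-dir=/var/log -- -i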

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.h   | 19 -------------------
 drivers/net/mlx5/mlx5_rxtx.h | 19 +++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c   |  9 +++++++++
 drivers/net/mlx5/mlx5_tx.h   | 25 +++++++++++++++++++++++--
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 52c35c83f8..ed912ffb99 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -376,25 +376,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..1fe9521dfc 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -232,6 +232,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..7f624de58e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -19,6 +19,8 @@
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_trace.h"
+#include "mlx5_rxtx.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +766,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1697,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1712,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1817,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1900,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2124,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2328,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2698,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2937,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2947,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +2993,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3208,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3271,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3313,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3355,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3725,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 4/5] net/mlx5: add comprehensive send completion trace
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2023-06-09 15:28   ` [PATCH 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-09 15:28   ` Viacheslav Ovsiienko
  2023-06-09 15:28   ` [PATCH 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

There is a demand to trace the send completion of
every WQE if time scheduling is enabled.

The patch extends the size of the completion queue and
requests a completion on every WQE issued to the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is compiled conditionally under the
RTE_ENABLE_TRACE_FP flag and does not impact the
release code.
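
For a rough illustration of the sizing (assuming the driver defaults
MLX5_TX_COMP_THRESH = 32 and MLX5_TX_COMP_THRESH_INLINE_DIV = 8; these
values are quoted only for the sake of the arithmetic): with 512 Tx
descriptors the regular path allocates 512 / 32 + 1 + 8 = 25 CQEs, while
with fast path tracing and the send-on-timestamp offload enabled the
completion queue is sized to UINT16_MAX / 2 - 1 = 32766 CQEs (rounded up
to a power of two), so a completion slot exists for every WQE.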

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 7233c2c7fa..b54f3ccd9a 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 7f624de58e..9f29df280f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -728,6 +728,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with enabled fast path tracing
+ * and send scheduling on time to provide the detailed report in trace
+ * for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -754,7 +802,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -763,8 +811,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3662,7 +3714,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 5/5] net/mlx5: add Tx datapath trace analyzing script
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2023-06-09 15:28   ` [PATCH 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-06-09 15:28   ` Viacheslav Ovsiienko
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-09 15:28 UTC (permalink / raw)
  To: dev

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings
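
A minimal usage sketch (the trace directory below is just an example of
a directory produced via the EAL --trace-dir option):

  ./drivers/net/mlx5/tools/mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39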

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..c8fa63a7b9
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+'''
+Analyzing the mlx5 PMD datapath tracings
+'''
+import sys
+import argparse
+import pathlib
+import bt2
+
+PFX_TX     = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+tx_blst = {}                    # current Tx bursts per CPU
+tx_qlst = {}                    # active Tx queues per port/queue
+tx_wlst = {}                    # wait timestamp list per CPU
+
+class mlx5_queue(object):
+    def __init__(self):
+        self.done_burst = []    # completed bursts
+        self.wait_burst = []    # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        for txb in self.done_burst:
+            txb.log()
+
+
+class mlx5_mbuf(object):
+    def __init__(self):
+        self.wqe = 0            # wqe id
+        self.ptr = None         # first packet mbuf pointer
+        self.len = 0            # packet data length
+        self.nseg = 0           # number of segments
+
+    def log(self):
+        out = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out += " (%d segs)" % self.nseg
+        print(out)
+
+
+class mlx5_wqe(object):
+    def __init__(self):
+        self.mbuf = []          # list of mbufs in WQE
+        self.wait_ts = 0        # preceding wait/push timestamp
+        self.comp_ts = 0        # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        id = (self.opcode >> 8) & 0xFFFF
+        op = self.opcode & 0xFF
+        fl = self.opcode >> 24
+        out = "  %04X: " % id
+        if op == 0xF:
+            out += "WAIT"
+        elif op == 0x29:
+            out += "EMPW"
+        elif op == 0xE:
+            out += "TSO "
+        elif op == 0xA:
+            out += "SEND"
+        else:
+            out += "0x%02X" % op
+        if self.comp_ts != 0:
+            out += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out += " (%d)" % self.wait_ts
+        print(out)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    # return 0 if the WQE is not completed yet (16-bit id compare with wraparound)
+    def comp(self, wqe_id, ts):
+        if self.comp_ts != 0:
+            return 1
+        id = (self.opcode >> 8) & 0xFFFF
+        if id > wqe_id:
+            id -= wqe_id
+            if id <= 0x8000:
+                return 0
+        else:
+            id = wqe_id - id
+            if id >= 0x8000:
+                return 0
+        self.comp_ts = ts
+        return 1
+
+
+class mlx5_burst(object):
+    def __init__(self):
+        self.wqes = []          # issued burst WQEs
+        self.done = 0           # number of sent/recv packets
+        self.req = 0            # requested number of packets
+        self.call_ts = 0        # burst routine invocation
+        self.done_ts = 0        # burst routine done
+        self.queue = None
+
+    def log(self):
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)" %
+                  (self.call_ts, port, queue, self.done, self.req))
+        else:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts in %u" %
+                  (self.call_ts, port, queue, self.done, self.req,
+                   self.done_ts - self.call_ts))
+        for wqe in self.wqes:
+            wqe.log()
+
+    # return 0 if not all of the WQEs in the burst are completed
+    def comp(self, wqe_id, ts):
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, ts) == 0:
+                return 0
+        return 1
+
+
+def do_tx_entry(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = mlx5_burst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = mlx5_queue()
+        queue.pq_id = pq_id
+        tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = mlx5_wqe()
+    wqe.wait_ts = tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = mlx5_mbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg):
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, ts) == 0:
+            break
+        rmv += 1
+    # move completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg):
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg)
+    elif name == "exit":
+        do_tx_exit(msg)
+    elif name == "wqe":
+        do_tx_wqe(msg)
+    elif name == "wait":
+        do_tx_wait(msg)
+    elif name == "push":
+        do_tx_push(msg)
+    elif name == "complete":
+        do_tx_complete(msg)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name)
+        sys.exit(1)
+
+
+def do_log(msg_it):
+    for msg in msg_it:
+        if type(msg) is not bt2._EventMessageConst:
+            continue
+        event = msg.event
+        if event.name.startswith(PFX_TX):
+            do_tx(msg)
+        # Handling of other log event categories can be added here
+
+
+def do_print():
+    for pq_id in tx_qlst:
+        queue = tx_qlst.get(pq_id)
+        queue.log()
+
+
+def main(args):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("path",
+                        nargs = 1,
+                        type = str,
+                        help = "input trace folder")
+    args = parser.parse_args()
+
+    msg_it = bt2.TraceCollectionMessageIterator(args.path)
+    do_log(msg_it)
+    do_print()
+    exit(0)
+
+if __name__ == "__main__":
+    main(sys.argv)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-04-20 10:11   ` Jerin Jacob
@ 2023-06-13 15:50     ` Slava Ovsiienko
  2023-06-13 15:53       ` Jerin Jacob
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-13 15:50 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi,

<..snip..>
> >
> >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> >         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> > +
> > +       __rte_pmd_mlx5_trace_tx_entry;
> > +       __rte_pmd_mlx5_trace_tx_exit;
> > +       __rte_pmd_mlx5_trace_tx_wqe;
> > +       __rte_pmd_mlx5_trace_tx_wait;
> > +       __rte_pmd_mlx5_trace_tx_push;
> > +       __rte_pmd_mlx5_trace_tx_complete;
> 
> No need to expose these symbols. It is getting removed from rest of DPDK.
> Application can do rte_trace_lookup() to get this address.
> 
> 
It is not for the application, it is for the PMD itself; without exposing the symbols the build failed.

With best regards,
Slava


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-13 15:50     ` Slava Ovsiienko
@ 2023-06-13 15:53       ` Jerin Jacob
  2023-06-13 15:59         ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Jerin Jacob @ 2023-06-13 15:53 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev

On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
>
> Hi,
>
> <..snip..>
> > >
> > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > >         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> > > +
> > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > +       __rte_pmd_mlx5_trace_tx_push;
> > > +       __rte_pmd_mlx5_trace_tx_complete;
> >
> > No need to expose these symbols. It is getting removed from rest of DPDK.
> > Application can do rte_trace_lookup() to get this address.
> >
> >
> It is not for application, it is for PMD itself, w/o exposing the symbols build failed.

The PMD is implementing these tracepoints, not consuming them.
Right? If so, why expose these symbols?

>
> With best regards,
> Slava
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-13 15:53       ` Jerin Jacob
@ 2023-06-13 15:59         ` Slava Ovsiienko
  2023-06-13 16:01           ` Jerin Jacob
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-13 15:59 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, June 13, 2023 6:53 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org
> Subject: Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
> 
> On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> wrote:
> >
> > Hi,
> >
> > <..snip..>
> > > >
> > > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > > >         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> > > > +
> > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > >
> > > No need to expose these symbols. It is getting removed from rest of DPDK.
> > > Application can do rte_trace_lookup() to get this address.
> > >
> > >
> > It is not for application, it is for PMD itself, w/o exposing the symbols build
> failed.
> 
> PMD is implementing this trace endpoints, not consuming this trace point.
> Right? If so, Why to expose these symbols?

As far as I understand:
The tracepoint routines are defined in the dedicated common/mlx5_trace.c file.
The tx_burst in mlx5 is implemented as a template in a header file, and this
template is used in multiple .c files under the net/mlx5 folder.
So, common/mlx5 should expose its symbols to net/mlx5 to allow successful
linkage.
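
Roughly, the mechanism is the following (a simplified sketch, not the
exact macro expansion):

  /* mlx5_trace.h, included by the net/mlx5 tx_burst template: */
  extern rte_trace_point_t __rte_pmd_mlx5_trace_tx_entry; /* referenced by the inlined emitter */

  /* mlx5_trace.c, built into the common/mlx5 library: */
  rte_trace_point_t __rte_pmd_mlx5_trace_tx_entry;        /* definition behind the exported symbol */

Every net/mlx5 object that inlines the tracepoint references the variable
living in common/mlx5, hence the version.map entries.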

With best regards,
Slava

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-13 15:59         ` Slava Ovsiienko
@ 2023-06-13 16:01           ` Jerin Jacob
  2023-06-27  0:39             ` Thomas Monjalon
  0 siblings, 1 reply; 76+ messages in thread
From: Jerin Jacob @ 2023-06-13 16:01 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev

On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
>
> Hi,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, June 13, 2023 6:53 PM
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
> >
> > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > <..snip..>
> > > > >
> > > > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > > > >         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> > > > > +
> > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > >
> > > > No need to expose these symbols. It is getting removed from rest of DPDK.
> > > > Application can do rte_trace_lookup() to get this address.
> > > >
> > > >
> > > It is not for application, it is for PMD itself, w/o exposing the symbols build
> > failed.
> >
> > PMD is implementing this trace endpoints, not consuming this trace point.
> > Right? If so, Why to expose these symbols?
>
> As far as understand:
> The tracepoint routines are defined in dedicated common/mlx5_trace.c file.
> The tx_burst in mlx5 is implemented as template in header file, and this
> template is used in multiple .c files under net/mlx5 filder.
> So, common/mlx5 should expose its symbols to net/mlx5 to allow successful
> linkage.

OK. I missed the fact that these are in common code and the net driver
depends on that.
So the changes make sense.

>
> With best regards,
> Slava

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (5 preceding siblings ...)
  2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-13 16:58 ` Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 1/5] app/testpmd: add trace save command Viacheslav Ovsiienko
                     ` (5 more replies)
  2023-06-26 11:06 ` [PATCH] app/testpmd: add trace dump command Viacheslav Ovsiienko
                   ` (7 subsequent siblings)
  14 siblings, 6 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

The mlx5 PMD provides send scheduling at a specific moment of time,
and for this kind of application it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
the application to track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback
from the driver and the feature looks to be mlx5 specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with EAL parameters configuring the tracing in the
    mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine related events (packet
    firing and completion) and see the data in a human-readable view

Below is the detailed instruction "how to" with mlx5 NIC to gather
all the debug data including the full timings information.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide realtime timestamps: the REAL_TIME_CLOCK_ENABLE NV settings parameter
should be set to TRUE, for example with the command below (followed by a
FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1


3. Run DPDK application to gather the traces

EAL parameters controlling trace capability in runtime

  --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
                            with matching names at least "pmd.net.mlx5.tx"
                            must be enabled to gather all events needed
                            to analyze mlx5 Tx datapath and its timings.
                            By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
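
Putting the options together, an illustrative command line (testpmd is
used only as an example application, the device and paths are
placeholders) might look like:

  dpdk-testpmd -a <PCI_BDF> --trace=pmd.net.mlx5.tx --trace-dir=/var/log \
               --trace-bufsz=8M --trace-mode=overwrite -- -i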


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure -help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-bindings
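
(The usual autotools build and install steps follow, typically "make"
and "sudo make install"; the exact steps may differ per distribution and
are mentioned here only as a hint.)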

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6 and the Babeltrace2 package, and it takes a single
parameter - the trace data folder. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The list of Tx (and, in the future, Rx) bursts per port/queue is presented in the
output. Each list element contains the list of built WQEs with specific opcodes,
and each WQE contains the list of the packets it encompasses (see the example below).
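
An illustrative (synthetic) fragment of the script output, following the
format strings used by mlx5_trace.py, could look like:

  16802530148208671: tx(p=0, q=0, 2/2 pkts in 4128
    00C7: WAIT (16802530148345600)
    00C8: SEND (16802530148345600, 6230)
      7F7E30421800: 1514
      7F7E30422A00: 1514 (2 segs)

i.e. the burst invocation timestamp, port/queue, sent/requested packet
counts and the burst duration, then each WQE with its index, opcode, wait
timestamp and completion delta, and finally the mbuf pointer and packet
length of every packet pushed into that WQE.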

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

Viacheslav Ovsiienko (5):
  app/testpmd: add trace save command
  common/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add Tx datapath tracing
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script

 app/test-pmd/cmdline.c               |  38 ++++
 drivers/common/mlx5/meson.build      |   1 +
 drivers/common/mlx5/mlx5_trace.c     |  25 +++
 drivers/common/mlx5/mlx5_trace.h     |  72 +++++++
 drivers/common/mlx5/version.map      |   8 +
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |   9 +
 drivers/net/mlx5/mlx5_tx.h           |  88 ++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 12 files changed, 537 insertions(+), 29 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-13 16:58   ` Viacheslav Ovsiienko
  2023-06-21 11:15     ` Ferruh Yigit
  2023-06-13 16:58   ` [PATCH v2 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

The "save_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.
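
Usage from the interactive testpmd prompt:

  testpmd> save_trace
  Trace saved successfully

The trace data is written to the directory selected with the EAL
--trace-dir option.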

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a15a442a06..db71ce2028 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -39,6 +39,7 @@
 #include <rte_gro.h>
 #endif
 #include <rte_mbuf_dyn.h>
+#include <rte_trace.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t cmd_config_tx_affinity_map = {
 	},
 };
 
+#ifndef RTE_EXEC_ENV_WINDOWS
+/* *** SAVE_TRACE *** */
+
+struct cmd_save_trace_result {
+	cmdline_fixed_string_t save;
+};
+
+static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
+				  __rte_unused struct cmdline *cl,
+				  __rte_unused void *data)
+{
+	int rc;
+
+	rc = rte_trace_save();
+	if (rc)
+		printf("Save trace failed with error: %d\n", rc);
+	else
+		printf("Trace saved successfully\n");
+}
+
+static cmdline_parse_token_string_t cmd_save_trace_save =
+	TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save, "save_trace");
+
+static cmdline_parse_inst_t cmd_save_trace = {
+	.f = cmd_save_trace_parsed,
+	.data = NULL,
+	.help_str = "save_trace: save tracing buffer",
+	.tokens = {
+		(void *)&cmd_save_trace_save,
+		NULL,
+	},
+};
+#endif
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -12979,6 +13014,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
 	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
 	(cmdline_parse_inst_t *)&cmd_config_tx_affinity_map,
+#ifndef RTE_EXEC_ENV_WINDOWS
+	(cmdline_parse_inst_t *)&cmd_save_trace,
+#endif
 	NULL,
 };
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 1/5] app/testpmd: add trace save command Viacheslav Ovsiienko
@ 2023-06-13 16:58   ` Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

There is an intention to engage the DPDK tracing capabilities
for monitoring and profiling of the mlx5 PMDs in various modes.
The patch introduces tracepoints for the Tx datapath in
the Ethernet device driver.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/meson.build  |  1 +
 drivers/common/mlx5/mlx5_trace.c | 25 +++++++++++
 drivers/common/mlx5/mlx5_trace.h | 72 ++++++++++++++++++++++++++++++++
 drivers/common/mlx5/version.map  |  8 ++++
 4 files changed, 106 insertions(+)
 create mode 100644 drivers/common/mlx5/mlx5_trace.c
 create mode 100644 drivers/common/mlx5/mlx5_trace.h

diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 1eefc02f06..28bfbfa324 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -19,6 +19,7 @@ sources += files(
         'mlx5_common_mp.c',
         'mlx5_common_mr.c',
         'mlx5_malloc.c',
+        'mlx5_trace.c',
         'mlx5_common_pci.c',
         'mlx5_common_devx.c',
         'mlx5_common_utils.c',
diff --git a/drivers/common/mlx5/mlx5_trace.c b/drivers/common/mlx5/mlx5_trace.c
new file mode 100644
index 0000000000..b9f14413ad
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_trace_point_register.h>
+#include <mlx5_trace.h>
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
+
diff --git a/drivers/common/mlx5/mlx5_trace.h b/drivers/common/mlx5/mlx5_trace.h
new file mode 100644
index 0000000000..57512e654f
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_trace.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_PMD_MLX5_TRACE_H_
+#define RTE_PMD_MLX5_TRACE_H_
+
+/**
+ * @file
+ *
+ * API for mlx5 PMD trace support
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <mlx5_prm.h>
+#include <rte_mbuf.h>
+#include <rte_trace_point.h>
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_PMD_MLX5_TRACE_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..d0ec8571e6 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -158,5 +158,13 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	__rte_pmd_mlx5_trace_tx_entry;
+	__rte_pmd_mlx5_trace_tx_exit;
+	__rte_pmd_mlx5_trace_tx_wqe;
+	__rte_pmd_mlx5_trace_tx_wait;
+	__rte_pmd_mlx5_trace_tx_push;
+	__rte_pmd_mlx5_trace_tx_complete;
+
 	local: *;
 };
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 3/5] net/mlx5: add Tx datapath tracing
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 1/5] app/testpmd: add trace save command Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-06-13 16:58   ` Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

The patch adds tracing capability to the Tx datapath.
To engage this tracing capability the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*

The Tx datapath tracing provides information on how packets
are pushed into the hardware descriptors, the timestamps of
scheduled waits and send completions, etc.

A dedicated post-processing script is provided to present
the trace results in human-readable form.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.h   | 19 -------------------
 drivers/net/mlx5/mlx5_rxtx.h | 19 +++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c   |  9 +++++++++
 drivers/net/mlx5/mlx5_tx.h   | 25 +++++++++++++++++++++++--
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 52c35c83f8..ed912ffb99 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -376,25 +376,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..1fe9521dfc 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -232,6 +232,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..7f624de58e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -19,6 +19,8 @@
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_trace.h"
+#include "mlx5_rxtx.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +766,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1697,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1712,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1817,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1900,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2124,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2328,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2698,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2937,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2947,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +2993,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3208,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3271,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3313,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3355,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3725,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 4/5] net/mlx5: add comprehensive send completion trace
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2023-06-13 16:58   ` [PATCH v2 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-13 16:58   ` Viacheslav Ovsiienko
  2023-06-13 16:58   ` [PATCH v2 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  2023-06-20 12:00   ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
  5 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

There is a demand to trace the send completion of
every WQE if time scheduling is enabled.

The patch extends the size of the completion queue and
requests a completion on every WQE issued to the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is compiled conditionally under the
RTE_ENABLE_TRACE_FP flag and does not impact the
release code.
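
For a rough sense of the sizing (simple arithmetic on the constants
used in the patch, not a statement taken from it): with tracing and
send scheduling enabled the CQ is sized to UINT16_MAX / 2 - 1 = 32766
entries, which log2above() rounds up to 2^15 = 32768. This still fits
the 16-bit CQ index range checked in the code, compared with the
default of roughly desc / MLX5_TX_COMP_THRESH entries.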

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 7233c2c7fa..b54f3ccd9a 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 7f624de58e..9f29df280f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -728,6 +728,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with fast path tracing and
+ * send scheduling on time enabled, to provide a detailed report in
+ * the trace for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -754,7 +802,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -763,8 +811,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3662,7 +3714,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2 5/5] net/mlx5: add Tx datapath trace analyzing script
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2023-06-13 16:58   ` [PATCH v2 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-06-13 16:58   ` Viacheslav Ovsiienko
  2023-06-20 12:00   ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
  5 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-13 16:58 UTC (permalink / raw)
  To: dev

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings
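
For illustration, the report roughly looks as follows (all values are
hypothetical, the layout follows the log() methods of the script):

  9582459876543: tx(p=0, q=0, 2/2 pkts in 6296
    01C7: WAIT (9582459878000)
    01C8: SEND (9582459878000, 15731)
      7F6E40821C00: 1514
      7F6E40822800: 1514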

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..c8fa63a7b9
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+'''
+Analyzing the mlx5 PMD datapath traces
+'''
+import sys
+import argparse
+import pathlib
+import bt2
+
+PFX_TX     = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+tx_blst = {}                    # current Tx bursts per CPU
+tx_qlst = {}                    # active Tx queues per port/queue
+tx_wlst = {}                    # wait timestamp list per CPU
+
+class mlx5_queue(object):
+    def __init__(self):
+        self.done_burst = []    # completed bursts
+        self.wait_burst = []    # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        for txb in self.done_burst:
+            txb.log()
+
+
+class mlx5_mbuf(object):
+    def __init__(self):
+        self.wqe = 0            # wqe id
+        self.ptr = None         # first packet mbuf pointer
+        self.len = 0            # packet data length
+        self.nseg = 0           # number of segments
+
+    def log(self):
+        out = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out += " (%d segs)" % self.nseg
+        print(out)
+
+
+class mlx5_wqe(object):
+    def __init__(self):
+        self.mbuf = []          # list of mbufs in WQE
+        self.wait_ts = 0        # preceding wait/push timestamp
+        self.comp_ts = 0        # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        id = (self.opcode >> 8) & 0xFFFF
+        op = self.opcode & 0xFF
+        fl = self.opcode >> 24
+        out = "  %04X: " % id
+        if op == 0xF:
+            out += "WAIT"
+        elif op == 0x29:
+            out += "EMPW"
+        elif op == 0xE:
+            out += "TSO "
+        elif op == 0xA:
+            out += "SEND"
+        else:
+            out += "0x%02X" % op
+        if self.comp_ts != 0:
+            out += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out += " (%d)" % self.wait_ts
+        print(out)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    # return 0 if the WQE is not completed
+    def comp(self, wqe_id, ts):
+        if self.comp_ts != 0:
+            return 1
+        id = (self.opcode >> 8) & 0xFFFF
+        if id > wqe_id:
+            id -= wqe_id
+            if id <= 0x8000:
+                return 0
+        else:
+            id = wqe_id - id
+            if id >= 0x8000:
+                return 0
+        self.comp_ts = ts
+        return 1
+
+
+class mlx5_burst(object):
+    def __init__(self):
+        self.wqes = []          # issued burst WQEs
+        self.done = 0           # number of sent/recv packets
+        self.req = 0            # requested number of packets
+        self.call_ts = 0        # burst routine invocation
+        self.done_ts = 0        # burst routine done
+        self.queue = None
+
+    def log(self):
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)" %
+                  (self.call_ts, port, queue, self.done, self.req))
+        else:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts in %u" %
+                  (self.call_ts, port, queue, self.done, self.req,
+                   self.done_ts - self.call_ts))
+        for wqe in self.wqes:
+            wqe.log()
+
+    # return 0 if not all WQEs in the burst are completed
+    def comp(self, wqe_id, ts):
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, ts) == 0:
+                return 0
+        return 1
+
+
+def do_tx_entry(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = mlx5_burst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = mlx5_queue()
+        queue.pq_id = pq_id
+        tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = mlx5_wqe()
+    wqe.wait_ts = tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = mlx5_mbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg):
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, ts) == 0:
+            break
+        rmv += 1
+    # move completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg):
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg)
+    elif name == "exit":
+        do_tx_exit(msg)
+    elif name == "wqe":
+        do_tx_wqe(msg)
+    elif name == "wait":
+        do_tx_wait(msg)
+    elif name == "push":
+        do_tx_push(msg)
+    elif name == "complete":
+        do_tx_complete(msg)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name)
+        sys.exit(1)
+
+
+def do_log(msg_it):
+    for msg in msg_it:
+        if type(msg) is not bt2._EventMessageConst:
+            continue
+        event = msg.event
+        if event.name.startswith(PFX_TX):
+            do_tx(msg)
+        # Handling of other log event categories can be added here
+
+
+def do_print():
+    for pq_id in tx_qlst:
+        queue = tx_qlst.get(pq_id)
+        queue.log()
+
+
+def main(args):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("path",
+                        nargs=1,
+                        type=str,
+                        help="input trace folder")
+    args = parser.parse_args()
+
+    msg_it = bt2.TraceCollectionMessageIterator(args.path)
+    do_log(msg_it)
+    do_print()
+    sys.exit(0)
+
+if __name__ == "__main__":
+    main(sys.argv)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2023-06-13 16:58   ` [PATCH v2 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-06-20 12:00   ` Raslan Darawsheh
  2023-06-27  0:46     ` Thomas Monjalon
  5 siblings, 1 reply; 76+ messages in thread
From: Raslan Darawsheh @ 2023-06-20 12:00 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Hi,

> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, June 13, 2023 7:59 PM
> To: dev@dpdk.org
> Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 
> The mlx5 provides the send scheduling on specific moment of time,
> and for the related kind of applications it would be extremely useful
> to have extra debug information - when and how packets were scheduled
> and when the actual sending was completed by the NIC hardware (it helps
> application to track the internal delay issues).
> 
> Because the DPDK tx datapath API does not suppose getting any feedback
> from the driver and the feature looks like to be mlx5 specific, it seems
> to be reasonable to engage exisiting DPDK datapath tracing capability.
> 
> The work cycle is supposed to be:
>   - compile appplication with enabled tracing
>   - run application with EAL parameters configuring the tracing in mlx5
>     Tx datapath
>   - store the dump file with gathered tracing information
>   - run analyzing scrypt (in Python) to combine related events (packet
>     firing and completion) and see the data in human-readable view
> 
> Below is the detailed instruction "how to" with mlx5 NIC to gather
> all the debug data including the full timings information.
> 
> 
> 1. Build DPDK application with enabled datapath tracing
> 
> The meson option should be specified:
>    --enable_trace_fp=true
> 
> The c_args shoudl be specified:
>    -DALLOW_EXPERIMENTAL_API
> 
> The DPDK configuration examples:
> 
>   meson configure --buildtype=debug -Denable_trace_fp=true
>         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -
> DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=debug -Denable_trace_fp=true
>         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=release -Denable_trace_fp=true
>         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=release -Denable_trace_fp=true
>         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> 
> 
> 2. Configuring the NIC
> 
> If the sending completion timings are important the NIC should be configured
> to provide realtime timestamps, the REAL_TIME_CLOCK_ENABLE NV settings
> parameter
> should be configured to TRUE, for example with command (and with following
> FW/driver reset):
> 
>   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s
> REAL_TIME_CLOCK_ENABLE=1
> 
> 
> 3. Run DPDK application to gather the traces
> 
> EAL parameters controlling trace capability in runtime
> 
>   --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
>                             with matching names at least "pmd.net.mlx5.tx"
>                             must be enabled to gather all events needed
>                             to analyze mlx5 Tx datapath and its timings.
>                             By default all tracepoints are disabled.
> 
>   --trace-dir=/var/log - trace storing directory
> 
>   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
>                                        per thread. The default is 1MB.
> 
>   --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
> 
> 
> 4. Installing or Building Babeltrace2 Package
> 
> The gathered trace data can be analyzed with a developed Python script.
> To parse the trace, the data script uses the Babeltrace2 library.
> The package should be either installed or built from source code as
> shown below:
> 
>   git clone https://github.com/efficios/babeltrace.git
>   cd babeltrace
>   ./bootstrap
>   ./configure -help
>   ./configure --disable-api-doc --disable-man-pages
>               --disable-python-bindings-doc --enbale-python-plugins
>               --enable-python-binding
> 
> 5. Running the Analyzing Script
> 
> The analyzing script is located in the folder: ./drivers/net/mlx5/tools
> It requires Python3.6, Babeltrace2 packages and it takes the only parameter
> of trace data file. For example:
> 
>    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
> 
> 
> 6. Interpreting the Script Output Data
> 
> All the timings are given in nanoseconds.
> The list of Tx (and coming Rx) bursts per port/queue is presented in the
> output.
> Each list element contains the list of built WQEs with specific opcodes, and
> each WQE contains the list of the encompassed packets to send.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> --
> v2: - comment addressed: "dump_trace" command is replaced with
> "save_trace"
>     - Windows build failure addressed, Windows does not support tracing
> 
> Viacheslav Ovsiienko (5):
>   app/testpmd: add trace save command
>   common/mlx5: introduce tracepoints for mlx5 drivers
>   net/mlx5: add Tx datapath tracing
>   net/mlx5: add comprehensive send completion trace
>   net/mlx5: add Tx datapath trace analyzing script
> 
>  app/test-pmd/cmdline.c               |  38 ++++
>  drivers/common/mlx5/meson.build      |   1 +
>  drivers/common/mlx5/mlx5_trace.c     |  25 +++
>  drivers/common/mlx5/mlx5_trace.h     |  72 +++++++
>  drivers/common/mlx5/version.map      |   8 +
>  drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
>  drivers/net/mlx5/mlx5_devx.c         |   8 +-
>  drivers/net/mlx5/mlx5_rx.h           |  19 --
>  drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
>  drivers/net/mlx5/mlx5_tx.c           |   9 +
>  drivers/net/mlx5/mlx5_tx.h           |  88 ++++++++-
>  drivers/net/mlx5/tools/mlx5_trace.py | 271
> +++++++++++++++++++++++++++
>  12 files changed, 537 insertions(+), 29 deletions(-)
>  create mode 100644 drivers/common/mlx5/mlx5_trace.c
>  create mode 100644 drivers/common/mlx5/mlx5_trace.h
>  create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py
> 
> --
> 2.18.1

Series applied to next-net-mlx,

Kindest regards
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-13 16:58   ` [PATCH v2 1/5] app/testpmd: add trace save command Viacheslav Ovsiienko
@ 2023-06-21 11:15     ` Ferruh Yigit
  2023-06-23  8:00       ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-21 11:15 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev, Aman Singh; +Cc: Jerin Jacob Kollanukkaran

On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
> The "save_trace" CLI command is added to trigger
> saving the trace dumps to the trace directory.
> 

Hi Viacheslav,

Trace is already saved when dpdk application terminated, I guess this is
to save the trace before exiting the application, what is the use case
for this, can you please detail in the commit log.

And what happens if this is called multiple times, or what happens on
the application exit, will it overwrite the file or fail?
Again please explain in the commit log.

> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 

Can you please update documentation too?

> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index a15a442a06..db71ce2028 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -39,6 +39,7 @@
>  #include <rte_gro.h>
>  #endif
>  #include <rte_mbuf_dyn.h>
> +#include <rte_trace.h>
>  
>  #include <cmdline_rdline.h>
>  #include <cmdline_parse.h>
> @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t cmd_config_tx_affinity_map = {
>  	},
>  };
>  
> +#ifndef RTE_EXEC_ENV_WINDOWS
> +/* *** SAVE_TRACE *** */
> +
> +struct cmd_save_trace_result {
> +	cmdline_fixed_string_t save;
> +};
> +
> +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	int rc;
> +
> +	rc = rte_trace_save();
> +	if (rc)
> +		printf("Save trace failed with error: %d\n", rc);
> +	else
> +		printf("Trace saved successfully\n");
> +}
> +
> +static cmdline_parse_token_string_t cmd_save_trace_save =
> +	TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save, "save_trace");
> +

We have dump_* commands, what do you think to have 'dump_trace' command
for this?


>  


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-21 11:15     ` Ferruh Yigit
@ 2023-06-23  8:00       ` Slava Ovsiienko
  2023-06-23 11:52         ` Ferruh Yigit
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-23  8:00 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Aman Singh; +Cc: Jerin Jacob Kollanukkaran

Hi, Ferruh

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, June 21, 2023 2:16 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Aman Singh
> <aman.deep.singh@intel.com>
> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
> 
> On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
> > The "save_trace" CLI command is added to trigger saving the trace
> > dumps to the trace directory.
> >
> 
> Hi Viacheslav,
> 
> Trace is already saved when dpdk application terminated, I guess this is to
> save the trace before exiting the application, what is the use case for this, can
> you please detail in the commit log.

OK, will update the commit log. The command "save_trace" is useful in some
dynamic debug scenarios to save the trace without restarting the entire application.

> 
> And what happens if this is called multiple times, or what happens on the
> application exit, will it overwrite the file or fail?
It overwrites.

> Again please explain in the commit log.
Sure, will do.

> 
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > ---
> >  app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> 
> Can you please update documentation too?
> 
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > a15a442a06..db71ce2028 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -39,6 +39,7 @@
> >  #include <rte_gro.h>
> >  #endif
> >  #include <rte_mbuf_dyn.h>
> > +#include <rte_trace.h>
> >
> >  #include <cmdline_rdline.h>
> >  #include <cmdline_parse.h>
> > @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t
> cmd_config_tx_affinity_map = {
> >  	},
> >  };
> >
> > +#ifndef RTE_EXEC_ENV_WINDOWS
> > +/* *** SAVE_TRACE *** */
> > +
> > +struct cmd_save_trace_result {
> > +	cmdline_fixed_string_t save;
> > +};
> > +
> > +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
> > +				  __rte_unused struct cmdline *cl,
> > +				  __rte_unused void *data)
> > +{
> > +	int rc;
> > +
> > +	rc = rte_trace_save();
> > +	if (rc)
> > +		printf("Save trace failed with error: %d\n", rc);
> > +	else
> > +		printf("Trace saved successfully\n"); }
> > +
> > +static cmdline_parse_token_string_t cmd_save_trace_save =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save,
> > +"save_trace");
> > +
> 
> We have dump_* commands, what do you think to have 'dump_trace'
> command for this?
It was initially (in v1) with "dump_trace" command.
And there is the comment by Jerin:
https://inbox.dpdk.org/dev/CALBAE1Of79a_jHnFT3KX--Enhud-h5RzL02TMQBsmoW721ds7A@mail.gmail.com/#t

So, I have changed to "save_trace". I have no strong opinion about command name, any allowing trace save is OK for me.

With best regards,
Slava


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-23  8:00       ` Slava Ovsiienko
@ 2023-06-23 11:52         ` Ferruh Yigit
  2023-06-23 12:03           ` Jerin Jacob
  0 siblings, 1 reply; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-23 11:52 UTC (permalink / raw)
  To: Slava Ovsiienko, Aman Singh, Jerin Jacob Kollanukkaran; +Cc: dev

On 6/23/2023 9:00 AM, Slava Ovsiienko wrote:
> Hi, Ferruh
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, June 21, 2023 2:16 PM
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Aman Singh
>> <aman.deep.singh@intel.com>
>> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
>> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
>>
>> On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
>>> The "save_trace" CLI command is added to trigger saving the trace
>>> dumps to the trace directory.
>>>
>>
>> Hi Viacheslav,
>>
>> Trace is already saved when dpdk application terminated, I guess this is to
>> save the trace before exiting the application, what is the use case for this, can
>> you please detail in the commit log.
> 
> OK, will update the commit log. The command "save_trace" is useful in some
> dynamic debug scenarios to save the trace without restarting the entire application.
> 
>>
>> And what happens if this is called multiple times, or what happens on the
>> application exit, will it overwrite the file or fail?
> It overwrites.
> 
>> Again please explain in the commit log.
> Sure, will do.
> 
>>
>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>>> ---
>>>  app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 38 insertions(+)
>>>
>>
>> Can you please update documentation too?
>>
>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
>>> a15a442a06..db71ce2028 100644
>>> --- a/app/test-pmd/cmdline.c
>>> +++ b/app/test-pmd/cmdline.c
>>> @@ -39,6 +39,7 @@
>>>  #include <rte_gro.h>
>>>  #endif
>>>  #include <rte_mbuf_dyn.h>
>>> +#include <rte_trace.h>
>>>
>>>  #include <cmdline_rdline.h>
>>>  #include <cmdline_parse.h>
>>> @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t
>> cmd_config_tx_affinity_map = {
>>>  	},
>>>  };
>>>
>>> +#ifndef RTE_EXEC_ENV_WINDOWS
>>> +/* *** SAVE_TRACE *** */
>>> +
>>> +struct cmd_save_trace_result {
>>> +	cmdline_fixed_string_t save;
>>> +};
>>> +
>>> +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
>>> +				  __rte_unused struct cmdline *cl,
>>> +				  __rte_unused void *data)
>>> +{
>>> +	int rc;
>>> +
>>> +	rc = rte_trace_save();
>>> +	if (rc)
>>> +		printf("Save trace failed with error: %d\n", rc);
>>> +	else
>>> +		printf("Trace saved successfully\n"); }
>>> +
>>> +static cmdline_parse_token_string_t cmd_save_trace_save =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save,
>>> +"save_trace");
>>> +
>>
>> We have dump_* commands, what do you think to have 'dump_trace'
>> command for this?
> It was initially (in v1) with "dump_trace" command.
> And there is the comment by Jerin:
> https://inbox.dpdk.org/dev/CALBAE1Of79a_jHnFT3KX--Enhud-h5RzL02TMQBsmoW721ds7A@mail.gmail.com/#t
> 
> So, I have changed to "save_trace". I have no strong opinion about command name, any allowing trace save is OK for me.
> 

Ah, I missed that.


@Jerin,
I just saw your comment, agree more exact action can be 'save' but
'dump' also describes enough.
Since there are existing 'dump_*' commands, it makes command more
intuitive and easy to remember.

As an active user of testpmd myself, I am finding it hard to
remember/find the command I need as number of commands increased. That
is why I am paying extra attention to have more hierarchical, consistent
and intuitive commands.

For me "dump_trace" works better in that manner, what do you think, do
you have strong opinion on 'save_trace'?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-23 11:52         ` Ferruh Yigit
@ 2023-06-23 12:03           ` Jerin Jacob
  2023-06-23 12:14             ` Slava Ovsiienko
  2023-06-23 12:23             ` Ferruh Yigit
  0 siblings, 2 replies; 76+ messages in thread
From: Jerin Jacob @ 2023-06-23 12:03 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Slava Ovsiienko, Aman Singh, Jerin Jacob Kollanukkaran, dev

On Fri, Jun 23, 2023 at 5:23 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 6/23/2023 9:00 AM, Slava Ovsiienko wrote:
> > Hi, Ferruh
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, June 21, 2023 2:16 PM
> >> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Aman Singh
> >> <aman.deep.singh@intel.com>
> >> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> >> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
> >>
> >> On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
> >>> The "save_trace" CLI command is added to trigger saving the trace
> >>> dumps to the trace directory.
> >>>
> >>
> >> Hi Viacheslav,
> >>
> >> Trace is already saved when dpdk application terminated, I guess this is to
> >> save the trace before exiting the application, what is the use case for this, can
> >> you please detail in the commit log.
> >
> > OK, will update the commit log. The command "save_trace" is useful in some
> > dynamic debug scenarios to save the trace without restarting the entire application.
> >
> >>
> >> And what happens if this is called multiple times, or what happens on the
> >> application exit, will it overwrite the file or fail?
> > It overwrites.
> >
> >> Again please explain in the commit log.
> > Sure, will do.
> >
> >>
> >>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> >>> ---
> >>>  app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 38 insertions(+)
> >>>
> >>
> >> Can you please update documentation too?
> >>
> >>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> >>> a15a442a06..db71ce2028 100644
> >>> --- a/app/test-pmd/cmdline.c
> >>> +++ b/app/test-pmd/cmdline.c
> >>> @@ -39,6 +39,7 @@
> >>>  #include <rte_gro.h>
> >>>  #endif
> >>>  #include <rte_mbuf_dyn.h>
> >>> +#include <rte_trace.h>
> >>>
> >>>  #include <cmdline_rdline.h>
> >>>  #include <cmdline_parse.h>
> >>> @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t
> >> cmd_config_tx_affinity_map = {
> >>>     },
> >>>  };
> >>>
> >>> +#ifndef RTE_EXEC_ENV_WINDOWS
> >>> +/* *** SAVE_TRACE *** */
> >>> +
> >>> +struct cmd_save_trace_result {
> >>> +   cmdline_fixed_string_t save;
> >>> +};
> >>> +
> >>> +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
> >>> +                             __rte_unused struct cmdline *cl,
> >>> +                             __rte_unused void *data)
> >>> +{
> >>> +   int rc;
> >>> +
> >>> +   rc = rte_trace_save();
> >>> +   if (rc)
> >>> +           printf("Save trace failed with error: %d\n", rc);
> >>> +   else
> >>> +           printf("Trace saved successfully\n"); }
> >>> +
> >>> +static cmdline_parse_token_string_t cmd_save_trace_save =
> >>> +   TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save,
> >>> +"save_trace");
> >>> +
> >>
> >> We have dump_* commands, what do you think to have 'dump_trace'
> >> command for this?
> > It was initially (in v1) with "dump_trace" command.
> > And there is the comment by Jerin:
> > https://inbox.dpdk.org/dev/CALBAE1Of79a_jHnFT3KX--Enhud-h5RzL02TMQBsmoW721ds7A@mail.gmail.com/#t
> >
> > So, I have changed to "save_trace". I have no strong opinion about command name, any allowing trace save is OK for me.
> >
>
> Ah, I missed that.
>
>
> @Jerin,
> I just saw your comment, agree more exact action can be 'save' but
> 'dump' also describes enough.
> Since there are existing 'dump_*' commands, it makes command more
> intuitive and easy to remember.
>
> As an active user of testpmd myself, I am finding it hard to
> remember/find the command I need as number of commands increased. That
> is why I am paying extra attention to have more hierarchical, consistent
> and intuitive commands.
>
> For me "dump_trace" works better in that manner, what do you think, do
> you have strong opinion on 'save_trace'?

dump_* commands dumping on stdout or FILE.
Trace is mostly saving "current trace buffer" it and internally it
figure out the FILE.
But no strong opinion, if testpmd user thinks "dump" is better.


>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-23 12:03           ` Jerin Jacob
@ 2023-06-23 12:14             ` Slava Ovsiienko
  2023-06-23 12:23             ` Ferruh Yigit
  1 sibling, 0 replies; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-23 12:14 UTC (permalink / raw)
  To: Jerin Jacob, Ferruh Yigit; +Cc: Aman Singh, Jerin Jacob Kollanukkaran, dev

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, June 23, 2023 3:04 PM
> To: Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: Slava Ovsiienko <viacheslavo@nvidia.com>; Aman Singh
> <aman.deep.singh@intel.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
> 
> On Fri, Jun 23, 2023 at 5:23 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> > On 6/23/2023 9:00 AM, Slava Ovsiienko wrote:
> > > Hi, Ferruh
> > >
> > >> -----Original Message-----
> > >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > >> Sent: Wednesday, June 21, 2023 2:16 PM
> > >> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Aman
> > >> Singh <aman.deep.singh@intel.com>
> > >> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> > >> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
> > >>
> > >> On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
> > >>> The "save_trace" CLI command is added to trigger saving the trace
> > >>> dumps to the trace directory.
> > >>>
> > >>
> > >> Hi Viacheslav,
> > >>
> > >> Trace is already saved when dpdk application terminated, I guess
> > >> this is to save the trace before exiting the application, what is
> > >> the use case for this, can you please detail in the commit log.
> > >
> > > OK, will update the commit log. The command "save_trace" is useful
> > > in some dynamic debug scenarios to save the trace without restarting the
> entire application.
> > >
> > >>
> > >> And what happens if this is called multiple times, or what happens
> > >> on the application exit, will it overwrite the file or fail?
> > > It overwrites.
> > >
> > >> Again please explain in the commit log.
> > > Sure, will do.
> > >
> > >>
> > >>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > >>> ---
> > >>>  app/test-pmd/cmdline.c | 38
> > >>> ++++++++++++++++++++++++++++++++++++++
> > >>>  1 file changed, 38 insertions(+)
> > >>>
> > >>
> > >> Can you please update documentation too?
> > >>
> > >>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > >>> a15a442a06..db71ce2028 100644
> > >>> --- a/app/test-pmd/cmdline.c
> > >>> +++ b/app/test-pmd/cmdline.c
> > >>> @@ -39,6 +39,7 @@
> > >>>  #include <rte_gro.h>
> > >>>  #endif
> > >>>  #include <rte_mbuf_dyn.h>
> > >>> +#include <rte_trace.h>
> > >>>
> > >>>  #include <cmdline_rdline.h>
> > >>>  #include <cmdline_parse.h>
> > >>> @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t
> > >> cmd_config_tx_affinity_map = {
> > >>>     },
> > >>>  };
> > >>>
> > >>> +#ifndef RTE_EXEC_ENV_WINDOWS
> > >>> +/* *** SAVE_TRACE *** */
> > >>> +
> > >>> +struct cmd_save_trace_result {
> > >>> +   cmdline_fixed_string_t save;
> > >>> +};
> > >>> +
> > >>> +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
> > >>> +                             __rte_unused struct cmdline *cl,
> > >>> +                             __rte_unused void *data) {
> > >>> +   int rc;
> > >>> +
> > >>> +   rc = rte_trace_save();
> > >>> +   if (rc)
> > >>> +           printf("Save trace failed with error: %d\n", rc);
> > >>> +   else
> > >>> +           printf("Trace saved successfully\n"); }
> > >>> +
> > >>> +static cmdline_parse_token_string_t cmd_save_trace_save =
> > >>> +   TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save,
> > >>> +"save_trace");
> > >>> +
> > >>
> > >> We have dump_* commands, what do you think to have 'dump_trace'
> > >> command for this?
> > > It was initially (in v1) with "dump_trace" command.
> > > And there is the comment by Jerin:
> > > https://inbox.dpdk.org/dev/CALBAE1Of79a_jHnFT3KX--Enhud-
> h5RzL02TMQBs
> > > moW721ds7A@mail.gmail.com/#t
> > >
> > > So, I have changed to "save_trace". I have no strong opinion about
> command name, any allowing trace save is OK for me.
> > >
> >
> > Ah, I missed that.
> >
> >
> > @Jerin,
> > I just saw your comment, agree more exact action can be 'save' but
> > 'dump' also describes enough.
> > Since there are existing 'dump_*' commands, it makes command more
> > intuitive and easy to remember.
> >
> > As an active user of testpmd myself, I am finding it hard to
> > remember/find the command I need as number of commands increased.
> That
> > is why I am paying extra attention to have more hierarchical,
> > consistent and intuitive commands.
> >
> > For me "dump_trace" works better in that manner, what do you think, do
> > you have strong opinion on 'save_trace'?
> 
> dump_* commands dumping on stdout or FILE.
> Trace is mostly saving "current trace buffer" it and internally it figure out the
> FILE.
> But no strong opinion, if testpmd user thinks "dump" is better.

I think "dump_trace" would be more intuitive and do no not overwhelm the testpmd code
with supporting new "save_trace". So, I vote to revert to "dump_trace", don't you mind?

With best regards,
Slava


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 1/5] app/testpmd: add trace save command
  2023-06-23 12:03           ` Jerin Jacob
  2023-06-23 12:14             ` Slava Ovsiienko
@ 2023-06-23 12:23             ` Ferruh Yigit
  1 sibling, 0 replies; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-23 12:23 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Slava Ovsiienko, Aman Singh, Jerin Jacob Kollanukkaran, dev

On 6/23/2023 1:03 PM, Jerin Jacob wrote:
> On Fri, Jun 23, 2023 at 5:23 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 6/23/2023 9:00 AM, Slava Ovsiienko wrote:
>>> Hi, Ferruh
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Wednesday, June 21, 2023 2:16 PM
>>>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Aman Singh
>>>> <aman.deep.singh@intel.com>
>>>> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
>>>> Subject: Re: [PATCH v2 1/5] app/testpmd: add trace save command
>>>>
>>>> On 6/13/2023 5:58 PM, Viacheslav Ovsiienko wrote:
>>>>> The "save_trace" CLI command is added to trigger saving the trace
>>>>> dumps to the trace directory.
>>>>>
>>>>
>>>> Hi Viacheslav,
>>>>
>>>> Trace is already saved when dpdk application terminated, I guess this is to
>>>> save the trace before exiting the application, what is the use case for this, can
>>>> you please detail in the commit log.
>>>
>>> OK, will update the commit log. The command "save_trace" is useful in some
>>> dynamic debug scenarios to save the trace without restarting the entire application.
>>>
>>>>
>>>> And what happens if this is called multiple times, or what happens on the
>>>> application exit, will it overwrite the file or fail?
>>> It overwrites.
>>>
>>>> Again please explain in the commit log.
>>> Sure, will do.
>>>
>>>>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>>>>> ---
>>>>>  app/test-pmd/cmdline.c | 38 ++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 38 insertions(+)
>>>>>
>>>>
>>>> Can you please update documentation too?
>>>>
>>>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
>>>>> a15a442a06..db71ce2028 100644
>>>>> --- a/app/test-pmd/cmdline.c
>>>>> +++ b/app/test-pmd/cmdline.c
>>>>> @@ -39,6 +39,7 @@
>>>>>  #include <rte_gro.h>
>>>>>  #endif
>>>>>  #include <rte_mbuf_dyn.h>
>>>>> +#include <rte_trace.h>
>>>>>
>>>>>  #include <cmdline_rdline.h>
>>>>>  #include <cmdline_parse.h>
>>>>> @@ -12745,6 +12746,40 @@ static cmdline_parse_inst_t
>>>> cmd_config_tx_affinity_map = {
>>>>>     },
>>>>>  };
>>>>>
>>>>> +#ifndef RTE_EXEC_ENV_WINDOWS
>>>>> +/* *** SAVE_TRACE *** */
>>>>> +
>>>>> +struct cmd_save_trace_result {
>>>>> +   cmdline_fixed_string_t save;
>>>>> +};
>>>>> +
>>>>> +static void cmd_save_trace_parsed(__rte_unused void *parsed_result,
>>>>> +                             __rte_unused struct cmdline *cl,
>>>>> +                             __rte_unused void *data)
>>>>> +{
>>>>> +   int rc;
>>>>> +
>>>>> +   rc = rte_trace_save();
>>>>> +   if (rc)
>>>>> +           printf("Save trace failed with error: %d\n", rc);
>>>>> +   else
>>>>> +           printf("Trace saved successfully\n"); }
>>>>> +
>>>>> +static cmdline_parse_token_string_t cmd_save_trace_save =
>>>>> +   TOKEN_STRING_INITIALIZER(struct cmd_save_trace_result, save,
>>>>> +"save_trace");
>>>>> +
>>>>
>>>> We have dump_* commands, what do you think to have 'dump_trace'
>>>> command for this?
>>> It was initially (in v1) with "dump_trace" command.
>>> And there is the comment by Jerin:
>>> https://inbox.dpdk.org/dev/CALBAE1Of79a_jHnFT3KX--Enhud-h5RzL02TMQBsmoW721ds7A@mail.gmail.com/#t
>>>
>>> So, I have changed to "save_trace". I have no strong opinion about command name, any allowing trace save is OK for me.
>>>
>>
>> Ah, I missed that.
>>
>>
>> @Jerin,
>> I just saw your comment, agree more exact action can be 'save' but
>> 'dump' also describes enough.
>> Since there are existing 'dump_*' commands, it makes command more
>> intuitive and easy to remember.
>>
>> As an active user of testpmd myself, I am finding it hard to
>> remember/find the command I need as number of commands increased. That
>> is why I am paying extra attention to have more hierarchical, consistent
>> and intuitive commands.
>>
>> For me "dump_trace" works better in that manner, what do you think, do
>> you have strong opinion on 'save_trace'?
> 
> dump_* commands dumping on stdout or FILE.
> Trace is mostly saving "current trace buffer" it and internally it
> figure out the FILE.
>

Agree that 'save' can be more accurate, but 'dump_*' is more consistent.
Saving trace buffer to a file, or dumping content of trace buffer to a
file, looks close enough to me.

> But no strong opinion, if testpmd user thinks "dump" is better.
> 

OK, let's continue with 'dump_trace'.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH] app/testpmd: add trace dump command
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (6 preceding siblings ...)
  2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-26 11:06 ` Viacheslav Ovsiienko
  2023-06-26 11:07 ` [PATCH v3] " Viacheslav Ovsiienko
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-26 11:06 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, jerinj

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

The tracing data are saved according to the EAL configuration
(the explicit --trace-dir EAL command line parameter alters
the target folder). The resulting dump folder gets a name
in the rte-YYYY-MM-DD-xx-HH-MM-SS format.

This command is useful to get the trace data without exiting
the testpmd application and to get multiple dumps to observe
how the situation evolves over time.
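
For illustration, the command is issued from the testpmd interactive
prompt; a minimal hypothetical session (assuming tracing was enabled
with the EAL --trace= parameter at startup) looks like:

  testpmd> dump_trace

It can be repeated during the session without restarting the
application.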

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c6690887d3..70b598c64e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8372,6 +8372,8 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_lcore_dump(stdout);
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
 }
 
 static cmdline_parse_token_string_t cmd_dump_dump =
@@ -8384,7 +8386,8 @@ static cmdline_parse_token_string_t cmd_dump_dump =
 		"dump_mempool#"
 		"dump_devargs#"
 		"dump_lcores#"
-		"dump_log_types");
+		"dump_log_types#"
+		"dump_trace");
 
 static cmdline_parse_inst_t cmd_dump = {
 	.f = cmd_dump_parsed,  /* function to call */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3] app/testpmd: add trace dump command
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (7 preceding siblings ...)
  2023-06-26 11:06 ` [PATCH] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-06-26 11:07 ` Viacheslav Ovsiienko
  2023-06-26 11:57 ` [PATCH v4] " Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-26 11:07 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, jerinj

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

The tracing data are saved according to the EAL configuration
(the explicit --trace-dir EAL command line parameter alters
the target folder). The resulting dump folder gets a name
in the rte-YYYY-MM-DD-xx-HH-MM-SS format.

This command is useful to get the trace data without exiting
the testpmd application and to get multiple dumps to observe
how the situation evolves over time.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c6690887d3..70b598c64e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8372,6 +8372,8 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_lcore_dump(stdout);
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
 }
 
 static cmdline_parse_token_string_t cmd_dump_dump =
@@ -8384,7 +8386,8 @@ static cmdline_parse_token_string_t cmd_dump_dump =
 		"dump_mempool#"
 		"dump_devargs#"
 		"dump_lcores#"
-		"dump_log_types");
+		"dump_log_types#"
+		"dump_trace");
 
 static cmdline_parse_inst_t cmd_dump = {
 	.f = cmd_dump_parsed,  /* function to call */
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4] app/testpmd: add trace dump command
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (8 preceding siblings ...)
  2023-06-26 11:07 ` [PATCH v3] " Viacheslav Ovsiienko
@ 2023-06-26 11:57 ` Viacheslav Ovsiienko
  2023-06-27 11:34   ` Ferruh Yigit
  2023-06-27 13:09 ` [PATCH v5] app/testpmd: add trace dump command Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-26 11:57 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, jerinj

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

The tracing data are saved according to the EAL configuration
(the explicit --trace-dir EAL command line parameter alters
the target folder). The resulting dump folder gets a name
in the rte-YYYY-MM-DD-xx-HH-MM-SS format.

This command is useful to get the trace data without exiting
the testpmd application and to get multiple dumps to observe
how the situation evolves over time.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--

v1: https://inbox.dpdk.org/dev/20230609152847.32496-2-viacheslavo@nvidia.com
v2: https://inbox.dpdk.org/dev/20230613165845.19109-2-viacheslavo@nvidia.com
    - changed to save_trace command
    - Windows compilation check added

v3: https://inbox.dpdk.org/dev/20230626110734.14126-1-viacheslavo@nvidia.com
    - reverted to "dump_trace" command

v4: - added missing header file include
    - missing #ifdef added for Windows compilation (no trace support
      for Windows)
---
 app/test-pmd/cmdline.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 5da38b0bb4..b82763c65d 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -39,6 +39,7 @@
 #include <rte_gro.h>
 #endif
 #include <rte_mbuf_dyn.h>
+#include <rte_trace.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -8371,10 +8372,17 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_lcore_dump(stdout);
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
+#ifndef RTE_EXEC_ENV_WINDOWS
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
+#endif
 }
 
 static cmdline_parse_token_string_t cmd_dump_dump =
 	TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
+#ifndef RTE_EXEC_ENV_WINDOWS
+		"dump_trace#"
+#endif
 		"dump_physmem#"
 		"dump_memzone#"
 		"dump_socket_mem#"
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-13 16:01           ` Jerin Jacob
@ 2023-06-27  0:39             ` Thomas Monjalon
  2023-06-27  6:15               ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27  0:39 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev, Jerin Jacob, rasland

13/06/2023 18:01, Jerin Jacob:
> On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > <..snip..>
> > > > > >
> > > > > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > > > > >         mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
> > > > > > +
> > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > >
> > > > > No need to expose these symbols. It is getting removed from rest of DPDK.
> > > > > Application can do rte_trace_lookup() to get this address.
> > > > >
> > > > >
> > > > It is not for application, it is for PMD itself, w/o exposing the symbols build
> > > failed.
> > >
> > > PMD is implementing this trace endpoints, not consuming this trace point.
> > > Right? If so, Why to expose these symbols?
> >
> > As far as understand:
> > The tracepoint routines are defined in dedicated common/mlx5_trace.c file.
> > The tx_burst in mlx5 is implemented as template in header file, and this
> > template is used in multiple .c files under net/mlx5 filder.
> > So, common/mlx5 should expose its symbols to net/mlx5 to allow successful
> > linkage.
> 
> OK. I missed the fact the these are in common code and net driver is
> depened on that.
> So changes makes sense.

It does not make sense to me.
These are tracepoints for the ethdev driver.
Why declare them in the common library?
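
For reference, the split under discussion looks roughly like the
sketch below (simplified to a single tracepoint with abbreviated
arguments, not the exact patch contents):

  /* common/mlx5/mlx5_trace.h */
  #include <rte_trace_point.h>

  /* Declares extern rte_trace_point_t __rte_pmd_mlx5_trace_tx_entry
   * and an always-inline emit helper usable from header templates. */
  RTE_TRACE_POINT_FP(
          rte_pmd_mlx5_trace_tx_entry,
          RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
          rte_trace_point_emit_u16(port_id);
          rte_trace_point_emit_u16(queue_id);
  )

  /* common/mlx5/mlx5_trace.c */
  #include <rte_trace_point_register.h>
  #include "mlx5_trace.h"

  RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
                           pmd.net.mlx5.tx.entry)

Since the inline Tx burst template in the net/mlx5 headers references
the __rte_pmd_mlx5_trace_tx_entry variable defined in the common
library, the symbol has to be exported in common/mlx5's version.map
for the net driver to link.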




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-20 12:00   ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
@ 2023-06-27  0:46     ` Thomas Monjalon
  2023-06-27 11:24       ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27  0:46 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev, Raslan Darawsheh, rjarry, jerinj

20/06/2023 14:00, Raslan Darawsheh:
> Hi,
> 
> > -----Original Message-----
> > From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > Sent: Tuesday, June 13, 2023 7:59 PM
> > To: dev@dpdk.org
> > Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > 
> > The mlx5 provides the send scheduling on specific moment of time,
> > and for the related kind of applications it would be extremely useful
> > to have extra debug information - when and how packets were scheduled
> > and when the actual sending was completed by the NIC hardware (it helps
> > application to track the internal delay issues).
> > 
> > Because the DPDK tx datapath API does not suppose getting any feedback
> > from the driver and the feature looks like to be mlx5 specific, it seems
> > to be reasonable to engage exisiting DPDK datapath tracing capability.
> > 
> > The work cycle is supposed to be:
> >   - compile appplication with enabled tracing
> >   - run application with EAL parameters configuring the tracing in mlx5
> >     Tx datapath
> >   - store the dump file with gathered tracing information
> >   - run analyzing scrypt (in Python) to combine related events (packet
> >     firing and completion) and see the data in human-readable view
> > 
> > Below is the detailed instruction "how to" with mlx5 NIC to gather
> > all the debug data including the full timings information.
> > 
> > 
> > 1. Build DPDK application with enabled datapath tracing
> > 
> > The meson option should be specified:
> >    --enable_trace_fp=true
> > 
> > The c_args shoudl be specified:
> >    -DALLOW_EXPERIMENTAL_API
> > 
> > The DPDK configuration examples:
> > 
> >   meson configure --buildtype=debug -Denable_trace_fp=true
> >         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -
> > DALLOW_EXPERIMENTAL_API' build
> > 
> >   meson configure --buildtype=debug -Denable_trace_fp=true
> >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> > 
> >   meson configure --buildtype=release -Denable_trace_fp=true
> >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> > 
> >   meson configure --buildtype=release -Denable_trace_fp=true
> >         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> > 
> > 
> > 2. Configuring the NIC
> > 
> > If the sending completion timings are important the NIC should be configured
> > to provide realtime timestamps, the REAL_TIME_CLOCK_ENABLE NV settings
> > parameter
> > should be configured to TRUE, for example with command (and with following
> > FW/driver reset):
> > 
> >   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s
> > REAL_TIME_CLOCK_ENABLE=1
> > 
> > 
> > 3. Run DPDK application to gather the traces
> > 
> > EAL parameters controlling trace capability in runtime
> > 
> >   --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
> >                             with matching names at least "pmd.net.mlx5.tx"
> >                             must be enabled to gather all events needed
> >                             to analyze mlx5 Tx datapath and its timings.
> >                             By default all tracepoints are disabled.
> > 
> >   --trace-dir=/var/log - trace storing directory
> > 
> >   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
> >                                        per thread. The default is 1MB.
> > 
> >   --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
> > 
> > 
> > 4. Installing or Building Babeltrace2 Package
> > 
> > The gathered trace data can be analyzed with a developed Python script.
> > To parse the trace, the data script uses the Babeltrace2 library.
> > The package should be either installed or built from source code as
> > shown below:
> > 
> >   git clone https://github.com/efficios/babeltrace.git
> >   cd babeltrace
> >   ./bootstrap
> >   ./configure -help
> >   ./configure --disable-api-doc --disable-man-pages
> >               --disable-python-bindings-doc --enbale-python-plugins
> >               --enable-python-binding
> > 
> > 5. Running the Analyzing Script
> > 
> > The analyzing script is located in the folder: ./drivers/net/mlx5/tools
> > It requires Python3.6, Babeltrace2 packages and it takes the only parameter
> > of trace data file. For example:
> > 
> >    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
> > 
> > 
> > 6. Interpreting the Script Output Data
> > 
> > All the timings are given in nanoseconds.
> > The list of Tx (and coming Rx) bursts per port/queue is presented in the
> > output.
> > Each list element contains the list of built WQEs with specific opcodes, and
> > each WQE contains the list of the encompassed packets to send.

This information should be in the documentation.

I think we should request a review of the Python script from people familiar with tracing
and from people more familiar with Python scripting for user tools.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  0:39             ` Thomas Monjalon
@ 2023-06-27  6:15               ` Slava Ovsiienko
  2023-06-27  7:28                 ` Thomas Monjalon
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-27  6:15 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL); +Cc: dev, Jerin Jacob, Raslan Darawsheh

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 27, 2023 3:40 AM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
> 
> 13/06/2023 18:01, Jerin Jacob:
> > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> wrote:
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > <..snip..>
> > > > > > >
> > > > > > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > WINDOWS_NO_EXPORT
> > > > > > > +
> > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > >
> > > > > > No need to expose these symbols. It is getting removed from rest of
> DPDK.
> > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > >
> > > > > >
> > > > > It is not for application, it is for PMD itself, w/o exposing
> > > > > the symbols build
> > > > failed.
> > > >
> > > > PMD is implementing this trace endpoints, not consuming this trace
> point.
> > > > Right? If so, Why to expose these symbols?
> > >
> > > As far as understand:
> > > The tracepoint routines are defined in dedicated common/mlx5_trace.c
> file.
> > > The tx_burst in mlx5 is implemented as template in header file, and
> > > this template is used in multiple .c files under net/mlx5 filder.
> > > So, common/mlx5 should expose its symbols to net/mlx5 to allow
> > > successful linkage.
> >
> > OK. I missed the fact the these are in common code and net driver is
> > depened on that.
> > So changes makes sense.
> 
> It does not make sense to me.
> These are tracepoints for the ethdev driver.
> Why declaring them in the common library?

Just to gather all mlx5 traces in a single file, so that all available tracing capabilities can be seen in a single view.

With best regards,
Slava
 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  6:15               ` Slava Ovsiienko
@ 2023-06-27  7:28                 ` Thomas Monjalon
  2023-06-27  8:19                   ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27  7:28 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev, Jerin Jacob, Raslan Darawsheh

27/06/2023 08:15, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 13/06/2023 18:01, Jerin Jacob:
> > > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> > wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > > <viacheslavo@nvidia.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > <..snip..>
> > > > > > > >
> > > > > > > >         mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
> > > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > > WINDOWS_NO_EXPORT
> > > > > > > > +
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > > >
> > > > > > > No need to expose these symbols. It is getting removed from rest of
> > DPDK.
> > > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > > >
> > > > > > >
> > > > > > It is not for application, it is for PMD itself, w/o exposing
> > > > > > the symbols build
> > > > > failed.
> > > > >
> > > > > PMD is implementing this trace endpoints, not consuming this trace
> > point.
> > > > > Right? If so, Why to expose these symbols?
> > > >
> > > > As far as understand:
> > > > The tracepoint routines are defined in dedicated common/mlx5_trace.c
> > file.
> > > > The tx_burst in mlx5 is implemented as template in header file, and
> > > > this template is used in multiple .c files under net/mlx5 filder.
> > > > So, common/mlx5 should expose its symbols to net/mlx5 to allow
> > > > successful linkage.
> > >
> > > OK. I missed the fact the these are in common code and net driver is
> > > depened on that.
> > > So changes makes sense.
> > 
> > It does not make sense to me.
> > These are tracepoints for the ethdev driver.
> > Why declaring them in the common library?
> 
> Just to gather all mlx5 traces in the single file, to see all available tracing caps in single view.

Better to not export them.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  7:28                 ` Thomas Monjalon
@ 2023-06-27  8:19                   ` Slava Ovsiienko
  2023-06-27  9:33                     ` Thomas Monjalon
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-27  8:19 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL); +Cc: dev, Jerin Jacob, Raslan Darawsheh

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 27, 2023 10:29 AM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
> 
> 27/06/2023 08:15, Slava Ovsiienko:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 13/06/2023 18:01, Jerin Jacob:
> > > > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>
> > > wrote:
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > > > <viacheslavo@nvidia.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > <..snip..>
> > > > > > > > >
> > > > > > > > >         mlx5_os_interrupt_handler_create; #
> WINDOWS_NO_EXPORT
> > > > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > > > WINDOWS_NO_EXPORT
> > > > > > > > > +
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > > > >
> > > > > > > > No need to expose these symbols. It is getting removed
> > > > > > > > from rest of
> > > DPDK.
> > > > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > > > >
> > > > > > > >
> > > > > > > It is not for application, it is for PMD itself, w/o
> > > > > > > exposing the symbols build
> > > > > > failed.
> > > > > >
> > > > > > PMD is implementing this trace endpoints, not consuming this
> > > > > > trace
> > > point.
> > > > > > Right? If so, Why to expose these symbols?
> > > > >
> > > > > As far as understand:
> > > > > The tracepoint routines are defined in dedicated
> > > > > common/mlx5_trace.c
> > > file.
> > > > > The tx_burst in mlx5 is implemented as template in header file,
> > > > > and this template is used in multiple .c files under net/mlx5 filder.
> > > > > So, common/mlx5 should expose its symbols to net/mlx5 to allow
> > > > > successful linkage.
> > > >
> > > > OK. I missed the fact the these are in common code and net driver
> > > > is depened on that.
> > > > So changes makes sense.
> > >
> > > It does not make sense to me.
> > > These are tracepoints for the ethdev driver.
> > > Why declaring them in the common library?
> >
> > Just to gather all mlx5 traces in the single file, to see all available tracing
> caps in single view.
> 
> Better to not export them.
> 
It is not only about the export. We have an analyzing script that runs over the trace data, and it would be better to see
all the tracing routines gathered in one place. I would prefer not to spread the tracepoint definitions over multiple source files.
 
With best regards,
Slava


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  8:19                   ` Slava Ovsiienko
@ 2023-06-27  9:33                     ` Thomas Monjalon
  2023-06-27  9:43                       ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27  9:33 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Jerin Jacob, Raslan Darawsheh, david.marchand, Maayan Kashani

27/06/2023 10:19, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 27/06/2023 08:15, Slava Ovsiienko:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 13/06/2023 18:01, Jerin Jacob:
> > > > > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko
> > > > > <viacheslavo@nvidia.com>
> > > > wrote:
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > > > > <viacheslavo@nvidia.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > <..snip..>
> > > > > > > > > >
> > > > > > > > > >         mlx5_os_interrupt_handler_create; #
> > WINDOWS_NO_EXPORT
> > > > > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > > > > WINDOWS_NO_EXPORT
> > > > > > > > > > +
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > > > > >
> > > > > > > > > No need to expose these symbols. It is getting removed
> > > > > > > > > from rest of
> > > > DPDK.
> > > > > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > It is not for application, it is for PMD itself, w/o
> > > > > > > > exposing the symbols build
> > > > > > > failed.
> > > > > > >
> > > > > > > PMD is implementing this trace endpoints, not consuming this
> > > > > > > trace
> > > > point.
> > > > > > > Right? If so, Why to expose these symbols?
> > > > > >
> > > > > > As far as understand:
> > > > > > The tracepoint routines are defined in dedicated
> > > > > > common/mlx5_trace.c
> > > > file.
> > > > > > The tx_burst in mlx5 is implemented as template in header file,
> > > > > > and this template is used in multiple .c files under net/mlx5 filder.
> > > > > > So, common/mlx5 should expose its symbols to net/mlx5 to allow
> > > > > > successful linkage.
> > > > >
> > > > > OK. I missed the fact the these are in common code and net driver
> > > > > is depened on that.
> > > > > So changes makes sense.
> > > >
> > > > It does not make sense to me.
> > > > These are tracepoints for the ethdev driver.
> > > > Why declaring them in the common library?
> > >
> > > Just to gather all mlx5 traces in the single file, to see all available tracing
> > caps in single view.
> > 
> > Better to not export them.
> > 
> It is not about export, we have analyzing script over the trace data, and it would be better to see all the tracing
> routines gathered in one place. I would prefer not to spread trace point definitions over the multiple source files.

You don't need to export trace symbols to use them.
It has already been discussed with Jerin that we prefer not to export
trace symbols when it can be avoided. Here it can be avoided.





^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  9:33                     ` Thomas Monjalon
@ 2023-06-27  9:43                       ` Slava Ovsiienko
  2023-06-27 11:36                         ` Thomas Monjalon
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-27  9:43 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Jerin Jacob, Raslan Darawsheh, david.marchand, Maayan Kashani

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 27, 2023 12:34 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Raslan Darawsheh
> <rasland@nvidia.com>; david.marchand@redhat.com; Maayan Kashani
> <mkashani@nvidia.com>
> Subject: Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
> 
> 27/06/2023 10:19, Slava Ovsiienko:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 27/06/2023 08:15, Slava Ovsiienko:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 13/06/2023 18:01, Jerin Jacob:
> > > > > > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko
> > > > > > <viacheslavo@nvidia.com>
> > > > > wrote:
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > > > > > <viacheslavo@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > <..snip..>
> > > > > > > > > > >
> > > > > > > > > > >         mlx5_os_interrupt_handler_create; #
> > > WINDOWS_NO_EXPORT
> > > > > > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > > > > > WINDOWS_NO_EXPORT
> > > > > > > > > > > +
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > > > > > >
> > > > > > > > > > No need to expose these symbols. It is getting removed
> > > > > > > > > > from rest of
> > > > > DPDK.
> > > > > > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > It is not for application, it is for PMD itself, w/o
> > > > > > > > > exposing the symbols build
> > > > > > > > failed.
> > > > > > > >
> > > > > > > > PMD is implementing this trace endpoints, not consuming
> > > > > > > > this trace
> > > > > point.
> > > > > > > > Right? If so, Why to expose these symbols?
> > > > > > >
> > > > > > > As far as understand:
> > > > > > > The tracepoint routines are defined in dedicated
> > > > > > > common/mlx5_trace.c
> > > > > file.
> > > > > > > The tx_burst in mlx5 is implemented as template in header
> > > > > > > file, and this template is used in multiple .c files under net/mlx5
> filder.
> > > > > > > So, common/mlx5 should expose its symbols to net/mlx5 to
> > > > > > > allow successful linkage.
> > > > > >
> > > > > > OK. I missed the fact the these are in common code and net
> > > > > > driver is depened on that.
> > > > > > So changes makes sense.
> > > > >
> > > > > It does not make sense to me.
> > > > > These are tracepoints for the ethdev driver.
> > > > > Why declaring them in the common library?
> > > >
> > > > Just to gather all mlx5 traces in the single file, to see all
> > > > available tracing
> > > caps in single view.
> > >
> > > Better to not export them.
> > >
> > It is not about export, we have analyzing script over the trace data,
> > and it would be better to see all the tracing routines gathered in one place.
> I would prefer not to spread trace point definitions over the multiple source
> files.
> 
> You don't need to export trace symbols for using them.
> It has been discussed already with Jerin that we prefer not exporting trace
> symbols if it can be avoided. Here it can be avoided.

Without exporting the symbols defined in common/mlx5 we get a link error in net/mlx5.
So, do you insist on having a dedicated mlx5_trace.c per net/mlx5, common/mlx5, etc.?

With best regards,
Slava


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-27  0:46     ` Thomas Monjalon
@ 2023-06-27 11:24       ` Slava Ovsiienko
  2023-06-27 11:34         ` Thomas Monjalon
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-27 11:24 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Raslan Darawsheh, rjarry, jerinj



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 27, 2023 3:46 AM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>;
> rjarry@redhat.com; jerinj@marvell.com
> Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 
> 20/06/2023 14:00, Raslan Darawsheh:
> > Hi,
> >
> > > -----Original Message-----
> > > From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > > Sent: Tuesday, June 13, 2023 7:59 PM
> > > To: dev@dpdk.org
> > > Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > >
> > > The mlx5 provides the send scheduling on specific moment of time,
> > > and for the related kind of applications it would be extremely
> > > useful to have extra debug information - when and how packets were
> > > scheduled and when the actual sending was completed by the NIC
> > > hardware (it helps application to track the internal delay issues).
> > >
> > > Because the DPDK tx datapath API does not suppose getting any
> > > feedback from the driver and the feature looks like to be mlx5
> > > specific, it seems to be reasonable to engage exisiting DPDK datapath
> tracing capability.
> > >
> > > The work cycle is supposed to be:
> > >   - compile appplication with enabled tracing
> > >   - run application with EAL parameters configuring the tracing in mlx5
> > >     Tx datapath
> > >   - store the dump file with gathered tracing information
> > >   - run analyzing scrypt (in Python) to combine related events (packet
> > >     firing and completion) and see the data in human-readable view
> > >
> > > Below is the detailed instruction "how to" with mlx5 NIC to gather
> > > all the debug data including the full timings information.
> > >
> > >
> > > 1. Build DPDK application with enabled datapath tracing
> > >
> > > The meson option should be specified:
> > >    --enable_trace_fp=true
> > >
> > > The c_args shoudl be specified:
> > >    -DALLOW_EXPERIMENTAL_API
> > >
> > > The DPDK configuration examples:
> > >
> > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > >         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -
> > > DALLOW_EXPERIMENTAL_API' build
> > >
> > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API'
> > > build
> > >
> > >   meson configure --buildtype=release -Denable_trace_fp=true
> > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API'
> > > build
> > >
> > >   meson configure --buildtype=release -Denable_trace_fp=true
> > >         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> > >
> > >
> > > 2. Configuring the NIC
> > >
> > > If the sending completion timings are important the NIC should be
> > > configured to provide realtime timestamps, the
> > > REAL_TIME_CLOCK_ENABLE NV settings parameter should be configured
> to
> > > TRUE, for example with command (and with following FW/driver reset):
> > >
> > >   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s
> > > REAL_TIME_CLOCK_ENABLE=1
> > >
> > >
> > > 3. Run DPDK application to gather the traces
> > >
> > > EAL parameters controlling trace capability in runtime
> > >
> > >   --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
> > >                             with matching names at least "pmd.net.mlx5.tx"
> > >                             must be enabled to gather all events needed
> > >                             to analyze mlx5 Tx datapath and its timings.
> > >                             By default all tracepoints are disabled.
> > >
> > >   --trace-dir=/var/log - trace storing directory
> > >
> > >   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
> > >                                        per thread. The default is 1MB.
> > >
> > >   --trace-mode=overwrite|discard  - optional, selects trace data buffer
> mode.
> > >
> > >
> > > 4. Installing or Building Babeltrace2 Package
> > >
> > > The gathered trace data can be analyzed with a developed Python script.
> > > To parse the trace, the data script uses the Babeltrace2 library.
> > > The package should be either installed or built from source code as
> > > shown below:
> > >
> > >   git clone https://github.com/efficios/babeltrace.git
> > >   cd babeltrace
> > >   ./bootstrap
> > >   ./configure -help
> > >   ./configure --disable-api-doc --disable-man-pages
> > >               --disable-python-bindings-doc --enbale-python-plugins
> > >               --enable-python-binding
> > >
> > > 5. Running the Analyzing Script
> > >
> > > The analyzing script is located in the folder:
> > > ./drivers/net/mlx5/tools It requires Python3.6, Babeltrace2 packages
> > > and it takes the only parameter of trace data file. For example:
> > >
> > >    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
> > >
> > >
> > > 6. Interpreting the Script Output Data
> > >
> > > All the timings are given in nanoseconds.
> > > The list of Tx (and coming Rx) bursts per port/queue is presented in
> > > the output.
> > > Each list element contains the list of built WQEs with specific
> > > opcodes, and each WQE contains the list of the encompassed packets to
> send.
> 
> This information should be in the documentation.
OK, should we make this cover-letter part of mlx5.rst?

> 
> I think we should request a review of the Python script from people familiar
> with tracing and from people more familiar with Python scripting for user
> tools.
That would be very helpful. Could you recommend/ask someone?

With best regards,
Slava



> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-27 11:24       ` Slava Ovsiienko
@ 2023-06-27 11:34         ` Thomas Monjalon
  2023-06-28 14:18           ` Robin Jarry
  0 siblings, 1 reply; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27 11:34 UTC (permalink / raw)
  To: Slava Ovsiienko, rjarry; +Cc: dev, Raslan Darawsheh, jerinj, david.marchand

27/06/2023 13:24, Slava Ovsiienko:
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, June 27, 2023 3:46 AM
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>;
> > rjarry@redhat.com; jerinj@marvell.com
> > Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > 
> > 20/06/2023 14:00, Raslan Darawsheh:
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > > > Sent: Tuesday, June 13, 2023 7:59 PM
> > > > To: dev@dpdk.org
> > > > Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > > >
> > > > The mlx5 provides the send scheduling on specific moment of time,
> > > > and for the related kind of applications it would be extremely
> > > > useful to have extra debug information - when and how packets were
> > > > scheduled and when the actual sending was completed by the NIC
> > > > hardware (it helps application to track the internal delay issues).
> > > >
> > > > Because the DPDK tx datapath API does not suppose getting any
> > > > feedback from the driver and the feature looks like to be mlx5
> > > > specific, it seems to be reasonable to engage exisiting DPDK datapath
> > tracing capability.
> > > >
> > > > The work cycle is supposed to be:
> > > >   - compile appplication with enabled tracing
> > > >   - run application with EAL parameters configuring the tracing in mlx5
> > > >     Tx datapath
> > > >   - store the dump file with gathered tracing information
> > > >   - run analyzing scrypt (in Python) to combine related events (packet
> > > >     firing and completion) and see the data in human-readable view
> > > >
> > > > Below is the detailed instruction "how to" with mlx5 NIC to gather
> > > > all the debug data including the full timings information.
> > > >
> > > >
> > > > 1. Build DPDK application with enabled datapath tracing
> > > >
> > > > The meson option should be specified:
> > > >    --enable_trace_fp=true
> > > >
> > > > The c_args shoudl be specified:
> > > >    -DALLOW_EXPERIMENTAL_API
> > > >
> > > > The DPDK configuration examples:
> > > >
> > > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -
> > > > DALLOW_EXPERIMENTAL_API' build
> > > >
> > > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API'
> > > > build
> > > >
> > > >   meson configure --buildtype=release -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API'
> > > > build
> > > >
> > > >   meson configure --buildtype=release -Denable_trace_fp=true
> > > >         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> > > >
> > > >
> > > > 2. Configuring the NIC
> > > >
> > > > If the sending completion timings are important the NIC should be
> > > > configured to provide realtime timestamps, the
> > > > REAL_TIME_CLOCK_ENABLE NV settings parameter should be configured
> > to
> > > > TRUE, for example with command (and with following FW/driver reset):
> > > >
> > > >   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s
> > > > REAL_TIME_CLOCK_ENABLE=1
> > > >
> > > >
> > > > 3. Run DPDK application to gather the traces
> > > >
> > > > EAL parameters controlling trace capability in runtime
> > > >
> > > >   --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
> > > >                             with matching names at least "pmd.net.mlx5.tx"
> > > >                             must be enabled to gather all events needed
> > > >                             to analyze mlx5 Tx datapath and its timings.
> > > >                             By default all tracepoints are disabled.
> > > >
> > > >   --trace-dir=/var/log - trace storing directory
> > > >
> > > >   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
> > > >                                        per thread. The default is 1MB.
> > > >
> > > >   --trace-mode=overwrite|discard  - optional, selects trace data buffer
> > mode.
> > > >
> > > >
> > > > 4. Installing or Building Babeltrace2 Package
> > > >
> > > > The gathered trace data can be analyzed with a developed Python script.
> > > > To parse the trace, the data script uses the Babeltrace2 library.
> > > > The package should be either installed or built from source code as
> > > > shown below:
> > > >
> > > >   git clone https://github.com/efficios/babeltrace.git
> > > >   cd babeltrace
> > > >   ./bootstrap
> > > >   ./configure -help
> > > >   ./configure --disable-api-doc --disable-man-pages
> > > >               --disable-python-bindings-doc --enbale-python-plugins
> > > >               --enable-python-binding
> > > >
> > > > 5. Running the Analyzing Script
> > > >
> > > > The analyzing script is located in the folder:
> > > > ./drivers/net/mlx5/tools It requires Python3.6, Babeltrace2 packages
> > > > and it takes the only parameter of trace data file. For example:
> > > >
> > > >    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
> > > >
> > > >
> > > > 6. Interpreting the Script Output Data
> > > >
> > > > All the timings are given in nanoseconds.
> > > > The list of Tx (and coming Rx) bursts per port/queue is presented in
> > > > the output.
> > > > Each list element contains the list of built WQEs with specific
> > > > opcodes, and each WQE contains the list of the encompassed packets to
> > send.
> > 
> > This information should be in the documentation.
> OK, should we make this cover-letter part of mlx5.rst?

Kind of, yes.

> > I think we should request a review of the Python script from people familiar
> > with tracing and from people more familiar with Python scripting for user
> > tools.
> Would be very helpful, could you recommend/ask someone?

Jerin, what do you think of such a script?
Robin, would you have time to look at this trace processing script please?



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4] app/testpmd: add trace dump command
  2023-06-26 11:57 ` [PATCH v4] " Viacheslav Ovsiienko
@ 2023-06-27 11:34   ` Ferruh Yigit
  2023-06-27 11:39     ` Slava Ovsiienko
  2023-06-27 14:44     ` [PATCH] app/testpmd: add dump command help message Viacheslav Ovsiienko
  0 siblings, 2 replies; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-27 11:34 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev; +Cc: jerinj

On 6/26/2023 12:57 PM, Viacheslav Ovsiienko wrote:
> The "dump_trace" CLI command is added to trigger
> saving the trace dumps to the trace directory.
> 
> The tracing data are saved according to the EAL configuration
> (explicit --trace-dir EAL command line parameter alters
> the target folder to save). The result dump folder gets the name
> like rte-YYYY-MM-DD-xx-HH-MM-SS format.
> 
> This command is useful to get the trace date without exiting
> testpmd application and to get the multiple dumps to observe
> the situation in dynamics.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 

Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>

> --
> 
> v1: https://inbox.dpdk.org/dev/20230609152847.32496-2-viacheslavo@nvidia.com
> v2: https://inbox.dpdk.org/dev/20230613165845.19109-2-viacheslavo@nvidia.com
>     - changed to save_trace command
>     - Windows compilation check added
> 
> v3: https://inbox.dpdk.org/dev/20230626110734.14126-1-viacheslavo@nvidia.com
>     - reverted to "dump_trace" command
> 
> v4: - added missed header file include
>     - missed #ifdef added for Windows compilation (no trace support
>       for Windows)
> ---
>  app/test-pmd/cmdline.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 

Can you please update 'doc/guides/testpmd_app_ug/testpmd_funcs.rst' for
the new command?

It looks like the dump_* commands are missing from the help output in
'cmd_help_long_parsed()'. Can you please append this new one at the end of
the "display" section? We can complete the missing ones later.

<...>

> @@ -8371,10 +8372,17 @@ static void cmd_dump_parsed(void *parsed_result,
>  		rte_lcore_dump(stdout);
>  	else if (!strcmp(res->dump, "dump_log_types"))
>  		rte_log_dump(stdout);
> +#ifndef RTE_EXEC_ENV_WINDOWS
> +	else if (!strcmp(res->dump, "dump_trace"))
> +		rte_trace_save();
> +#endif		
>  }
>  
>  static cmdline_parse_token_string_t cmd_dump_dump =
>  	TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
> +#ifndef RTE_EXEC_ENV_WINDOWS
> +		"dump_trace#"
> +#endif
>

Why not add "dump_trace#" as last item, to keep same order with
'cmd_dump_parsed()'?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-27  9:43                       ` Slava Ovsiienko
@ 2023-06-27 11:36                         ` Thomas Monjalon
  0 siblings, 0 replies; 76+ messages in thread
From: Thomas Monjalon @ 2023-06-27 11:36 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Jerin Jacob, Raslan Darawsheh, david.marchand, Maayan Kashani

27/06/2023 11:43, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 27/06/2023 10:19, Slava Ovsiienko:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 27/06/2023 08:15, Slava Ovsiienko:
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > 13/06/2023 18:01, Jerin Jacob:
> > > > > > > On Tue, Jun 13, 2023 at 9:29 PM Slava Ovsiienko
> > > > > > > <viacheslavo@nvidia.com>
> > > > > > wrote:
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > > On Tue, Jun 13, 2023 at 9:20 PM Slava Ovsiienko
> > > > > > > > > <viacheslavo@nvidia.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > <..snip..>
> > > > > > > > > > > >
> > > > > > > > > > > >         mlx5_os_interrupt_handler_create; #
> > > > WINDOWS_NO_EXPORT
> > > > > > > > > > > >         mlx5_os_interrupt_handler_destroy; #
> > > > > > > > > > > > WINDOWS_NO_EXPORT
> > > > > > > > > > > > +
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_entry;
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_exit;
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wqe;
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_wait;
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_push;
> > > > > > > > > > > > +       __rte_pmd_mlx5_trace_tx_complete;
> > > > > > > > > > >
> > > > > > > > > > > No need to expose these symbols. It is getting removed
> > > > > > > > > > > from rest of
> > > > > > DPDK.
> > > > > > > > > > > Application can do rte_trace_lookup() to get this address.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > It is not for application, it is for PMD itself, w/o
> > > > > > > > > > exposing the symbols build
> > > > > > > > > failed.
> > > > > > > > >
> > > > > > > > > PMD is implementing this trace endpoints, not consuming
> > > > > > > > > this trace
> > > > > > point.
> > > > > > > > > Right? If so, Why to expose these symbols?
> > > > > > > >
> > > > > > > > As far as understand:
> > > > > > > > The tracepoint routines are defined in dedicated
> > > > > > > > common/mlx5_trace.c
> > > > > > file.
> > > > > > > > The tx_burst in mlx5 is implemented as template in header
> > > > > > > > file, and this template is used in multiple .c files under net/mlx5
> > filder.
> > > > > > > > So, common/mlx5 should expose its symbols to net/mlx5 to
> > > > > > > > allow successful linkage.
> > > > > > >
> > > > > > > OK. I missed the fact the these are in common code and net
> > > > > > > driver is depened on that.
> > > > > > > So changes makes sense.
> > > > > >
> > > > > > It does not make sense to me.
> > > > > > These are tracepoints for the ethdev driver.
> > > > > > Why declaring them in the common library?
> > > > >
> > > > > Just to gather all mlx5 traces in the single file, to see all
> > > > > available tracing
> > > > caps in single view.
> > > >
> > > > Better to not export them.
> > > >
> > > It is not about export, we have analyzing script over the trace data,
> > > and it would be better to see all the tracing routines gathered in one place.
> > I would prefer not to spread trace point definitions over the multiple source
> > files.
> > 
> > You don't need to export trace symbols for using them.
> > It has been discussed already with Jerin that we prefer not exporting trace
> > symbols if it can be avoided. Here it can be avoided.
> 
> Without exporting symbols defined in common/mlx5 we have link error for net/mlx5.
> So, do you insist on having dedicated mlx5_trace.c per /net/mlx5, common/mlx5, etc?

Yes please, it looks better to have the tracepoints close to their respective functions.
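
(As an illustration of this suggestion - not the actual patch - the
registration can live in the net driver itself, e.g. in a
drivers/net/mlx5/mlx5_trace.c, so the handle symbols no longer have to be
exported from common/mlx5:)

  /* drivers/net/mlx5/mlx5_trace.c - illustrative sketch */
  #include <rte_trace_point_register.h>
  #include "mlx5_trace.h"

  RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry, pmd.net.mlx5.tx.entry)
  RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit, pmd.net.mlx5.tx.exit)
  RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe, pmd.net.mlx5.tx.wqe)

With the handles defined in the same library that instantiates the tx_burst
templates, the link succeeds without any __rte_pmd_mlx5_trace_* entries in the
common/mlx5 version.map.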




^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v4] app/testpmd: add trace dump command
  2023-06-27 11:34   ` Ferruh Yigit
@ 2023-06-27 11:39     ` Slava Ovsiienko
  2023-06-27 11:58       ` Ferruh Yigit
  2023-06-27 14:44     ` [PATCH] app/testpmd: add dump command help message Viacheslav Ovsiienko
  1 sibling, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-27 11:39 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: jerinj

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, June 27, 2023 2:35 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: jerinj@marvell.com
> Subject: Re: [PATCH v4] app/testpmd: add trace dump command
> 
> On 6/26/2023 12:57 PM, Viacheslav Ovsiienko wrote:
> > The "dump_trace" CLI command is added to trigger saving the trace
> > dumps to the trace directory.
> >
> > The tracing data are saved according to the EAL configuration
> > (explicit --trace-dir EAL command line parameter alters the target
> > folder to save). The result dump folder gets the name like
> > rte-YYYY-MM-DD-xx-HH-MM-SS format.
> >
> > This command is useful to get the trace date without exiting testpmd
> > application and to get the multiple dumps to observe the situation in
> > dynamics.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> >
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
> 
> > --
> >
> > v1:
> > https://inbox.dpdk.org/dev/20230609152847.32496-2-viacheslavo@nvidia.c
> > om
> > v2: https://inbox.dpdk.org/dev/20230613165845.19109-2-
> viacheslavo@nvidia.com
> >     - changed to save_trace command
> >     - Windows compilation check added
> >
> > v3: https://inbox.dpdk.org/dev/20230626110734.14126-1-
> viacheslavo@nvidia.com
> >     - reverted to "dump_trace" command
> >
> > v4: - added missed header file include
> >     - missed #ifdef added for Windows compilation (no trace support
> >       for Windows)
> > ---
> >  app/test-pmd/cmdline.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> 
> Can you please update 'doc/guides/testpmd_app_ug/testpmd_funcs.rst' for
> new command?
Sure.

> 
> It looks like dump_* commands missed in the help output,
> 'cmd_help_long_parsed()', can you please append this new one end of
> "display" section, we can complete the missing ones later?
> 
> <...>
> 
> > @@ -8371,10 +8372,17 @@ static void cmd_dump_parsed(void
> *parsed_result,
> >  		rte_lcore_dump(stdout);
> >  	else if (!strcmp(res->dump, "dump_log_types"))
> >  		rte_log_dump(stdout);
> > +#ifndef RTE_EXEC_ENV_WINDOWS
> > +	else if (!strcmp(res->dump, "dump_trace"))
> > +		rte_trace_save();
> > +#endif
> >  }
> >
> >  static cmdline_parse_token_string_t cmd_dump_dump =
> >  	TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
> > +#ifndef RTE_EXEC_ENV_WINDOWS
> > +		"dump_trace#"
> > +#endif
> >
> 
> Why not add "dump_trace#" as last item, to keep same order with
> 'cmd_dump_parsed()'?

This would require modifying the preceding command under #ifndef and #else:
#ifndef RTE_EXEC_ENV_WINDOWS
"dump_log_types#
"dump_trace");
#else
"dump_log_types");
#endif

If you think the order is more important, please let me know and I'll update.

With best regards,
Slava



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4] app/testpmd: add trace dump command
  2023-06-27 11:39     ` Slava Ovsiienko
@ 2023-06-27 11:58       ` Ferruh Yigit
  0 siblings, 0 replies; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-27 11:58 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: jerinj

On 6/27/2023 12:39 PM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Tuesday, June 27, 2023 2:35 PM
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>> Cc: jerinj@marvell.com
>> Subject: Re: [PATCH v4] app/testpmd: add trace dump command
>>
>> On 6/26/2023 12:57 PM, Viacheslav Ovsiienko wrote:
>>> The "dump_trace" CLI command is added to trigger saving the trace
>>> dumps to the trace directory.
>>>
>>> The tracing data are saved according to the EAL configuration
>>> (explicit --trace-dir EAL command line parameter alters the target
>>> folder to save). The result dump folder gets the name like
>>> rte-YYYY-MM-DD-xx-HH-MM-SS format.
>>>
>>> This command is useful to get the trace date without exiting testpmd
>>> application and to get the multiple dumps to observe the situation in
>>> dynamics.
>>>
>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>>>
>>
>> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
>>
>>> --
>>>
>>> v1:
>>> https://inbox.dpdk.org/dev/20230609152847.32496-2-viacheslavo@nvidia.c
>>> om
>>> v2: https://inbox.dpdk.org/dev/20230613165845.19109-2-
>> viacheslavo@nvidia.com
>>>     - changed to save_trace command
>>>     - Windows compilation check added
>>>
>>> v3: https://inbox.dpdk.org/dev/20230626110734.14126-1-
>> viacheslavo@nvidia.com
>>>     - reverted to "dump_trace" command
>>>
>>> v4: - added missed header file include
>>>     - missed #ifdef added for Windows compilation (no trace support
>>>       for Windows)
>>> ---
>>>  app/test-pmd/cmdline.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>
>> Can you please update 'doc/guides/testpmd_app_ug/testpmd_funcs.rst' for
>> new command?
> Sure.
> 
>>
>> It looks like dump_* commands missed in the help output,
>> 'cmd_help_long_parsed()', can you please append this new one end of
>> "display" section, we can complete the missing ones later?
>>
>> <...>
>>
>>> @@ -8371,10 +8372,17 @@ static void cmd_dump_parsed(void
>> *parsed_result,
>>>  		rte_lcore_dump(stdout);
>>>  	else if (!strcmp(res->dump, "dump_log_types"))
>>>  		rte_log_dump(stdout);
>>> +#ifndef RTE_EXEC_ENV_WINDOWS
>>> +	else if (!strcmp(res->dump, "dump_trace"))
>>> +		rte_trace_save();
>>> +#endif
>>>  }
>>>
>>>  static cmdline_parse_token_string_t cmd_dump_dump =
>>>  	TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
>>> +#ifndef RTE_EXEC_ENV_WINDOWS
>>> +		"dump_trace#"
>>> +#endif
>>>
>>
>> Why not add "dump_trace#" as last item, to keep same order with
>> 'cmd_dump_parsed()'?
> 
> This would require modify the preceding command undef #ifndef and #else:
> #ifndef RTE_EXEC_ENV_WINDOWS
> "dump_log_types#
> "dump_trace");
> #else
> "dump_log_types");
> #endif
> 
> If you think order is more important - please,  let me know, I'll update
> 

Let's move it just before 'dump_log_types', in both instances :)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5] app/testpmd: add trace dump command
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (9 preceding siblings ...)
  2023-06-26 11:57 ` [PATCH v4] " Viacheslav Ovsiienko
@ 2023-06-27 13:09 ` Viacheslav Ovsiienko
  2023-06-27 15:18   ` Ferruh Yigit
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-27 13:09 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, jerinj

The "dump_trace" CLI command is added to trigger
saving the trace dumps to the trace directory.

The tracing data are saved according to the EAL configuration
(an explicit --trace-dir EAL command line parameter alters
the target folder). The resulting dump folder gets a name
in the rte-YYYY-MM-DD-xx-HH-MM-SS format.

This command is useful for getting the trace data without exiting
the testpmd application and for taking multiple dumps to observe
how the situation evolves over time.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--

v1: https://inbox.dpdk.org/dev/20230609152847.32496-2-viacheslavo@nvidia.com
v2: https://inbox.dpdk.org/dev/20230613165845.19109-2-viacheslavo@nvidia.com
    - changed to save_trace command
    - Windows compilation check added

v3: https://inbox.dpdk.org/dev/20230626110734.14126-1-viacheslavo@nvidia.com
    - reverted to "dump_trace" command

v4: http://patches.dpdk.org/project/dpdk/patch/20230626115749.8961-1-viacheslavo@nvidia.com/
    - added missed header file include
    - missed #ifdef added for Windows compilation (no trace support
      for Windows)

v5: - dump_trace command documented
    - dump commands ordering tidied up
    - checkpatch issue (white space) fixed
---
 app/test-pmd/cmdline.c                      | 8 ++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 5da38b0bb4..18e6e19497 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -39,6 +39,7 @@
 #include <rte_gro.h>
 #endif
 #include <rte_mbuf_dyn.h>
+#include <rte_trace.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -8369,6 +8370,10 @@ static void cmd_dump_parsed(void *parsed_result,
 		rte_devargs_dump(stdout);
 	else if (!strcmp(res->dump, "dump_lcores"))
 		rte_lcore_dump(stdout);
+#ifndef RTE_EXEC_ENV_WINDOWS
+	else if (!strcmp(res->dump, "dump_trace"))
+		rte_trace_save();
+#endif
 	else if (!strcmp(res->dump, "dump_log_types"))
 		rte_log_dump(stdout);
 }
@@ -8383,6 +8388,9 @@ static cmdline_parse_token_string_t cmd_dump_dump =
 		"dump_mempool#"
 		"dump_devargs#"
 		"dump_lcores#"
+#ifndef RTE_EXEC_ENV_WINDOWS
+		"dump_trace#"
+#endif
 		"dump_log_types");
 
 static cmdline_parse_inst_t cmd_dump = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index b755c38c98..8660883ae3 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -598,6 +598,13 @@ Dumps the logical cores list::
 
    testpmd> dump_lcores
 
+dump trace
+~~~~~~~~~~
+
+Dumps the tracing data to the folder according to the current EAL settings::
+
+   testpmd> dump_trace
+
 dump log types
 ~~~~~~~~~~~~~~
 
-- 
2.18.1
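
(A possible usage flow for this command, assuming an mlx5 port and the Tx
tracepoints from this series; the PCI address and directory are placeholders:)

   dpdk-testpmd -a 0000:08:00.0 --trace=pmd.net.mlx5.tx --trace-dir=/var/log -- -i
   testpmd> start
   testpmd> stop
   testpmd> dump_trace

Each dump_trace invocation saves a new rte-YYYY-MM-DD-xx-HH-MM-SS folder under
the configured trace directory, which can then be passed to the analyzing
script.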


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH] app/testpmd: add dump command help message
  2023-06-27 11:34   ` Ferruh Yigit
  2023-06-27 11:39     ` Slava Ovsiienko
@ 2023-06-27 14:44     ` Viacheslav Ovsiienko
  2023-06-27 18:03       ` Ferruh Yigit
  1 sibling, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-27 14:44 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit

There was missing "dump_xxxxx" commands help message.
Patch adds support for "help dump" command to see one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 43 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 18e6e19497..9edbb7d04f 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -92,6 +92,7 @@ static void cmd_help_brief_parsed(__rte_unused void *parsed_result,
 		"    help ports                      : Configuring ports.\n"
 		"    help filters                    : Filters configuration help.\n"
 		"    help traffic_management         : Traffic Management commands.\n"
+		"    help dump                       : Dumps related commands.\n"
 		"    help devices                    : Device related commands.\n"
 		"    help drivers                    : Driver specific commands.\n"
 		"    help all                        : All of the above sections.\n\n"
@@ -982,6 +983,44 @@ static void cmd_help_long_parsed(void *parsed_result,
 		);
 	}
 
+	if (show_all || !strcmp(res->section, "dump")) {
+		cmdline_printf(
+			cl,
+			"\n"
+			"Dump Commands:\n"
+			"--------------\n"
+			"dump_physmem\n"
+			"    Dumps all physical memory segment layouts\n\n"
+
+			"dump_socket_mem\n"
+			"    Dumps the memory usage of all sockets\n\n"
+
+			"dump_memzone\n"
+			"    Dumps the layout of all memory zones\n\n"
+
+			"dump_struct_sizes\n"
+			"    Dumps the size of all memory structures\n\n"
+
+			"dump_ring\n"
+			"    Dumps the status of all or specific element in DPDK rings\n\n"
+
+			"dump_mempool\n"
+			"    Dumps the statistics of all or specific memory pool\n\n"
+
+			"dump_devargs\n"
+			"    Dumps the user device list\n\n"
+
+			"dump_lcores\n"
+			"    Dumps the logical cores list\n\n"
+
+			"dump_trace\n"
+			"    Dumps the tracing data to the folder according to the current EAL settings\n\n"
+
+			"dump_log_types\n"
+			"    Dumps the log level for all the dpdk modules\n\n"
+		);
+	}
+
 	if (show_all || !strcmp(res->section, "devices")) {
 		cmdline_printf(
 			cl,
@@ -1016,13 +1055,13 @@ static cmdline_parse_token_string_t cmd_help_long_help =
 static cmdline_parse_token_string_t cmd_help_long_section =
 	TOKEN_STRING_INITIALIZER(struct cmd_help_long_result, section,
 		"all#control#display#config#ports#"
-		"filters#traffic_management#devices#drivers");
+		"filters#traffic_management#dump#devices#drivers");
 
 static cmdline_parse_inst_t cmd_help_long = {
 	.f = cmd_help_long_parsed,
 	.data = NULL,
 	.help_str = "help all|control|display|config|ports|"
-		"filters|traffic_management|devices|drivers: "
+		"filters|traffic_management|dump|devices|drivers: "
 		"Show help",
 	.tokens = {
 		(void *)&cmd_help_long_help,
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v5] app/testpmd: add trace dump command
  2023-06-27 13:09 ` [PATCH v5] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-06-27 15:18   ` Ferruh Yigit
  0 siblings, 0 replies; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-27 15:18 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev; +Cc: jerinj

On 6/27/2023 2:09 PM, Viacheslav Ovsiienko wrote:
> The "dump_trace" CLI command is added to trigger
> saving the trace dumps to the trace directory.
> 
> The tracing data are saved according to the EAL configuration
> (explicit --trace-dir EAL command line parameter alters
> the target folder to save). The result dump folder gets the name
> like rte-YYYY-MM-DD-xx-HH-MM-SS format.
> 
> This command is useful to get the trace date without exiting
> testpmd application and to get the multiple dumps to observe
> the situation in dynamics.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 

Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>


The help message for the dump_trace command is still missing, but the missing
ones can all be added with a single commit, so it is OK to continue with this.

Applied to dpdk-next-net/main, thanks.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH] app/testpmd: add dump command help message
  2023-06-27 14:44     ` [PATCH] app/testpmd: add dump command help message Viacheslav Ovsiienko
@ 2023-06-27 18:03       ` Ferruh Yigit
  2023-06-28  9:54         ` [PATCH v2] " Viacheslav Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-27 18:03 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: Aman Singh, dev

On 6/27/2023 3:44 PM, Viacheslav Ovsiienko wrote:
> There was missing "dump_xxxxx" commands help message.
> Patch adds support for "help dump" command to see one.
> 

Hi Slava,

Thanks for the patch, this seems to have been missed for a while.

> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  app/test-pmd/cmdline.c | 43 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 18e6e19497..9edbb7d04f 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -92,6 +92,7 @@ static void cmd_help_brief_parsed(__rte_unused void *parsed_result,
>  		"    help ports                      : Configuring ports.\n"
>  		"    help filters                    : Filters configuration help.\n"
>  		"    help traffic_management         : Traffic Management commands.\n"
> +		"    help dump                       : Dumps related commands.\n"

I am not sure the 'dump_*' commands form a distinct enough group to warrant
a new help section (even the description is vague: "dump related .."), or
whether they are important enough for a new section.

What would you think about appending them to the 'display' section instead?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2] app/testpmd: add dump command help message
  2023-06-27 18:03       ` Ferruh Yigit
@ 2023-06-28  9:54         ` Viacheslav Ovsiienko
  2023-06-28 13:18           ` Ferruh Yigit
  0 siblings, 1 reply; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28  9:54 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit

There was missing "dump_xxxxx" commands help message.
Patch updates "help display" section of the help message.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 68 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 18e6e19497..9853fd3069 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -255,6 +255,36 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"show port (port_id) flow_ctrl"
 			"	Show flow control info of a port.\n\n"
+
+			"dump_physmem\n"
+			"    Dumps all physical memory segment layouts\n\n"
+
+			"dump_socket_mem\n"
+			"    Dumps the memory usage of all sockets\n\n"
+
+			"dump_memzone\n"
+			"    Dumps the layout of all memory zones\n\n"
+
+			"dump_struct_sizes\n"
+			"    Dumps the size of all memory structures\n\n"
+
+			"dump_ring\n"
+			"    Dumps the status of all or specific element in DPDK rings\n\n"
+
+			"dump_mempool\n"
+			"    Dumps the statistics of all or specific memory pool\n\n"
+
+			"dump_devargs\n"
+			"    Dumps the user device list\n\n"
+
+			"dump_lcores\n"
+			"    Dumps the logical cores list\n\n"
+
+			"dump_trace\n"
+			"    Dumps the tracing data to the folder according to the current EAL settings\n\n"
+
+			"dump_log_types\n"
+			"    Dumps the log level for all the dpdk modules\n\n"
 		);
 	}
 
@@ -982,6 +1012,44 @@ static void cmd_help_long_parsed(void *parsed_result,
 		);
 	}
 
+	if (show_all || !strcmp(res->section, "dump")) {
+		cmdline_printf(
+			cl,
+			"\n"
+			"Dump Commands:\n"
+			"--------------\n"
+			"dump_physmem\n"
+			"    Dumps all physical memory segment layouts\n\n"
+
+			"dump_socket_mem\n"
+			"    Dumps the memory usage of all sockets\n\n"
+
+			"dump_memzone\n"
+			"    Dumps the layout of all memory zones\n\n"
+
+			"dump_struct_sizes\n"
+			"    Dumps the size of all memory structures\n\n"
+
+			"dump_ring\n"
+			"    Dumps the status of all or specific element in DPDK rings\n\n"
+
+			"dump_mempool\n"
+			"    Dumps the statistics of all or specific memory pool\n\n"
+
+			"dump_devargs\n"
+			"    Dumps the user device list\n\n"
+
+			"dump_lcores\n"
+			"    Dumps the logical cores list\n\n"
+
+			"dump_trace\n"
+			"    Dumps the tracing data to the folder according to the current EAL settings\n\n"
+
+			"dump_log_types\n"
+			"    Dumps the log level for all the dpdk modules\n\n"
+		);
+	}
+
 	if (show_all || !strcmp(res->section, "devices")) {
 		cmdline_printf(
 			cl,
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (10 preceding siblings ...)
  2023-06-27 13:09 ` [PATCH v5] app/testpmd: add trace dump command Viacheslav Ovsiienko
@ 2023-06-28 11:09 ` Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                     ` (3 more replies)
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  14 siblings, 4 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28 11:09 UTC (permalink / raw)
  To: dev

The mlx5 PMD provides send scheduling at a specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
the application to track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback
from the driver and the feature looks to be mlx5 specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with EAL parameters configuring the tracing in the
    mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine related events (packet
    firing and completion) and see the data in a human-readable view

Below is the detailed "how to" instruction for gathering all the debug
data, including the full timing information, with an mlx5 NIC.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings parameter
should be set to TRUE, for example with the command below (followed by a
FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
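
As a follow-up (an extra step, not part of the original instructions), the
firmware reset that applies the new NV setting can be performed, for example,
with the mlxfwreset tool from the same MFT package (device name as in the
example above, adjust to the actual setup):

  sudo mlxfwreset -d /dev/mst/mt4125_pciconf0 reset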


3. Run DPDK application to gather the traces

EAL parameters controlling trace capability in runtime

  --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
                            with matching names. At least "pmd.net.mlx5.tx"
                            must be enabled to gather all events needed
                            to analyze the mlx5 Tx datapath and its timings.
                            By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
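
For illustration only, a testpmd run gathering the Tx traces could be started
as shown below (the PCI device address and the testpmd forwarding options are
placeholders, not taken from this series):

  dpdk-testpmd -a 0000:08:00.0 --trace=pmd.net.mlx5.tx --trace-dir=/var/log \
               -- --forward-mode=txonly --txq=1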


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-bindings
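
Once the package is installed or built, a quick sanity check (an extra step,
not part of the original instructions) is to verify that the Python bindings
used by the analysis script can be imported:

  python3 -c "import bt2"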

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6+ and the Babeltrace2 package, and it takes a single
parameter - the trace data folder. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The list of Tx (and, in the future, Rx) bursts per port/queue is presented
in the output. Each list element contains the list of built WQEs with specific
opcodes, and each WQE contains the list of the encompassed packets to send.
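
Schematically (following the print formats used by mlx5_trace.py, with field
names shown instead of the actual numbers), each reported burst looks like:

  <burst_start_ts>: tx(p=<port>, q=<queue>, <sent>/<requested> pkts in <duration>
    <wqe_index>: SEND|EMPW|TSO|WAIT (<wait_ts>, <completion_delta>)
      <mbuf_pointer>: <packet_length> (<n> segs)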

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

v3: - tracepoint routines are moved to the net folder, no need to export
    - documentation added
    - testpmd patches moved out from series to the dedicated patches

Viacheslav Ovsiienko (4):
  net/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script
  doc: add mlx5 datapath tracing feature description

 doc/guides/nics/mlx5.rst             |  77 ++++++++
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |  29 +++
 drivers/net/mlx5/mlx5_tx.h           | 135 ++++++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 8 files changed, 537 insertions(+), 29 deletions(-)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3 1/4] net/mlx5: introduce tracepoints for mlx5 drivers
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-06-28 11:09   ` Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28 11:09 UTC (permalink / raw)
  To: dev

There is an intention to engage the DPDK tracing capabilities
for mlx5 PMD monitoring and profiling in various modes.
The patch introduces tracepoints for the Tx datapath in
the ethernet device driver.

To engage this tracing capability the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*
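
For example, both meson options can be supplied in a single setup invocation
(the build directory name "build" is arbitrary):

  meson setup -Denable_trace_fp=true -Dc_args='-DALLOW_EXPERIMENTAL_API' build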

The Tx datapath tracing allows getting information on how packets
are pushed into hardware descriptors, timestamps for
scheduled waits and send completions, etc.

To provide a human-readable form of the trace results, a
dedicated post-processing script is intended to be used.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.h   | 19 ----------
 drivers/net/mlx5/mlx5_rxtx.h | 19 ++++++++++
 drivers/net/mlx5/mlx5_tx.c   | 29 +++++++++++++++
 drivers/net/mlx5/mlx5_tx.h   | 72 +++++++++++++++++++++++++++++++++++-
 4 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3514edd84e..f42607dce4 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -377,25 +377,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..13e2d90e03 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <stdlib.h>
 
+#include <rte_trace_point_register.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
@@ -232,6 +233,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
@@ -752,3 +762,22 @@ mlx5_tx_burst_mode_get(struct rte_eth_dev *dev,
 	}
 	return -EINVAL;
 }
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..b90cdf1fcc 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -13,12 +13,61 @@
 #include <rte_mempool.h>
 #include <rte_common.h>
 #include <rte_spinlock.h>
+#include <rte_trace_point.h>
 
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_rxtx.h"
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +813,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1744,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1759,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1864,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1947,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2171,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2375,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2745,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2984,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2994,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +3040,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3255,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3318,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3360,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3402,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3772,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3 2/4] net/mlx5: add comprehensive send completion trace
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-06-28 11:09   ` Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28 11:09 UTC (permalink / raw)
  To: dev

There is a demand to trace the send completion of
every WQE if time scheduling is enabled.

The patch extends the size of the completion queue and
requests completion on every issued WQE in the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is under the conditional compilation
flag RTE_ENABLE_TRACE_FP and does not impact the
release code.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 7233c2c7fa..b54f3ccd9a 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index b90cdf1fcc..47ee8bca4f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -775,6 +775,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with enabled fast path tracing
+ * and send scheduling on time to provide the detailed report in trace
+ * for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -801,7 +849,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -810,8 +858,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3709,7 +3761,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3 3/4] net/mlx5: add Tx datapath trace analyzing script
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-06-28 11:09   ` Viacheslav Ovsiienko
  2023-06-28 11:09   ` [PATCH v3 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28 11:09 UTC (permalink / raw)
  To: dev

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 271 +++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..c8fa63a7b9
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+'''
+Analyzing the mlx5 PMD datapath traces
+'''
+import sys
+import argparse
+import pathlib
+import bt2
+
+PFX_TX     = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+tx_blst = {}                    # current Tx bursts per CPU
+tx_qlst = {}                    # active Tx queues per port/queue
+tx_wlst = {}                    # wait timestamp list per CPU
+
+class mlx5_queue(object):
+    def __init__(self):
+        self.done_burst = []    # completed bursts
+        self.wait_burst = []    # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        for txb in self.done_burst:
+            txb.log()
+
+
+class mlx5_mbuf(object):
+    def __init__(self):
+        self.wqe = 0            # wqe id
+        self.ptr = None         # first packet mbuf pointer
+        self.len = 0            # packet data length
+        self.nseg = 0           # number of segments
+
+    def log(self):
+        out = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out += " (%d segs)" % self.nseg
+        print(out)
+
+
+class mlx5_wqe(object):
+    def __init__(self):
+        self.mbuf = []          # list of mbufs in WQE
+        self.wait_ts = 0        # preceding wait/push timestamp
+        self.comp_ts = 0        # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        id = (self.opcode >> 8) & 0xFFFF
+        op = self.opcode & 0xFF
+        fl = self.opcode >> 24
+        out = "  %04X: " % id
+        if op == 0xF:
+            out += "WAIT"
+        elif op == 0x29:
+            out += "EMPW"
+        elif op == 0xE:
+            out += "TSO "
+        elif op == 0xA:
+            out += "SEND"
+        else:
+            out += "0x%02X" % op
+        if self.comp_ts != 0:
+            out += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out += " (%d)" % self.wait_ts
+        print(out)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    # return 0 if the WQE is not completed
+    def comp(self, wqe_id, ts):
+        if self.comp_ts != 0:
+            return 1
+        id = (self.opcode >> 8) & 0xFFFF
+        if id > wqe_id:
+            id -= wqe_id
+            if id <= 0x8000:
+                return 0
+        else:
+            id = wqe_id - id
+            if id >= 0x8000:
+                return 0
+        self.comp_ts = ts
+        return 1
+
+
+class mlx5_burst(object):
+    def __init__(self):
+        self.wqes = []          # issued burst WQEs
+        self.done = 0           # number of sent/recv packets
+        self.req = 0            # requested number of packets
+        self.call_ts = 0        # burst routine invocation
+        self.done_ts = 0        # burst routine done
+        self.queue = None
+
+    def log(self):
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)" %
+                  (self.call_ts, port, queue, self.done, self.req))
+        else:
+            print("%u: tx(p=%u, q=%u, %u/%u pkts in %u" %
+                  (self.call_ts, port, queue, self.done, self.req,
+                   self.done_ts - self.call_ts))
+        for wqe in self.wqes:
+            wqe.log()
+
+    # return 0 if not all of the WQEs in the burst are completed
+    def comp(self, wqe_id, ts):
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, ts) == 0:
+                return 0
+        return 1
+
+
+def do_tx_entry(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = mlx5_burst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = mlx5_queue()
+        queue.pq_id = pq_id
+        tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = mlx5_wqe()
+    wqe.wait_ts = tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg):
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = mlx5_mbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg):
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, ts) == 0:
+            break
+        rmv += 1
+    # move completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg):
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg)
+    elif name == "exit":
+        do_tx_exit(msg)
+    elif name == "wqe":
+        do_tx_wqe(msg)
+    elif name == "wait":
+        do_tx_wait(msg)
+    elif name == "push":
+        do_tx_push(msg)
+    elif name == "complete":
+        do_tx_complete(msg)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name)
+        sys.exit(1)
+
+
+def do_log(msg_it):
+    for msg in msg_it:
+        if type(msg) is not bt2._EventMessageConst:
+            continue
+        event = msg.event
+        if event.name.startswith(PFX_TX):
+            do_tx(msg)
+        # Handling of other log event categories can be added here
+
+
+def do_print():
+    for pq_id in tx_qlst:
+        queue = tx_qlst.get(pq_id)
+        queue.log()
+
+
+def main(args):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("path",
+                        nargs = 1,
+                        type = str,
+                        help = "input trace folder")
+    args = parser.parse_args()
+
+    msg_it = bt2.TraceCollectionMessageIterator(args.path)
+    do_log(msg_it)
+    do_print()
+    exit(0)
+
+if __name__ == "__main__":
+    main(sys.argv)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v3 4/4] doc: add mlx5 datapath tracing feature description
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2023-06-28 11:09   ` [PATCH v3 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-06-28 11:09   ` Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-06-28 11:09 UTC (permalink / raw)
  To: dev

The mlx5 PMD provides send scheduling at a specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
the application to track internal delay issues).

The patch adds the documentation for feature usage.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 77 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0ed5cb5bc3..555f02ad2a 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2081,3 +2081,80 @@ Set the flow engine to active(0) or standby(1) mode with specific flags::
 This command works for software steering only.
 Default FDB jump should be disabled if switchdev is enabled.
 The mode will propagate to all the probed ports.
+
+Tx datapath tracing
+^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides the Tx datapath tracing capability with extra debug
+information - when and how packets were scheduled and when the actual
+sending was completed by the NIC hardware. The feature engages the
+existing DPDK datapath tracing capability.
+
+Usage of the mlx5 Tx datapath tracing:
+
+#. Build the DPDK application with datapath tracing enabled
+
+   * The meson option should be specified: ``-Denable_trace_fp=true``
+   * The c_args should be specified: ``-DALLOW_EXPERIMENTAL_API``
+
+   .. code-block:: console
+
+      meson configure --buildtype=debug -Denable_trace_fp=true
+         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+      meson configure --buildtype=release -Denable_trace_fp=true
+         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+#. Configure the NIC
+
+   If the send completion timings are important, the NIC should be configured
+   to provide real-time timestamps: the ``REAL_TIME_CLOCK_ENABLE`` NV settings
+   parameter should be set to TRUE.
+
+   .. code-block:: console
+
+      mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
+
+#. Run application with EAL parameters configuring the tracing in mlx5 Tx datapath
+
+    * ``--trace=pmd.net.mlx5.tx`` - the regular expression enabling the tracepoints
+      with matching names. At least "pmd.net.mlx5.tx" must be enabled to gather all
+      events needed to analyze the mlx5 Tx datapath and its timings. By default all
+      tracepoints are disabled.
+
+#. Store the tracing data file with gathered tracing information
+
+#. Install or build the ``Babeltrace2`` Package
+
+   The gathered trace data can be analyzed with the provided Python script.
+   To parse the trace data, the script uses the ``Babeltrace2`` library.
+   The package should be either installed or built from source code as
+   shown below.
+
+   .. code-block:: console
+
+      git clone https://github.com/efficios/babeltrace.git
+      cd babeltrace
+      ./bootstrap
+      ./configure --help
+      ./configure --disable-api-doc --disable-man-pages
+                  --disable-python-bindings-doc --enable-python-plugins
+                  --enable-python-bindings
+
+#. Run the analyzing script (in Python) to combine related events (packet firing and
+   completion) and see the output in a human-readable view
+
+   The analyzing script is located in the folder: ``./drivers/net/mlx5/tools``
+   It requires Python 3.6+ and the ``Babeltrace2`` package, and it takes a single
+   parameter - the trace data folder.
+
+   .. code-block:: console
+
+      ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
+
+#. Interpreting the Script Output Data
+
+   All the timings are given in nanoseconds.
+   The list of Tx bursts per port/queue is presented in the output.
+   Each list element contains the list of built WQEs with specific opcodes, and
+   each WQE contains the list of the encompassed packets to send.
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] app/testpmd: add dump command help message
  2023-06-28  9:54         ` [PATCH v2] " Viacheslav Ovsiienko
@ 2023-06-28 13:18           ` Ferruh Yigit
  0 siblings, 0 replies; 76+ messages in thread
From: Ferruh Yigit @ 2023-06-28 13:18 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev

On 6/28/2023 10:54 AM, Viacheslav Ovsiienko wrote:
> The help message for the "dump_xxxxx" commands was missing.
> The patch updates the "help display" section of the help message.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
>

Reviewed-by: Ferruh Yigit <ferruh.yigit@amd.com>

Applied to dpdk-next-net/main, thanks.

<...>

> @@ -982,6 +1012,44 @@ static void cmd_help_long_parsed(void *parsed_result,
>  		);
>  	}
>  
> +	if (show_all || !strcmp(res->section, "dump")) {
> +		cmdline_printf(
> +			cl,
> +			"\n"
> +			"Dump Commands:\n"
> +			"--------------\n"
> +			"dump_physmem\n"
> +			"    Dumps all physical memory segment layouts\n\n"
> +
> +			"dump_socket_mem\n"
> +			"    Dumps the memory usage of all sockets\n\n"
> +
> +			"dump_memzone\n"
> +			"    Dumps the layout of all memory zones\n\n"
> +
> +			"dump_struct_sizes\n"
> +			"    Dumps the size of all memory structures\n\n"
> +
> +			"dump_ring\n"
> +			"    Dumps the status of all or specific element in DPDK rings\n\n"
> +
> +			"dump_mempool\n"
> +			"    Dumps the statistics of all or specific memory pool\n\n"
> +
> +			"dump_devargs\n"
> +			"    Dumps the user device list\n\n"
> +
> +			"dump_lcores\n"
> +			"    Dumps the logical cores list\n\n"
> +
> +			"dump_trace\n"
> +			"    Dumps the tracing data to the folder according to the current EAL settings\n\n"
> +
> +			"dump_log_types\n"
> +			"    Dumps the log level for all the dpdk modules\n\n"
> +		);
> +	}
> +

The above part is a duplicate, and it was removed while merging.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-27 11:34         ` Thomas Monjalon
@ 2023-06-28 14:18           ` Robin Jarry
  2023-06-29  7:16             ` Slava Ovsiienko
  0 siblings, 1 reply; 76+ messages in thread
From: Robin Jarry @ 2023-06-28 14:18 UTC (permalink / raw)
  To: Thomas Monjalon, Slava Ovsiienko
  Cc: dev, Raslan Darawsheh, jerinj, david.marchand

Thomas Monjalon, Jun 27, 2023 at 13:34:
> Robin, would you have time to look at this trace processing script
> please?

Hi there,

I've had a brief look at the script. I don't exactly know what it is
taking as input and should be producing as output. Could you give some
examples?

Maybe I could suggest a few ideas to make it "feel" more python-esque.

Cheers,


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-28 14:18           ` Robin Jarry
@ 2023-06-29  7:16             ` Slava Ovsiienko
  2023-06-29  9:08               ` Robin Jarry
  0 siblings, 1 reply; 76+ messages in thread
From: Slava Ovsiienko @ 2023-06-29  7:16 UTC (permalink / raw)
  To: Robin Jarry, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Raslan Darawsheh, jerinj, david.marchand

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

Hi, Robin

Thank you for taking the time to review the script.
Please see the attachment - the raw data gathered as a result of tracing, and a brief description.

With best regards,
Slava

> -----Original Message-----
> From: Robin Jarry <rjarry@redhat.com>
> Sent: Wednesday, June 28, 2023 5:19 PM
> To: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>;
> jerinj@marvell.com; david.marchand@redhat.com
> Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 
> Thomas Monjalon, Jun 27, 2023 at 13:34:
> > Robin, would you have time to look at this trace processing script
> > please?
> 
> Hi there,
> 
> I've had a brief look at the script. I don't exactly know what it is taking as
> input and should be producing as output. Could you give some examples?
> 
> Maybe I could suggest a few ideas to make it "feel" more python-esque.
> 
> Cheers,


[-- Attachment #2: package.zip --]
[-- Type: application/x-zip-compressed, Size: 617750 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
  2023-06-29  7:16             ` Slava Ovsiienko
@ 2023-06-29  9:08               ` Robin Jarry
  0 siblings, 0 replies; 76+ messages in thread
From: Robin Jarry @ 2023-06-29  9:08 UTC (permalink / raw)
  To: Slava Ovsiienko, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Raslan Darawsheh, jerinj, david.marchand

Slava Ovsiienko, Jun 29, 2023 at 09:16:
> Hi, Robin
>
> Thank you for your courtesy about script reviewing.
> Please see an attachment - the raw data gathered as a result of tracing, and brief description.

Thanks for the details. I think that most of the contents of the
included pdf file should go into the docs and/or into the script help.

As for the script itself, the first thing to do would be to fix all
warnings reported by pylint:

$ pylint --enable=all mlx5_trace.py

After that, I have a few general remarks:

* do not use global variables except for constants
* most of the time, there is no need to use sys.exit() explicitly
* print errors on stderr
* remember that python has exceptions, it makes error handling easier

I would also advise to format your code using [black][1] so that you
don't have to bother about coding style.

[1]: https://github.com/psf/black

Feel free to inspire from the general structure that is present in some
of the scripts that I have written:

* usertools/dpdk-pmdinfo.py
* usertools/dpdk-rss-flows.py (not yet applied,
  http://patches.dpdk.org/project/dpdk/patch/20230628134748.117697-3-rjarry@redhat.com/)

Cheers,
Robin


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (11 preceding siblings ...)
  2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-07-05 11:10 ` Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                     ` (3 more replies)
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
  14 siblings, 4 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 11:10 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 PMD provides send scheduling at a specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
the application to track internal delay issues).

Because the DPDK Tx datapath API does not provide any feedback
from the driver and the feature looks to be mlx5 specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with EAL parameters configuring the tracing in the
    mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine related events (packet
    firing and completion) and see the data in a human-readable view

Below is the detailed "how to" instruction for gathering all the debug
data, including the full timing information, with an mlx5 NIC.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings parameter
should be set to TRUE, for example with the command below (followed by a
FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1


3. Run DPDK application to gather the traces

EAL parameters controlling trace capability in runtime

  --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
                            with matching names. At least "pmd.net.mlx5.tx"
                            must be enabled to gather all events needed
                            to analyze the mlx5 Tx datapath and its timings.
                            By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-bindings

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6+ and the Babeltrace2 package, and it takes a single
parameter - the trace data folder. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
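
Note that the same trace folder can also be inspected with the babeltrace2
command-line tool (if the CLI was installed or built) to view the raw event
stream before running the analysis script, for example:

   babeltrace2 /var/log/rte-2023-01-23-AM-11-52-39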


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The list of Tx (and, in the future, Rx) bursts per port/queue is presented
in the output. Each list element contains the list of built WQEs with specific
opcodes, and each WQE contains the list of the encompassed packets to send.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

v3: - tracepoint routines are moved to the net folder, no need to export
    - documentation added
    - testpmd patches moved out from series to the dedicated patches

v4: - Python comments addressed
    - codestyle issues fixed

Viacheslav Ovsiienko (4):
  net/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script
  doc: add mlx5 datapath tracing feature description

 doc/guides/nics/mlx5.rst             |  78 +++++++
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_tx.c           |  29 +++
 drivers/net/mlx5/mlx5_tx.h           | 135 +++++++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 8 files changed, 574 insertions(+), 29 deletions(-)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 1/4] net/mlx5: introduce tracepoints for mlx5 drivers
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-07-05 11:10   ` Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 11:10 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

There is an intention to engage the DPDK tracing capabilities
for mlx5 PMD monitoring and profiling in various modes.
The patch introduces tracepoints for the Tx datapath in
the ethernet device driver.

To engage this tracing capability the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*

The Tx datapath tracing allows getting information on how packets
are pushed into hardware descriptors, timestamps for
scheduled waits and send completions, etc.

To provide a human-readable form of the trace results, a
dedicated post-processing script is intended to be used.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.h   | 19 ----------
 drivers/net/mlx5/mlx5_rxtx.h | 19 ++++++++++
 drivers/net/mlx5/mlx5_tx.c   | 29 +++++++++++++++
 drivers/net/mlx5/mlx5_tx.h   | 72 +++++++++++++++++++++++++++++++++++-
 4 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3514edd84e..f42607dce4 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -377,25 +377,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..13e2d90e03 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <stdlib.h>
 
+#include <rte_trace_point_register.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
@@ -232,6 +233,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
@@ -752,3 +762,22 @@ mlx5_tx_burst_mode_get(struct rte_eth_dev *dev,
 	}
 	return -EINVAL;
 }
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..b90cdf1fcc 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -13,12 +13,61 @@
 #include <rte_mempool.h>
 #include <rte_common.h>
 #include <rte_spinlock.h>
+#include <rte_trace_point.h>
 
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_rxtx.h"
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +813,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1744,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1759,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1864,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1947,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2171,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2375,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2745,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2984,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2994,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +3040,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3255,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3318,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3360,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3402,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3772,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 2/4] net/mlx5: add comprehensive send completion trace
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-07-05 11:10   ` Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 11:10 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

There is a demand to trace the send completion of
every WQE if time scheduling is enabled.

The patch extends the size of the completion queue and
requests a completion on every WQE issued to the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is under the conditional compilation
flag RTE_ENABLE_TRACE_FP and does not impact the
release code.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 7233c2c7fa..b54f3ccd9a 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index b90cdf1fcc..47ee8bca4f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -775,6 +775,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with enabled fast path tracing
+ * and send scheduling on time to provide the detailed report in trace
+ * for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -801,7 +849,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -810,8 +858,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3709,7 +3761,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 3/4] net/mlx5: add Tx datapath trace analyzing script
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-07-05 11:10   ` Viacheslav Ovsiienko
  2023-07-05 11:10   ` [PATCH v4 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 11:10 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings
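
A note on how the script matches completions to WQEs: the completion event
carries a 16-bit WQE counter, so the comparison has to account for counter
wraparound. A WQE is considered covered by a completion if it is not more
than half of the counter range ahead of it. A minimal standalone equivalent
of the check implemented in MlxWqe.comp() (the function below is only an
illustration, it is not part of the script):

    def wqe_is_completed(wqe_idx: int, comp_idx: int) -> bool:
        """Treat indices as 16-bit counters with wraparound."""
        return ((comp_idx - wqe_idx) & 0xFFFF) < 0x8000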

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..8c1fd0a350
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,307 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+"""
+Analyzing the mlx5 PMD datapath traces
+"""
+import sys
+import argparse
+import bt2
+
+PFX_TX = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+
+class MlxQueue:
+    """Queue container object"""
+
+    def __init__(self):
+        self.done_burst = []  # completed bursts
+        self.wait_burst = []  # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        """Log all queue bursts"""
+        for txb in self.done_burst:
+            txb.log()
+
+
+class MlxMbuf:
+    """Packet mbufs container object"""
+
+    def __init__(self):
+        self.wqe = 0     # wqe id
+        self.ptr = None  # first packet mbuf pointer
+        self.len = 0     # packet data length
+        self.nseg = 0    # number of segments
+
+    def log(self):
+        """Log mbuf"""
+        out_txt = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out_txt += " (%d segs)" % self.nseg
+        print(out_txt)
+
+
+class MlxWqe:
+    """WQE container object"""
+
+    def __init__(self):
+        self.mbuf = []    # list of mbufs in WQE
+        self.wait_ts = 0  # preceding wait/push timestamp
+        self.comp_ts = 0  # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        """Log WQE"""
+        wqe_id = (self.opcode >> 8) & 0xFFFF
+        wqe_op = self.opcode & 0xFF
+        out_txt = "  %04X: " % wqe_id
+        if wqe_op == 0xF:
+            out_txt += "WAIT"
+        elif wqe_op == 0x29:
+            out_txt += "EMPW"
+        elif wqe_op == 0xE:
+            out_txt += "TSO "
+        elif wqe_op == 0xA:
+            out_txt += "SEND"
+        else:
+            out_txt += "0x%02X" % wqe_op
+        if self.comp_ts != 0:
+            out_txt += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out_txt += " (%d)" % self.wait_ts
+        print(out_txt)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if the WQE is not completed"""
+        if self.comp_ts != 0:
+            return 1
+        cur_id = (self.opcode >> 8) & 0xFFFF
+        if cur_id > wqe_id:
+            cur_id -= wqe_id
+            if cur_id <= 0x8000:
+                return 0
+        else:
+            cur_id = wqe_id - cur_id
+            if cur_id >= 0x8000:
+                return 0
+        self.comp_ts = wqe_ts
+        return 1
+
+
+class MlxBurst:
+    """Packet burst container object"""
+
+    def __init__(self):
+        self.wqes = []    # issued burst WQEs
+        self.done = 0     # number of sent/recv packets
+        self.req = 0      # requested number of packets
+        self.call_ts = 0  # burst routine invocation
+        self.done_ts = 0  # burst routine done
+        self.queue = None
+
+    def log(self):
+        """Log burst"""
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)"
+                % (self.call_ts, port, queue, self.done, self.req)
+            )
+        else:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts in %u"
+                % (
+                    self.call_ts,
+                    port,
+                    queue,
+                    self.done,
+                    self.req,
+                    self.done_ts - self.call_ts,
+                )
+            )
+        for wqe in self.wqes:
+            wqe.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if not all WQEs in the burst are completed"""
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, wqe_ts) == 0:
+                return 0
+        return 1
+
+
+class MlxTrace:
+    """Trace representing object"""
+
+    def __init__(self):
+        self.tx_blst = {}  # current Tx bursts per CPU
+        self.tx_qlst = {}  # active Tx queues per port/queue
+        self.tx_wlst = {}  # wait timestamp list per CPU
+
+    def run(self, msg_it):
+        """Run over gathered tracing data and build database"""
+        for msg in msg_it:
+            if not isinstance(msg, bt2._EventMessageConst):
+                continue
+            event = msg.event
+            if event.name.startswith(PFX_TX):
+                do_tx(msg, self)
+            # Handling of other log event categories can be added here
+
+    def log(self):
+        """Log gathered trace database"""
+        for pq_id in self.tx_qlst:
+            queue = self.tx_qlst.get(pq_id)
+            queue.log()
+
+
+def do_tx_entry(msg, trace):
+    """Handle Tx burst entry"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = MlxBurst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    trace.tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = MlxQueue()
+        queue.pq_id = pq_id
+        trace.tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg, trace):
+    """Handle Tx burst exit"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    trace.tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg, trace):
+    """Handle WQE record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = MlxWqe()
+    wqe.wait_ts = trace.tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg, trace):
+    """Handle WAIT record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    trace.tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg, trace):
+    """Handle WQE push event"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = MlxMbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg, trace):
+    """Handle send completion event"""
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    wqe_ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, wqe_ts) == 0:
+            break
+        rmv += 1
+    # move completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg, trace):
+    """Handle Tx related records"""
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg, trace)
+    elif name == "exit":
+        do_tx_exit(msg, trace)
+    elif name == "wqe":
+        do_tx_wqe(msg, trace)
+    elif name == "wait":
+        do_tx_wait(msg, trace)
+    elif name == "push":
+        do_tx_push(msg, trace)
+    elif name == "complete":
+        do_tx_complete(msg, trace)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name, file=sys.stderr)
+        raise ValueError()
+
+
+def main() -> int:
+    """Script entry point"""
+    try:
+        parser = argparse.ArgumentParser()
+        parser.add_argument("path", nargs=1, type=str, help="input trace folder")
+        args = parser.parse_args()
+
+        mlx_tr = MlxTrace()
+        msg_it = bt2.TraceCollectionMessageIterator(args.path)
+        mlx_tr.run(msg_it)
+        mlx_tr.log()
+        return 0
+    except ValueError:
+        return -1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v4 4/4] doc: add mlx5 datapath tracing feature description
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2023-07-05 11:10   ` [PATCH v4 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-07-05 11:10   ` Viacheslav Ovsiienko
  3 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 11:10 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 PMD provides send scheduling at a specific moment of time,
and for such applications it is extremely useful to have extra debug
information - when and how packets were scheduled and when the actual
sending was completed by the NIC hardware (this helps the application
to track internal delay issues).

The patch adds the documentation for the feature usage.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 78 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b9843edbd9..1c8fc6f6d4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2077,3 +2077,81 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+
+Tx datapath tracing
+^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides the Tx datapath tracing capability with extra debug
+information: when and how packets were scheduled and when the actual
+sending was completed by the NIC hardware. The feature relies on the
+existing DPDK datapath tracing capability.
+
+Usage of the mlx5 Tx datapath tracing:
+
+#. Build DPDK application with enabled datapath tracing
+
+   * The Meson option should be specified: ``-Denable_trace_fp=true``
+   * The c_args should be specified: ``-DALLOW_EXPERIMENTAL_API``
+
+   .. code-block:: console
+
+      meson configure --buildtype=debug -Denable_trace_fp=true
+         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+      meson configure --buildtype=release -Denable_trace_fp=true
+         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+#. Configure the NIC
+
+   If the send completion timings are important, the NIC should be configured
+   to provide real-time timestamps: the ``REAL_TIME_CLOCK_ENABLE`` NV settings
+   parameter should be configured to TRUE.
+
+   .. code-block:: console
+
+      mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
+
+#. Run the application with EAL parameters configuring tracing in the mlx5 Tx datapath
+
+    * ``--trace=pmd.net.mlx5.tx`` - the regular expression enabling tracepoints.
+      At least the tracepoints with names matching "pmd.net.mlx5.tx" must be
+      enabled to gather all events needed to analyze the mlx5 Tx datapath and
+      its timings. By default all tracepoints are disabled.
+
+#. Store the file with gathered tracing information
+
+#. Install or build the ``babeltrace2`` package
+
+   The gathered trace data can be analyzed with the provided Python script.
+   To parse the trace data, the script uses the ``babeltrace2`` library.
+   The package should be either installed or built from source code as
+   shown below.
+
+   .. code-block:: console
+
+      git clone https://github.com/efficios/babeltrace.git
+      cd babeltrace
+      ./bootstrap
+      ./configure --help
+      ./configure --disable-api-doc --disable-man-pages \
+                  --disable-python-bindings-doc --enable-python-plugins \
+                  --enable-python-bindings
+
+#. Run the analyzing script (in Python) to combine related events (packet firing
+   and completion) and see the output in a human-readable view
+
+   The analyzing script is located in the folder ``./drivers/net/mlx5/tools``.
+   It requires Python 3.6, the ``babeltrace2`` package, and it takes a single
+   parameter: the trace data folder.
+
+   .. code-block:: console
+
+      ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
+
+#. Interpreting the Script Output Data
+
+   All the timings are given in nanoseconds.
+   The list of Tx bursts per port/queue is presented in the output.
+   Each list element contains the list of built WQEs with specific opcodes, and
+   each WQE contains the list of the encompassed packets to send.
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (12 preceding siblings ...)
  2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-07-05 15:31 ` Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
                     ` (4 more replies)
  2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
  14 siblings, 5 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 15:31 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 PMD provides send scheduling at a specific moment of time,
and for such applications it is extremely useful to have extra debug
information - when and how packets were scheduled and when the actual
sending was completed by the NIC hardware (this helps the application
to track internal delay issues).

Because the DPDK Tx datapath API does not provide for getting any feedback
from the driver, and the feature looks to be mlx5 specific, it seems
reasonable to engage the existing DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile the application with tracing enabled
  - run the application with the EAL parameters configuring the tracing in
    the mlx5 Tx datapath
  - store the dump file with the gathered tracing information
  - run the analyzing script (in Python) to combine related events (packet
    firing and completion) and see the data in a human-readable view

Below are the detailed instructions on how to gather all the debug data,
including the full timing information, with an mlx5 NIC.


1. Build DPDK application with enabled datapath tracing

The Meson option should be specified:
   -Denable_trace_fp=true

The c_args should be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the send completion timings are important, the NIC should be configured
to provide real-time timestamps: the REAL_TIME_CLOCK_ENABLE NV settings parameter
should be configured to TRUE, for example with the command below (followed by
a FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
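
The new NV setting takes effect only after a FW/driver reset. One way to apply
it without a full reboot is the MFT reset tool (the device node and exact
command are shown for illustration, please verify them for your setup):

  sudo mlxfwreset -d /dev/mst/mt4125_pciconf0 reset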


3. Run DPDK application to gather the traces

EAL parameters controlling the trace capability at runtime:

  --trace=pmd.net.mlx5.tx - the regular expression enabling tracepoints.
                            At least the tracepoints with names matching
                            "pmd.net.mlx5.tx" must be enabled to gather all
                            events needed to analyze the mlx5 Tx datapath
                            and its timings. By default all tracepoints
                            are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
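
For example, a complete invocation could look like the following (testpmd is
used here only as an illustration, any DPDK application built as described in
step 1 can be used; <PCI_BDF> is a placeholder for the device address):

  dpdk-testpmd -a <PCI_BDF> \
      --trace=pmd.net.mlx5.tx \
      --trace-dir=/var/log \
      --trace-bufsz=8M \
      --trace-mode=overwrite \
      -- --txq=4 --rxq=4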


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with the provided Python script.
To parse the trace data, the script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure --help
  ./configure --disable-api-doc --disable-man-pages \
              --disable-python-bindings-doc --enable-python-plugins \
              --enable-python-bindings
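
Once installed or built, the availability of the Python bindings can be
checked by importing the bt2 module used by the analyzing script:

  python3 -c "import bt2"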

5. Running the Analyzing Script

The analyzing script is located in the folder ./drivers/net/mlx5/tools.
It requires Python 3.6 and the Babeltrace2 package, and it takes a single
parameter: the trace data folder. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The list of Tx (and, in the future, Rx) bursts per port/queue is presented in the output.
Each list element contains the list of built WQEs with specific opcodes, and
each WQE contains the list of the encompassed packets to send.
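
For illustration only, the output has roughly the following shape (all the
numbers below are made up, they are not real trace data):

  1688583000123456789: tx(p=0, q=0, 2/2 pkts in 1560
    00B2: WAIT (1688583000123470000)
    00B3: SEND (1688583000123470000, 8130)
      7F2E40A31200: 1514
      7F2E40A31E00: 1514 (2 segs)

The first line is the tx_burst call: timestamp, port, queue, sent/requested
packets and the burst duration. Each following block is a WQE with its opcode,
wait/completion timings, and the packets (mbuf pointer, length, number of
segments) pushed into it.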

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

v3: - tracepoint routines are moved to the net folder, no need to export
    - documentation added
    - testpmd patches moved out from series to the dedicated patches

v4: - Python comments addressed
    - codestyle issues fixed

v5: - traces are moved to dedicated files; otherwise the registration
      header caused wrong code generation for 3rd party files/objects
      and resulted in a performance drop

Viacheslav Ovsiienko (4):
  net/mlx5: introduce tracepoints for mlx5 drivers
  net/mlx5: add comprehensive send completion trace
  net/mlx5: add Tx datapath trace analyzing script
  doc: add mlx5 datapath tracing feature description

 doc/guides/nics/mlx5.rst             |  78 +++++++
 drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
 drivers/net/mlx5/meson.build         |   1 +
 drivers/net/mlx5/mlx5_devx.c         |   8 +-
 drivers/net/mlx5/mlx5_rx.h           |  19 --
 drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
 drivers/net/mlx5/mlx5_trace.c        |  25 +++
 drivers/net/mlx5/mlx5_trace.h        |  73 +++++++
 drivers/net/mlx5/mlx5_tx.c           |   9 +
 drivers/net/mlx5/mlx5_tx.h           |  89 +++++++-
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 11 files changed, 607 insertions(+), 29 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_trace.c
 create mode 100644 drivers/net/mlx5/mlx5_trace.h
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5 1/4] net/mlx5: introduce tracepoints for mlx5 drivers
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-07-05 15:31   ` Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 15:31 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

There is an intention to engage the DPDK tracing capabilities
for mlx5 PMD monitoring and profiling in various modes.
The patch introduces tracepoints for the Tx datapath in
the ethernet device driver.

To engage this tracing capability the following steps
should be taken:

- meson option -Denable_trace_fp=true
- meson option -Dc_args='-DALLOW_EXPERIMENTAL_API'
- EAL command line parameter --trace=pmd.net.mlx5.tx.*

The Tx datapath tracing allows getting information on how packets
are pushed into hardware descriptors, the timestamps of
scheduled waits and send completions, etc.

To provide a human-readable form of the trace results, a
dedicated post-processing script is provided.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/meson.build  |  1 +
 drivers/net/mlx5/mlx5_rx.h    | 19 ---------
 drivers/net/mlx5/mlx5_rxtx.h  | 19 +++++++++
 drivers/net/mlx5/mlx5_trace.c | 25 ++++++++++++
 drivers/net/mlx5/mlx5_trace.h | 73 +++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c    |  9 +++++
 drivers/net/mlx5/mlx5_tx.h    | 26 ++++++++++++-
 7 files changed, 151 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_trace.c
 create mode 100644 drivers/net/mlx5/mlx5_trace.h

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index bcb9c8542f..69771c63ab 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -31,6 +31,7 @@ sources = files(
         'mlx5_rxtx.c',
         'mlx5_stats.c',
         'mlx5_trigger.c',
+        'mlx5_trace.c',
         'mlx5_tx.c',
         'mlx5_tx_empw.c',
         'mlx5_tx_mpw.c',
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3514edd84e..f42607dce4 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -377,25 +377,6 @@ mlx5_rx_mb2mr(struct mlx5_rxq_data *rxq, struct rte_mbuf *mb)
 	return mlx5_mr_mempool2mr_bh(mr_ctrl, mb->pool, addr);
 }
 
-/**
- * Convert timestamp from HW format to linear counter
- * from Packet Pacing Clock Queue CQE timestamp format.
- *
- * @param sh
- *   Pointer to the device shared context. Might be needed
- *   to convert according current device configuration.
- * @param ts
- *   Timestamp from CQE to convert.
- * @return
- *   UTC in nanoseconds
- */
-static __rte_always_inline uint64_t
-mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
-{
-	RTE_SET_USED(sh);
-	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
-}
-
 /**
  * Set timestamp in mbuf dynamic field.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 876aa14ae6..b109d50758 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -43,4 +43,23 @@ int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
 int mlx5_queue_state_modify(struct rte_eth_dev *dev,
 			    struct mlx5_mp_arg_queue_state_modify *sm);
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_trace.c b/drivers/net/mlx5/mlx5_trace.c
new file mode 100644
index 0000000000..bbbfd9178c
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_trace.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_trace_point_register.h>
+#include <mlx5_trace.h>
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_entry,
+	pmd.net.mlx5.tx.entry)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_exit,
+	pmd.net.mlx5.tx.exit)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wqe,
+	pmd.net.mlx5.tx.wqe)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_wait,
+	pmd.net.mlx5.tx.wait)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_push,
+	pmd.net.mlx5.tx.push)
+
+RTE_TRACE_POINT_REGISTER(rte_pmd_mlx5_trace_tx_complete,
+	pmd.net.mlx5.tx.complete)
diff --git a/drivers/net/mlx5/mlx5_trace.h b/drivers/net/mlx5/mlx5_trace.h
new file mode 100644
index 0000000000..888d96f60b
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_trace.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_PMD_MLX5_TRACE_H_
+#define RTE_PMD_MLX5_TRACE_H_
+
+/**
+ * @file
+ *
+ * API for mlx5 PMD trace support
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <mlx5_prm.h>
+#include <rte_mbuf.h>
+#include <rte_trace_point.h>
+
+/* TX burst subroutines trace points. */
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_entry,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_exit,
+	RTE_TRACE_POINT_ARGS(uint16_t nb_sent, uint16_t nb_req),
+	rte_trace_point_emit_u16(nb_sent);
+	rte_trace_point_emit_u16(nb_req);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wqe,
+	RTE_TRACE_POINT_ARGS(uint32_t opcode),
+	rte_trace_point_emit_u32(opcode);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_wait,
+	RTE_TRACE_POINT_ARGS(uint64_t ts),
+	rte_trace_point_emit_u64(ts);
+)
+
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_push,
+	RTE_TRACE_POINT_ARGS(const struct rte_mbuf *mbuf, uint16_t wqe_id),
+	rte_trace_point_emit_ptr(mbuf);
+	rte_trace_point_emit_u32(mbuf->pkt_len);
+	rte_trace_point_emit_u16(mbuf->nb_segs);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_pmd_mlx5_trace_tx_complete,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
+			     uint16_t wqe_id, uint64_t ts),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_id);
+	rte_trace_point_emit_u64(ts);
+	rte_trace_point_emit_u16(wqe_id);
+)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_PMD_MLX5_TRACE_H_ */
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 14e1487e59..1fe9521dfc 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -232,6 +232,15 @@ mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,
 		MLX5_ASSERT((txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16) ==
 			    cqe->wqe_counter);
 #endif
+		if (__rte_trace_point_fp_is_enabled()) {
+			uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+			uint16_t wqe_id = rte_be_to_cpu_16(cqe->wqe_counter);
+
+			if (txq->rt_timestamp)
+				ts = mlx5_txpp_convert_rx_ts(NULL, ts);
+			rte_pmd_mlx5_trace_tx_complete(txq->port_id, txq->idx,
+						       wqe_id, ts);
+		}
 		ring_doorbell = true;
 		++txq->cq_ci;
 		last_cqe = cqe;
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index cc8f7e98aa..5df0c4a794 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -13,12 +13,15 @@
 #include <rte_mempool.h>
 #include <rte_common.h>
 #include <rte_spinlock.h>
+#include <rte_trace_point.h>
 
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_trace.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
@@ -764,6 +767,9 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
 			     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
+	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
+		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
+	rte_pmd_mlx5_trace_tx_wqe((txq->wqe_ci << 8) | opcode);
 }
 
 /**
@@ -1692,6 +1698,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 		if (txq->wait_on_time) {
 			/* The wait on time capability should be used. */
 			ts -= sh->txpp.skew;
+			rte_pmd_mlx5_trace_tx_wait(ts);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_wseg) /
 					      MLX5_WSEG_SIZE,
@@ -1706,6 +1713,7 @@ mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
 			if (unlikely(wci < 0))
 				return MLX5_TXCMP_CODE_SINGLE;
 			/* Build the WAIT WQE with specified completion. */
+			rte_pmd_mlx5_trace_tx_wait(ts - sh->txpp.skew);
 			mlx5_tx_cseg_init(txq, loc, wqe,
 					  1 + sizeof(struct mlx5_wqe_qseg) /
 					      MLX5_WSEG_SIZE,
@@ -1810,6 +1818,7 @@ mlx5_tx_packet_multi_tso(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_TSO, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 1, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -1892,6 +1901,7 @@ mlx5_tx_packet_multi_send(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	mlx5_tx_eseg_none(txq, loc, wqe, olx);
 	dseg = &wqe->dseg[0];
 	do {
@@ -2115,6 +2125,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data *__rte_restrict txq,
 	wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 	loc->wqe_last = wqe;
 	mlx5_tx_cseg_init(txq, loc, wqe, 0, MLX5_OPCODE_SEND, olx);
+	rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 	ds = mlx5_tx_mseg_build(txq, loc, wqe, vlan, inlen, 0, olx);
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
@@ -2318,8 +2329,8 @@ mlx5_tx_burst_tso(struct mlx5_txq_data *__rte_restrict txq,
 		 */
 		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 		loc->wqe_last = wqe;
-		mlx5_tx_cseg_init(txq, loc, wqe, ds,
-				  MLX5_OPCODE_TSO, olx);
+		mlx5_tx_cseg_init(txq, loc, wqe, ds, MLX5_OPCODE_TSO, olx);
+		rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 		dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan, hlen, 1, olx);
 		dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) + hlen - vlan;
 		dlen -= hlen - vlan;
@@ -2688,6 +2699,7 @@ mlx5_tx_burst_empw_simple(struct mlx5_txq_data *__rte_restrict txq,
 			/* Update sent data bytes counter. */
 			slen += dlen;
 #endif
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr
 				(txq, loc, dseg,
 				 rte_pktmbuf_mtod(loc->mbuf, uint8_t *),
@@ -2926,6 +2938,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 				tlen += sizeof(struct rte_vlan_hdr);
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_vlan(txq, loc, dseg,
 							 dptr, dlen, olx);
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -2935,6 +2948,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			} else {
 				if (room < tlen)
 					break;
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_dseg_empw(txq, loc, dseg,
 							 dptr, dlen, olx);
 			}
@@ -2980,6 +2994,7 @@ mlx5_tx_burst_empw_inline(struct mlx5_txq_data *__rte_restrict txq,
 			if (MLX5_TXOFF_CONFIG(VLAN))
 				MLX5_ASSERT(!(loc->mbuf->ol_flags &
 					    RTE_MBUF_F_TX_VLAN));
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_dseg_ptr(txq, loc, dseg, dptr, dlen, olx);
 			/* We have to store mbuf in elts.*/
 			txq->elts[txq->elts_head++ & txq->elts_m] = loc->mbuf;
@@ -3194,6 +3209,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, seg_n,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_data(txq, loc, wqe,
 						  vlan, inlen, 0, olx);
 				txq->wqe_ci += wqe_n;
@@ -3256,6 +3272,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, ds,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				dseg = mlx5_tx_eseg_data(txq, loc, wqe, vlan,
 							 txq->inlen_mode,
 							 0, olx);
@@ -3297,6 +3314,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
 						  MLX5_OPCODE_SEND, olx);
+				rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 				mlx5_tx_eseg_dmin(txq, loc, wqe, vlan, olx);
 				dptr = rte_pktmbuf_mtod(loc->mbuf, uint8_t *) +
 				       MLX5_ESEG_MIN_INLINE_SIZE - vlan;
@@ -3338,6 +3356,7 @@ mlx5_tx_burst_single_send(struct mlx5_txq_data *__rte_restrict txq,
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
 					  MLX5_OPCODE_SEND, olx);
+			rte_pmd_mlx5_trace_tx_push(loc->mbuf, txq->wqe_ci);
 			mlx5_tx_eseg_none(txq, loc, wqe, olx);
 			mlx5_tx_dseg_ptr
 				(txq, loc, &wqe->dseg[0],
@@ -3707,6 +3726,9 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 #endif
 	if (MLX5_TXOFF_CONFIG(INLINE) && loc.mbuf_free)
 		__mlx5_tx_free_mbuf(txq, pkts, loc.mbuf_free, olx);
+	/* Trace productive bursts only. */
+	if (__rte_trace_point_fp_is_enabled() && loc.pkts_sent)
+		rte_pmd_mlx5_trace_tx_exit(loc.pkts_sent, pkts_n);
 	return loc.pkts_sent;
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5 2/4] net/mlx5: add comprehensive send completion trace
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
@ 2023-07-05 15:31   ` Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 15:31 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

There is a demand to trace the send completion of
every WQE if time scheduling is enabled.

The patch extends the size of the completion queue and
requests a completion on every WQE issued to the
send queue. As a result, the hardware provides a CQE for
each completed WQE and the driver is able to fetch the
completion timestamp for the dedicated operation.

The added code is under the conditional compilation
flag RTE_ENABLE_TRACE_FP and does not impact the
release code.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_verbs.c |  8 +++-
 drivers/net/mlx5/mlx5_devx.c        |  8 +++-
 drivers/net/mlx5/mlx5_tx.h          | 63 +++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 7233c2c7fa..b54f3ccd9a 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -968,8 +968,12 @@ mlx5_txq_ibv_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	cqe_n = desc / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = desc / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	txq_obj->cq = mlx5_glue->create_cq(priv->sh->cdev->ctx, cqe_n,
 					   NULL, NULL, 0);
 	if (txq_obj->cq == NULL) {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4369d2557e..5082a7e178 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1465,8 +1465,12 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 	MLX5_ASSERT(ppriv);
 	txq_obj->txq_ctrl = txq_ctrl;
 	txq_obj->dev = dev;
-	cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
-		1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	if (__rte_trace_point_fp_is_enabled() &&
+	    txq_data->offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+		cqe_n = UINT16_MAX / 2 - 1;
+	else
+		cqe_n = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+			1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
 	log_desc_n = log2above(cqe_n);
 	cqe_n = 1UL << log_desc_n;
 	if (cqe_n > UINT16_MAX) {
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 5df0c4a794..264cc192dc 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -729,6 +729,54 @@ mlx5_tx_request_completion(struct mlx5_txq_data *__rte_restrict txq,
 	}
 }
 
+/**
+ * Set completion request flag for all issued WQEs.
+ * This routine is intended to be used with enabled fast path tracing
+ * and send scheduling on time to provide the detailed report in trace
+ * for send completions on every WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_request_completion_trace(struct mlx5_txq_data *__rte_restrict txq,
+				 struct mlx5_txq_local *__rte_restrict loc,
+				 unsigned int olx)
+{
+	uint16_t head = txq->elts_comp;
+
+	while (txq->wqe_comp != txq->wqe_ci) {
+		volatile struct mlx5_wqe *wqe;
+		uint32_t wqe_n;
+
+		MLX5_ASSERT(loc->wqe_last);
+		wqe = txq->wqes + (txq->wqe_comp & txq->wqe_m);
+		if (wqe == loc->wqe_last) {
+			head = txq->elts_head;
+			head +=	MLX5_TXOFF_CONFIG(INLINE) ?
+				0 : loc->pkts_sent - loc->pkts_copy;
+			txq->elts_comp = head;
+		}
+		/* Completion request flag was set on cseg constructing. */
+#ifdef RTE_LIBRTE_MLX5_DEBUG
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head |
+			  (wqe->cseg.opcode >> 8) << 16;
+#else
+		txq->fcqs[txq->cq_pi++ & txq->cqe_m] = head;
+#endif
+		/* A CQE slot must always be available. */
+		MLX5_ASSERT((txq->cq_pi - txq->cq_ci) <= txq->cqe_s);
+		/* Advance to the next WQE in the queue. */
+		wqe_n = rte_be_to_cpu_32(wqe->cseg.sq_ds) & 0x3F;
+		txq->wqe_comp += RTE_ALIGN(wqe_n, 4) / 4;
+	}
+}
+
 /**
  * Build the Control Segment with specified opcode:
  * - MLX5_OPCODE_SEND
@@ -755,7 +803,7 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		  struct mlx5_wqe *__rte_restrict wqe,
 		  unsigned int ds,
 		  unsigned int opcode,
-		  unsigned int olx __rte_unused)
+		  unsigned int olx)
 {
 	struct mlx5_wqe_cseg *__rte_restrict cs = &wqe->cseg;
 
@@ -764,8 +812,12 @@ mlx5_tx_cseg_init(struct mlx5_txq_data *__rte_restrict txq,
 		opcode = MLX5_OPCODE_TSO | MLX5_OPC_MOD_MPW << 24;
 	cs->opcode = rte_cpu_to_be_32((txq->wqe_ci << 8) | opcode);
 	cs->sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
-	cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-			     MLX5_COMP_MODE_OFFSET);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+	else
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
 	cs->misc = RTE_BE32(0);
 	if (__rte_trace_point_fp_is_enabled() && !loc->pkts_sent)
 		rte_pmd_mlx5_trace_tx_entry(txq->port_id, txq->idx);
@@ -3663,7 +3715,10 @@ mlx5_tx_burst_tmpl(struct mlx5_txq_data *__rte_restrict txq,
 	if (unlikely(loc.pkts_sent == loc.pkts_loop))
 		goto burst_exit;
 	/* Request CQE generation if limits are reached. */
-	mlx5_tx_request_completion(txq, &loc, olx);
+	if (MLX5_TXOFF_CONFIG(TXPP) && __rte_trace_point_fp_is_enabled())
+		mlx5_tx_request_completion_trace(txq, &loc, olx);
+	else
+		mlx5_tx_request_completion(txq, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5 3/4] net/mlx5: add Tx datapath trace analyzing script
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
@ 2023-07-05 15:31   ` Viacheslav Ovsiienko
  2023-07-05 15:31   ` [PATCH v5 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  2023-07-06 16:27   ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 15:31 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings
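
For reference, the script takes the trace directory produced by the EAL
tracing (see the --trace-dir EAL parameter) as its single argument, e.g.:

  ./drivers/net/mlx5/tools/mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39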

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..8c1fd0a350
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,307 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+"""
+Analyzing the mlx5 PMD datapath tracings
+"""
+import sys
+import argparse
+import bt2
+
+PFX_TX = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+
+class MlxQueue:
+    """Queue container object"""
+
+    def __init__(self):
+        self.done_burst = []  # completed bursts
+        self.wait_burst = []  # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        """Log all queue bursts"""
+        for txb in self.done_burst:
+            txb.log()
+
+
+class MlxMbuf:
+    """Packet mbufs container object"""
+
+    def __init__(self):
+        self.wqe = 0     # wqe id
+        self.ptr = None  # first packet mbuf pointer
+        self.len = 0     # packet data length
+        self.nseg = 0    # number of segments
+
+    def log(self):
+        """Log mbuf"""
+        out_txt = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out_txt += " (%d segs)" % self.nseg
+        print(out_txt)
+
+
+class MlxWqe:
+    """WQE container object"""
+
+    def __init__(self):
+        self.mbuf = []    # list of mbufs in WQE
+        self.wait_ts = 0  # preceding wait/push timestamp
+        self.comp_ts = 0  # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        """Log WQE"""
+        wqe_id = (self.opcode >> 8) & 0xFFFF
+        wqe_op = self.opcode & 0xFF
+        out_txt = "  %04X: " % wqe_id
+        if wqe_op == 0xF:
+            out_txt += "WAIT"
+        elif wqe_op == 0x29:
+            out_txt += "EMPW"
+        elif wqe_op == 0xE:
+            out_txt += "TSO "
+        elif wqe_op == 0xA:
+            out_txt += "SEND"
+        else:
+            out_txt += "0x%02X" % wqe_op
+        if self.comp_ts != 0:
+            out_txt += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out_txt += " (%d)" % self.wait_ts
+        print(out_txt)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if WQE in not completedLog WQE"""
+        if self.comp_ts != 0:
+            return 1
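+        # WQE indices are 16-bit counters that wrap around: treat this WQE as
+        # completed only if the reported completion index has passed it by
+        # less than half of the index range.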
+        cur_id = (self.opcode >> 8) & 0xFFFF
+        if cur_id > wqe_id:
+            cur_id -= wqe_id
+            if cur_id <= 0x8000:
+                return 0
+        else:
+            cur_id = wqe_id - cur_id
+            if cur_id >= 0x8000:
+                return 0
+        self.comp_ts = wqe_ts
+        return 1
+
+
+class MlxBurst:
+    """Packet burst container object"""
+
+    def __init__(self):
+        self.wqes = []    # issued burst WQEs
+        self.done = 0     # number of sent/recv packets
+        self.req = 0      # requested number of packets
+        self.call_ts = 0  # burst routine invocation
+        self.done_ts = 0  # burst routine done
+        self.queue = None
+
+    def log(self):
+        """Log burst"""
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)"
+                % (self.call_ts, port, queue, self.done, self.req)
+            )
+        else:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts in %u"
+                % (
+                    self.call_ts,
+                    port,
+                    queue,
+                    self.done,
+                    self.req,
+                    self.done_ts - self.call_ts,
+                )
+            )
+        for wqe in self.wqes:
+            wqe.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if not all of WQEs in burst completed"""
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, wqe_ts) == 0:
+                return 0
+        return 1
+
+
+class MlxTrace:
+    """Trace representing object"""
+
+    def __init__(self):
+        self.tx_blst = {}  # current Tx bursts per CPU
+        self.tx_qlst = {}  # active Tx queues per port/queue
+        self.tx_wlst = {}  # wait timestamp list per CPU
+
+    def run(self, msg_it):
+        """Run over gathered tracing data and build database"""
+        for msg in msg_it:
+            if not isinstance(msg, bt2._EventMessageConst):
+                continue
+            event = msg.event
+            if event.name.startswith(PFX_TX):
+                do_tx(msg, self)
+            # Handling of other log event categories can be added here
+
+    def log(self):
+        """Log gathered trace database"""
+        for pq_id in self.tx_qlst:
+            queue = self.tx_qlst.get(pq_id)
+            queue.log()
+
+
+def do_tx_entry(msg, trace):
+    """Handle entry Tx busrt"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = MlxBurst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    trace.tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = MlxQueue()
+        queue.pq_id = pq_id
+        trace.tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg, trace):
+    """Handle exit Tx busrt"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    trace.tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg, trace):
+    """Handle WQE record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = MlxWqe()
+    wqe.wait_ts = trace.tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg, trace):
+    """Handle WAIT record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    trace.tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg, trace):
+    """Handle WQE push event"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = MlxMbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg, trace):
+    """Handle send completion event"""
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    wqe_ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, wqe_ts) == 0:
+            break
+        rmv += 1
+    # move the completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg, trace):
+    """Handle Tx related records"""
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg, trace)
+    elif name == "exit":
+        do_tx_exit(msg, trace)
+    elif name == "wqe":
+        do_tx_wqe(msg, trace)
+    elif name == "wait":
+        do_tx_wait(msg, trace)
+    elif name == "push":
+        do_tx_push(msg, trace)
+    elif name == "complete":
+        do_tx_complete(msg, trace)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name, file=sys.stderr)
+        raise ValueError()
+
+
+def main() -> int:
+    """Script entry point"""
+    try:
+        parser = argparse.ArgumentParser()
+        parser.add_argument("path", nargs=1, type=str, help="input trace folder")
+        args = parser.parse_args()
+
+        mlx_tr = MlxTrace()
+        msg_it = bt2.TraceCollectionMessageIterator(args.path)
+        mlx_tr.run(msg_it)
+        mlx_tr.log()
+        return 0
+    except ValueError:
+        return -1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v5 4/4] doc: add mlx5 datapath tracing feature description
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2023-07-05 15:31   ` [PATCH v5 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-07-05 15:31   ` Viacheslav Ovsiienko
  2023-07-06 16:27   ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
  4 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-05 15:31 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 provides the send scheduling on specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
application to track the internal delay issues).

The patch adds the documentation for feature usage.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 78 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b9843edbd9..1c8fc6f6d4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2077,3 +2077,81 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+
+Tx datapath tracing
+^^^^^^^^^^^^^^^^^^^
+
+The mlx5 PMD provides Tx datapath tracing with extra debug information:
+when and how packets were scheduled, and when the actual sending was
+completed by the NIC hardware. The feature engages the existing DPDK
+datapath tracing capability.
+
+Usage of mlx5 Tx datapath tracing:
+
+#. Build the DPDK application with datapath tracing enabled
+
+   * The Meson option should be specified: ``-Denable_trace_fp=true``
+   * The ``c_args`` should include: ``-DALLOW_EXPERIMENTAL_API``
+
+   .. code-block:: console
+
+      meson configure --buildtype=debug -Denable_trace_fp=true
+         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+      meson configure --buildtype=release -Denable_trace_fp=true
+         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+#. Configure the NIC
+
+   If the sending completion timings are important, the NIC should be configured
+   to provide real-time timestamps: the ``REAL_TIME_CLOCK_ENABLE`` NV settings
+   parameter should be set to TRUE.
+
+   .. code-block:: console
+
+      mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
+
+#. Run application with EAL parameters configuring the tracing in mlx5 Tx datapath
+
+    * ``--trace=pmd.net.mlx5.tx`` - a regular expression selecting tracepoints by
+      name. At least the tracepoints matching "pmd.net.mlx5.tx" must be enabled to
+      gather all events needed to analyze the mlx5 Tx datapath and its timings.
+      By default all tracepoints are disabled. An example invocation is shown below.
+
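+   For example, a hypothetical ``dpdk-testpmd`` invocation with the mlx5 Tx
+   tracepoints enabled might look as follows (the PCI address and the queue
+   counts are illustrative only):
+
+   .. code-block:: console
+
+      dpdk-testpmd -a 0000:82:00.0 --trace=pmd.net.mlx5.tx --trace-dir=/var/log -- --txq=4 --rxq=4 -i
+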
+#. Store the file with gathered tracing information
+
+#. Install or build the ``babeltrace2`` package
+
+   The gathered trace data can be analyzed with the provided Python script.
+   To parse the trace data, the script uses the ``babeltrace2`` library.
+   The package should be either installed or built from source code as
+   shown below.
+
+   .. code-block:: console
+
+      git clone https://github.com/efficios/babeltrace.git
+      cd babeltrace
+      ./bootstrap
+      ./configure -help
+      ./configure --disable-api-doc --disable-man-pages
+                  --disable-python-bindings-doc --enable-python-plugins
+                  --enable-python-binding
+
+#. Run the analyzing script (in Python) to combine related events (packet firing
+   and completion) and see the output in a human-readable view
+
+   The analyzing script is located in the folder ``./drivers/net/mlx5/tools``.
+   It requires Python 3.6 and the ``babeltrace2`` package, and takes the trace
+   data folder as its only parameter.
+
+   .. code-block:: console
+
+      ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
+
+#. Interpreting the Script Output Data
+
+   All the timings are given in nanoseconds.
+   The list of Tx bursts per port/queue is presented in the output.
+   Each list element contains the list of built WQEs with specific opcodes, and
+   each WQE contains the list of the encompassed packets to send.
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2023-07-05 15:31   ` [PATCH v5 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
@ 2023-07-06 16:27   ` Raslan Darawsheh
  4 siblings, 0 replies; 76+ messages in thread
From: Raslan Darawsheh @ 2023-07-06 16:27 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: jerinj

Hi,

> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Wednesday, July 5, 2023 6:31 PM
> To: dev@dpdk.org
> Cc: jerinj@marvell.com; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing
> 
> The mlx5 provides the send scheduling on specific moment of time, and for
> the related kind of applications it would be extremely useful to have extra
> debug information - when and how packets were scheduled and when the
> actual sending was completed by the NIC hardware (it helps application to
> track the internal delay issues).
> 
> Because the DPDK tx datapath API does not suppose getting any feedback
> from the driver and the feature looks like to be mlx5 specific, it seems to be
> reasonable to engage exisiting DPDK datapath tracing capability.
> 
> The work cycle is supposed to be:
>   - compile appplication with enabled tracing
>   - run application with EAL parameters configuring the tracing in mlx5
>     Tx datapath
>   - store the dump file with gathered tracing information
>   - run analyzing scrypt (in Python) to combine related events (packet
>     firing and completion) and see the data in human-readable view
> 
> Below is the detailed instruction "how to" with mlx5 NIC to gather all the
> debug data including the full timings information.
> 
> 
> 1. Build DPDK application with enabled datapath tracing
> 
> The meson option should be specified:
>    --enable_trace_fp=true
> 
> The c_args shoudl be specified:
>    -DALLOW_EXPERIMENTAL_API
> 
> The DPDK configuration examples:
> 
>   meson configure --buildtype=debug -Denable_trace_fp=true
>         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -
> DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=debug -Denable_trace_fp=true
>         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=release -Denable_trace_fp=true
>         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> 
>   meson configure --buildtype=release -Denable_trace_fp=true
>         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> 
> 
> 2. Configuring the NIC
> 
> If the sending completion timings are important the NIC should be configured
> to provide realtime timestamps, the REAL_TIME_CLOCK_ENABLE NV settings
> parameter should be configured to TRUE, for example with command (and
> with following FW/driver reset):
> 
>   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s
> REAL_TIME_CLOCK_ENABLE=1
> 
> 
> 3. Run DPDK application to gather the traces
> 
> EAL parameters controlling trace capability in runtime
> 
>   --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
>                             with matching names at least "pmd.net.mlx5.tx"
>                             must be enabled to gather all events needed
>                             to analyze mlx5 Tx datapath and its timings.
>                             By default all tracepoints are disabled.
> 
>   --trace-dir=/var/log - trace storing directory
> 
>   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
>                                        per thread. The default is 1MB.
> 
>   --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
> 
> 
> 4. Installing or Building Babeltrace2 Package
> 
> The gathered trace data can be analyzed with a developed Python script.
> To parse the trace, the data script uses the Babeltrace2 library.
> The package should be either installed or built from source code as shown
> below:
> 
>   git clone https://github.com/efficios/babeltrace.git
>   cd babeltrace
>   ./bootstrap
>   ./configure -help
>   ./configure --disable-api-doc --disable-man-pages
>               --disable-python-bindings-doc --enbale-python-plugins
>               --enable-python-binding
> 
> 5. Running the Analyzing Script
> 
> The analyzing script is located in the folder: ./drivers/net/mlx5/tools It requires
> Python3.6, Babeltrace2 packages and it takes the only parameter of trace data
> file. For example:
> 
>    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
> 
> 
> 6. Interpreting the Script Output Data
> 
> All the timings are given in nanoseconds.
> The list of Tx (and coming Rx) bursts per port/queue is presented in the
> output.
> Each list element contains the list of built WQEs with specific opcodes, and
> each WQE contains the list of the encompassed packets to send.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> --
> v2: - comment addressed: "dump_trace" command is replaced with
> "save_trace"
>     - Windows build failure addressed, Windows does not support tracing
> 
> v3: - tracepoint routines are moved to the net folder, no need to export
>     - documentation added
>     - testpmd patches moved out from series to the dedicated patches
> 
> v4: - Python comments addressed
>     - codestyle issues fixed
> 
> v5: - traces are moved to the dedicated files, otherwise registration
>       header caused wrong code generation for 3rd party files/objects
>       and resulted in performance drop
> 
> Viacheslav Ovsiienko (4):
>   net/mlx5: introduce tracepoints for mlx5 drivers
>   net/mlx5: add comprehensive send completion trace
>   net/mlx5: add Tx datapath trace analyzing script
>   doc: add mlx5 datapath tracing feature description
> 
>  doc/guides/nics/mlx5.rst             |  78 +++++++
>  drivers/net/mlx5/linux/mlx5_verbs.c  |   8 +-
>  drivers/net/mlx5/meson.build         |   1 +
>  drivers/net/mlx5/mlx5_devx.c         |   8 +-
>  drivers/net/mlx5/mlx5_rx.h           |  19 --
>  drivers/net/mlx5/mlx5_rxtx.h         |  19 ++
>  drivers/net/mlx5/mlx5_trace.c        |  25 +++
>  drivers/net/mlx5/mlx5_trace.h        |  73 +++++++
>  drivers/net/mlx5/mlx5_tx.c           |   9 +
>  drivers/net/mlx5/mlx5_tx.h           |  89 +++++++-
>  drivers/net/mlx5/tools/mlx5_trace.py | 307
> +++++++++++++++++++++++++++
>  11 files changed, 607 insertions(+), 29 deletions(-)  create mode 100644
> drivers/net/mlx5/mlx5_trace.c  create mode 100644
> drivers/net/mlx5/mlx5_trace.h  create mode 100755
> drivers/net/mlx5/tools/mlx5_trace.py
> 
> --
> 2.18.1


Applied first two patches to next-net-mlx,

Script + doc will be considered for RC4 

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v6 0/2] net/mlx5: introduce Tx datapath tracing
  2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
                   ` (13 preceding siblings ...)
  2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
@ 2023-07-11 15:15 ` Viacheslav Ovsiienko
  2023-07-11 15:15   ` [PATCH v6 1/2] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
                     ` (2 more replies)
  14 siblings, 3 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-11 15:15 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 provides the send scheduling on specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
application to track the internal delay issues).

Because the DPDK tx datapath API does not suppose getting any feedback
from the driver and the feature looks like to be mlx5 specific, it seems
to be reasonable to engage exisiting DPDK datapath tracing capability.

The work cycle is supposed to be:
  - compile appplication with enabled tracing
  - run application with EAL parameters configuring the tracing in mlx5
    Tx datapath
  - store the dump file with gathered tracing information
  - run analyzing scrypt (in Python) to combine related events (packet
    firing and completion) and see the data in human-readable view

Below is the detailed instruction "how to" with mlx5 NIC to gather
all the debug data including the full timings information.


1. Build DPDK application with enabled datapath tracing

The meson option should be specified:
   --enable_trace_fp=true

The c_args shoudl be specified:
   -DALLOW_EXPERIMENTAL_API

The DPDK configuration examples:

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=debug -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build

  meson configure --buildtype=release -Denable_trace_fp=true
        -Dc_args='-DALLOW_EXPERIMENTAL_API' build


2. Configuring the NIC

If the sending completion timings are important the NIC should be configured
to provide realtime timestamps, the REAL_TIME_CLOCK_ENABLE NV settings parameter
should be configured to TRUE, for example with command (and with following
FW/driver reset):

  sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1


3. Run DPDK application to gather the traces

EAL parameters controlling trace capability in runtime

  --trace=pmd.net.mlx5.tx - the regular expression enabling the tracepoints
                            with matching names at least "pmd.net.mlx5.tx"
                            must be enabled to gather all events needed
                            to analyze mlx5 Tx datapath and its timings.
                            By default all tracepoints are disabled.

  --trace-dir=/var/log - trace storing directory

  --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
                                       per thread. The default is 1MB.

  --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.


4. Installing or Building Babeltrace2 Package

The gathered trace data can be analyzed with a developed Python script.
To parse the trace, the data script uses the Babeltrace2 library.
The package should be either installed or built from source code as
shown below:

  git clone https://github.com/efficios/babeltrace.git
  cd babeltrace
  ./bootstrap
  ./configure -help
  ./configure --disable-api-doc --disable-man-pages
              --disable-python-bindings-doc --enable-python-plugins
              --enable-python-binding

5. Running the Analyzing Script

The analyzing script is located in the folder: ./drivers/net/mlx5/tools
It requires Python 3.6 and the Babeltrace2 package, and takes the trace data
folder as its only parameter. For example:

   ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39


6. Interpreting the Script Output Data

All the timings are given in nanoseconds.
The list of Tx (and coming Rx) bursts per port/queue is presented in the output.
Each list element contains the list of built WQEs with specific opcodes, and
each WQE contains the list of the encompassed packets to send.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

--
v2: - comment addressed: "dump_trace" command is replaced with "save_trace"
    - Windows build failure addressed, Windows does not support tracing

v3: - tracepoint routines are moved to the net folder, no need to export
    - documentation added
    - testpmd patches moved out from series to the dedicated patches

v4: - Python comments addressed
    - codestyle issues fixed

v5: - traces are moved to the dedicated files, otherwise registration
      header caused wrong code generation for 3rd party files/objects
      and resulted in performance drop

v6: - documentation reworded

Viacheslav Ovsiienko (2):
  net/mlx5: add Tx datapath trace analyzing script
  doc: add mlx5 datapath tracing feature description

 doc/guides/nics/mlx5.rst             |  74 +++++++
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 2 files changed, 381 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v6 1/2] net/mlx5: add Tx datapath trace analyzing script
  2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
@ 2023-07-11 15:15   ` Viacheslav Ovsiienko
  2023-07-11 15:15   ` [PATCH v6 2/2] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
  2023-07-27 10:52   ` [PATCH v6 0/2] net/mlx5: introduce Tx datapath tracing Thomas Monjalon
  2 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-11 15:15 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The Python script is intended to analyze mlx5 PMD
datapath traces and report:
  - tx_burst routine timings
  - how packets are pushed to WQEs
  - how packet sending is completed with timings

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/tools/mlx5_trace.py | 307 +++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100755 drivers/net/mlx5/tools/mlx5_trace.py

diff --git a/drivers/net/mlx5/tools/mlx5_trace.py b/drivers/net/mlx5/tools/mlx5_trace.py
new file mode 100755
index 0000000000..8c1fd0a350
--- /dev/null
+++ b/drivers/net/mlx5/tools/mlx5_trace.py
@@ -0,0 +1,307 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2023 NVIDIA Corporation & Affiliates
+
+"""
+Analyzing the mlx5 PMD datapath tracings
+"""
+import sys
+import argparse
+import bt2
+
+PFX_TX = "pmd.net.mlx5.tx."
+PFX_TX_LEN = len(PFX_TX)
+
+
+class MlxQueue:
+    """Queue container object"""
+
+    def __init__(self):
+        self.done_burst = []  # completed bursts
+        self.wait_burst = []  # waiting for completion
+        self.pq_id = 0
+
+    def log(self):
+        """Log all queue bursts"""
+        for txb in self.done_burst:
+            txb.log()
+
+
+class MlxMbuf:
+    """Packet mbufs container object"""
+
+    def __init__(self):
+        self.wqe = 0     # wqe id
+        self.ptr = None  # first packet mbuf pointer
+        self.len = 0     # packet data length
+        self.nseg = 0    # number of segments
+
+    def log(self):
+        """Log mbuf"""
+        out_txt = "    %X: %u" % (self.ptr, self.len)
+        if self.nseg != 1:
+            out_txt += " (%d segs)" % self.nseg
+        print(out_txt)
+
+
+class MlxWqe:
+    """WQE container object"""
+
+    def __init__(self):
+        self.mbuf = []    # list of mbufs in WQE
+        self.wait_ts = 0  # preceding wait/push timestamp
+        self.comp_ts = 0  # send/recv completion timestamp
+        self.opcode = 0
+
+    def log(self):
+        """Log WQE"""
+        wqe_id = (self.opcode >> 8) & 0xFFFF
+        wqe_op = self.opcode & 0xFF
+        out_txt = "  %04X: " % wqe_id
+        if wqe_op == 0xF:
+            out_txt += "WAIT"
+        elif wqe_op == 0x29:
+            out_txt += "EMPW"
+        elif wqe_op == 0xE:
+            out_txt += "TSO "
+        elif wqe_op == 0xA:
+            out_txt += "SEND"
+        else:
+            out_txt += "0x%02X" % wqe_op
+        if self.comp_ts != 0:
+            out_txt += " (%d, %d)" % (self.wait_ts, self.comp_ts - self.wait_ts)
+        else:
+            out_txt += " (%d)" % self.wait_ts
+        print(out_txt)
+        for mbuf in self.mbuf:
+            mbuf.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if WQE in not completedLog WQE"""
+        if self.comp_ts != 0:
+            return 1
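+        # WQE indices are 16-bit counters that wrap around: treat this WQE as
+        # completed only if the reported completion index has passed it by
+        # less than half of the index range.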
+        cur_id = (self.opcode >> 8) & 0xFFFF
+        if cur_id > wqe_id:
+            cur_id -= wqe_id
+            if cur_id <= 0x8000:
+                return 0
+        else:
+            cur_id = wqe_id - cur_id
+            if cur_id >= 0x8000:
+                return 0
+        self.comp_ts = wqe_ts
+        return 1
+
+
+class MlxBurst:
+    """Packet burst container object"""
+
+    def __init__(self):
+        self.wqes = []    # issued burst WQEs
+        self.done = 0     # number of sent/recv packets
+        self.req = 0      # requested number of packets
+        self.call_ts = 0  # burst routine invocation
+        self.done_ts = 0  # burst routine done
+        self.queue = None
+
+    def log(self):
+        """Log burst"""
+        port = self.queue.pq_id >> 16
+        queue = self.queue.pq_id & 0xFFFF
+        if self.req == 0:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts (incomplete)"
+                % (self.call_ts, port, queue, self.done, self.req)
+            )
+        else:
+            print(
+                "%u: tx(p=%u, q=%u, %u/%u pkts in %u"
+                % (
+                    self.call_ts,
+                    port,
+                    queue,
+                    self.done,
+                    self.req,
+                    self.done_ts - self.call_ts,
+                )
+            )
+        for wqe in self.wqes:
+            wqe.log()
+
+    def comp(self, wqe_id, wqe_ts):
+        """Return 0 if not all of WQEs in burst completed"""
+        wlen = len(self.wqes)
+        if wlen == 0:
+            return 0
+        for wqe in self.wqes:
+            if wqe.comp(wqe_id, wqe_ts) == 0:
+                return 0
+        return 1
+
+
+class MlxTrace:
+    """Trace representing object"""
+
+    def __init__(self):
+        self.tx_blst = {}  # current Tx bursts per CPU
+        self.tx_qlst = {}  # active Tx queues per port/queue
+        self.tx_wlst = {}  # wait timestamp list per CPU
+
+    def run(self, msg_it):
+        """Run over gathered tracing data and build database"""
+        for msg in msg_it:
+            if not isinstance(msg, bt2._EventMessageConst):
+                continue
+            event = msg.event
+            if event.name.startswith(PFX_TX):
+                do_tx(msg, self)
+            # Handling of other log event categories can be added here
+
+    def log(self):
+        """Log gathered trace database"""
+        for pq_id in self.tx_qlst:
+            queue = self.tx_qlst.get(pq_id)
+            queue.log()
+
+
+def do_tx_entry(msg, trace):
+    """Handle entry Tx busrt"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is not None:
+        # continue existing burst after WAIT
+        return
+    # allocate the new burst and append to the queue
+    burst = MlxBurst()
+    burst.call_ts = msg.default_clock_snapshot.ns_from_origin
+    trace.tx_blst[cpu_id] = burst
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        # queue does not exist - allocate the new one
+        queue = MlxQueue()
+        queue.pq_id = pq_id
+        trace.tx_qlst[pq_id] = queue
+    burst.queue = queue
+    queue.wait_burst.append(burst)
+
+
+def do_tx_exit(msg, trace):
+    """Handle exit Tx busrt"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    burst.done_ts = msg.default_clock_snapshot.ns_from_origin
+    burst.req = event["nb_req"]
+    burst.done = event["nb_sent"]
+    trace.tx_blst.pop(cpu_id)
+
+
+def do_tx_wqe(msg, trace):
+    """Handle WQE record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    wqe = MlxWqe()
+    wqe.wait_ts = trace.tx_wlst.get(cpu_id)
+    if wqe.wait_ts is None:
+        wqe.wait_ts = msg.default_clock_snapshot.ns_from_origin
+    wqe.opcode = event["opcode"]
+    burst.wqes.append(wqe)
+
+
+def do_tx_wait(msg, trace):
+    """Handle WAIT record"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    trace.tx_wlst[cpu_id] = event["ts"]
+
+
+def do_tx_push(msg, trace):
+    """Handle WQE push event"""
+    event = msg.event
+    cpu_id = event["cpu_id"]
+    burst = trace.tx_blst.get(cpu_id)
+    if burst is None:
+        return
+    if not burst.wqes:
+        return
+    wqe = burst.wqes[-1]
+    mbuf = MlxMbuf()
+    mbuf.wqe = event["wqe_id"]
+    mbuf.ptr = event["mbuf"]
+    mbuf.len = event["mbuf_pkt_len"]
+    mbuf.nseg = event["mbuf_nb_segs"]
+    wqe.mbuf.append(mbuf)
+
+
+def do_tx_complete(msg, trace):
+    """Handle send completion event"""
+    event = msg.event
+    pq_id = event["port_id"] << 16 | event["queue_id"]
+    queue = trace.tx_qlst.get(pq_id)
+    if queue is None:
+        return
+    qlen = len(queue.wait_burst)
+    if qlen == 0:
+        return
+    wqe_id = event["wqe_id"]
+    wqe_ts = event["ts"]
+    rmv = 0
+    while rmv < qlen:
+        burst = queue.wait_burst[rmv]
+        if burst.comp(wqe_id, wqe_ts) == 0:
+            break
+        rmv += 1
+    # move the completed bursts to the done list
+    if rmv != 0:
+        idx = 0
+        while idx < rmv:
+            queue.done_burst.append(queue.wait_burst[idx])
+            idx += 1
+        del queue.wait_burst[0:rmv]
+
+
+def do_tx(msg, trace):
+    """Handle Tx related records"""
+    name = msg.event.name[PFX_TX_LEN:]
+    if name == "entry":
+        do_tx_entry(msg, trace)
+    elif name == "exit":
+        do_tx_exit(msg, trace)
+    elif name == "wqe":
+        do_tx_wqe(msg, trace)
+    elif name == "wait":
+        do_tx_wait(msg, trace)
+    elif name == "push":
+        do_tx_push(msg, trace)
+    elif name == "complete":
+        do_tx_complete(msg, trace)
+    else:
+        print("Error: unrecognized Tx event name: %s" % msg.event.name, file=sys.stderr)
+        raise ValueError()
+
+
+def main() -> int:
+    """Script entry point"""
+    try:
+        parser = argparse.ArgumentParser()
+        parser.add_argument("path", nargs=1, type=str, help="input trace folder")
+        args = parser.parse_args()
+
+        mlx_tr = MlxTrace()
+        msg_it = bt2.TraceCollectionMessageIterator(args.path)
+        mlx_tr.run(msg_it)
+        mlx_tr.log()
+        return 0
+    except ValueError:
+        return -1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v6 2/2] doc: add mlx5 datapath tracing feature description
  2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
  2023-07-11 15:15   ` [PATCH v6 1/2] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
@ 2023-07-11 15:15   ` Viacheslav Ovsiienko
  2023-07-27 10:52   ` [PATCH v6 0/2] net/mlx5: introduce Tx datapath tracing Thomas Monjalon
  2 siblings, 0 replies; 76+ messages in thread
From: Viacheslav Ovsiienko @ 2023-07-11 15:15 UTC (permalink / raw)
  To: dev; +Cc: jerinj, rasland

The mlx5 provides the send scheduling on specific moment of time,
and for the related kind of applications it would be extremely useful
to have extra debug information - when and how packets were scheduled
and when the actual sending was completed by the NIC hardware (it helps
application to track the internal delay issues).

The patch adds the documentation for feature usage.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 74 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 505873ecfd..a407920555 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1923,6 +1923,80 @@ The procedure below is an example of using a ConnectX-5 adapter card (pf0) with
 
    $ echo "0000:82:00.2" >> /sys/bus/pci/drivers/mlx5_core/bind
 
+How to trace Tx datapath
+------------------------
+
+The mlx5 PMD provides Tx datapath tracing capability with extra debug information:
+when and how packets were scheduled
+and when the actual sending was completed by the NIC hardware.
+
+Steps to enable Tx datapath tracing:
+
+#. Build DPDK application with enabled datapath tracing
+
+   The Meson option ``-Denable_trace_fp=true`` and
+   the C flag ``ALLOW_EXPERIMENTAL_API`` should be specified.
+
+   .. code-block:: console
+
+      meson configure --buildtype=debug -Denable_trace_fp=true
+         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
+
+#. Configure the NIC
+
+   If the sending completion timings are important,
+   the NIC should be configured to provide realtime timestamps.
+   The non-volatile settings parameter ``REAL_TIME_CLOCK_ENABLE`` should be set to 1.
+   The ``mlxconfig`` utility is part of the MFT package.
+
+   .. code-block:: console
+
+      mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
+
+#. Run application with EAL parameter enabling the tracing in mlx5 Tx datapath
+
+   By default all tracepoints are disabled.
+   To analyze the Tx datapath and its timings, enable the matching tracepoints
+   with ``--trace=pmd.net.mlx5.tx``.
+
+#. Commit the tracing data to the storage (with ``rte_trace_save()`` API call).
+
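+   A minimal sketch of committing the trace from the application code is shown
+   below; it assumes the application is built with ``ALLOW_EXPERIMENTAL_API``
+   and the error message is illustrative only.
+
+   .. code-block:: c
+
+      #include <stdio.h>
+      #include <rte_trace.h>
+
+      /* Write the per-lcore trace buffers into the configured trace directory. */
+      if (rte_trace_save() != 0)
+              printf("Failed to save the trace data\n");
+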
+#. Install or build the ``babeltrace2`` package
+
+   The Python script analyzing gathered trace data uses the ``babeltrace2`` library.
+   The package should be either installed or built from source as shown below.
+
+   .. code-block:: console
+
+      git clone https://github.com/efficios/babeltrace.git
+      cd babeltrace
+      ./bootstrap
+      ./configure -help
+      ./configure --disable-api-doc --disable-man-pages
+                  --disable-python-bindings-doc --enable-python-plugins
+                  --enable-python-binding
+
+#. Run analyzing script
+
+   ``mlx5_trace.py`` is used to combine related events (packet firing and completion)
+   and to show the results in a human-readable view.
+
+   The analyzing script is located in the DPDK source tree: ``drivers/net/mlx5/tools``.
+
+   It requires Python 3.6 and the ``babeltrace2`` package.
+
+   The parameter of the script is the trace data folder.
+
+   .. code-block:: console
+
+      mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
+
+#. Interpreting the script output data
+
+   All the timings are given in nanoseconds.
+   The list of Tx bursts per port/queue is presented in the output.
+   Each list element contains the list of built WQEs with specific opcodes.
+   Each WQE contains the list of the encompassed packets to send.
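+
+   A schematic sketch of the report layout follows; the angle-bracket names
+   stand for the actual numbers printed by ``mlx5_trace.py``, and the SEND
+   opcode is only one example of the possible WQE opcodes.
+
+   .. code-block:: console
+
+      <burst_ts>: tx(p=<port>, q=<queue>, <sent>/<requested> pkts in <duration>
+        <wqe_index>: SEND (<wait_ts>, <completion_delay>)
+          <mbuf_ptr>: <packet_length>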
+
 Host shaper
 -----------
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 0/2] net/mlx5: introduce Tx datapath tracing
  2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
  2023-07-11 15:15   ` [PATCH v6 1/2] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
  2023-07-11 15:15   ` [PATCH v6 2/2] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
@ 2023-07-27 10:52   ` Thomas Monjalon
  2 siblings, 0 replies; 76+ messages in thread
From: Thomas Monjalon @ 2023-07-27 10:52 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, jerinj, rasland

> Viacheslav Ovsiienko (2):
>   net/mlx5: add Tx datapath trace analyzing script
>   doc: add mlx5 datapath tracing feature description

That's only a Python script and its doc, so it's OK to add just before the release.

Applied, thanks.



^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2023-07-27 10:52 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-20 10:07 [RFC 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-04-20 10:07 ` [RFC 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
2023-04-20 10:13   ` Jerin Jacob
2023-04-20 10:08 ` [RFC 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-04-20 10:11   ` Jerin Jacob
2023-06-13 15:50     ` Slava Ovsiienko
2023-06-13 15:53       ` Jerin Jacob
2023-06-13 15:59         ` Slava Ovsiienko
2023-06-13 16:01           ` Jerin Jacob
2023-06-27  0:39             ` Thomas Monjalon
2023-06-27  6:15               ` Slava Ovsiienko
2023-06-27  7:28                 ` Thomas Monjalon
2023-06-27  8:19                   ` Slava Ovsiienko
2023-06-27  9:33                     ` Thomas Monjalon
2023-06-27  9:43                       ` Slava Ovsiienko
2023-06-27 11:36                         ` Thomas Monjalon
2023-04-20 10:08 ` [RFC 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
2023-04-20 10:08 ` [RFC 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-04-20 10:08 ` [RFC 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-06-09 15:28 ` [PATCH 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-06-09 15:28   ` [PATCH 1/5] app/testpmd: add trace dump command Viacheslav Ovsiienko
2023-06-09 15:28   ` [PATCH 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-06-09 15:28   ` [PATCH 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
2023-06-09 15:28   ` [PATCH 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-06-09 15:28   ` [PATCH 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-06-13 16:58 ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-06-13 16:58   ` [PATCH v2 1/5] app/testpmd: add trace save command Viacheslav Ovsiienko
2023-06-21 11:15     ` Ferruh Yigit
2023-06-23  8:00       ` Slava Ovsiienko
2023-06-23 11:52         ` Ferruh Yigit
2023-06-23 12:03           ` Jerin Jacob
2023-06-23 12:14             ` Slava Ovsiienko
2023-06-23 12:23             ` Ferruh Yigit
2023-06-13 16:58   ` [PATCH v2 2/5] common/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-06-13 16:58   ` [PATCH v2 3/5] net/mlx5: add Tx datapath tracing Viacheslav Ovsiienko
2023-06-13 16:58   ` [PATCH v2 4/5] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-06-13 16:58   ` [PATCH v2 5/5] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-06-20 12:00   ` [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
2023-06-27  0:46     ` Thomas Monjalon
2023-06-27 11:24       ` Slava Ovsiienko
2023-06-27 11:34         ` Thomas Monjalon
2023-06-28 14:18           ` Robin Jarry
2023-06-29  7:16             ` Slava Ovsiienko
2023-06-29  9:08               ` Robin Jarry
2023-06-26 11:06 ` [PATCH] app/testpmd: add trace dump command Viacheslav Ovsiienko
2023-06-26 11:07 ` [PATCH v3] " Viacheslav Ovsiienko
2023-06-26 11:57 ` [PATCH v4] " Viacheslav Ovsiienko
2023-06-27 11:34   ` Ferruh Yigit
2023-06-27 11:39     ` Slava Ovsiienko
2023-06-27 11:58       ` Ferruh Yigit
2023-06-27 14:44     ` [PATCH] app/testpmd: add dump command help message Viacheslav Ovsiienko
2023-06-27 18:03       ` Ferruh Yigit
2023-06-28  9:54         ` [PATCH v2] " Viacheslav Ovsiienko
2023-06-28 13:18           ` Ferruh Yigit
2023-06-27 13:09 ` [PATCH v5] app/testpmd: add trace dump command Viacheslav Ovsiienko
2023-06-27 15:18   ` Ferruh Yigit
2023-06-28 11:09 ` [PATCH v3 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-06-28 11:09   ` [PATCH v3 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-06-28 11:09   ` [PATCH v3 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-06-28 11:09   ` [PATCH v3 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-06-28 11:09   ` [PATCH v3 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
2023-07-05 11:10 ` [PATCH v4 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-07-05 11:10   ` [PATCH v4 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-07-05 11:10   ` [PATCH v4 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-07-05 11:10   ` [PATCH v4 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-07-05 11:10   ` [PATCH v4 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
2023-07-05 15:31 ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Viacheslav Ovsiienko
2023-07-05 15:31   ` [PATCH v5 1/4] net/mlx5: introduce tracepoints for mlx5 drivers Viacheslav Ovsiienko
2023-07-05 15:31   ` [PATCH v5 2/4] net/mlx5: add comprehensive send completion trace Viacheslav Ovsiienko
2023-07-05 15:31   ` [PATCH v5 3/4] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-07-05 15:31   ` [PATCH v5 4/4] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
2023-07-06 16:27   ` [PATCH v5 0/4] net/mlx5: introduce Tx datapath tracing Raslan Darawsheh
2023-07-11 15:15 ` [PATCH v6 0/2] " Viacheslav Ovsiienko
2023-07-11 15:15   ` [PATCH v6 1/2] net/mlx5: add Tx datapath trace analyzing script Viacheslav Ovsiienko
2023-07-11 15:15   ` [PATCH v6 2/2] doc: add mlx5 datapath tracing feature description Viacheslav Ovsiienko
2023-07-27 10:52   ` [PATCH v6 0/2] net/mlx5: introduce Tx datapath tracing Thomas Monjalon
