DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH 0/2] Latencystat optimization and fix
@ 2025-06-13  0:34 Stephen Hemminger
  2025-06-13  0:34 ` [PATCH 1/2] latencystats: fix receive sample MP issues Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Stephen Hemminger @ 2025-06-13  0:34 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

After investigating latencystat test failures, discovered it poorly
designed to handle bursts and multiple queues.

Stephen Hemminger (2):
  latencystats: fix receive sample MP issues
  latencystats: optimize locking on transmit

 lib/latencystats/rte_latencystats.c | 64 +++++++++++++++++++----------
 1 file changed, 42 insertions(+), 22 deletions(-)

-- 
2.47.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] latencystats: fix receive sample MP issues
  2025-06-13  0:34 [PATCH 0/2] Latencystat optimization and fix Stephen Hemminger
@ 2025-06-13  0:34 ` Stephen Hemminger
  2025-06-13  0:34 ` [PATCH 2/2] latencystats: optimize locking on transmit Stephen Hemminger
  2025-06-13 11:13 ` [PATCH 0/2] Latencystat optimization and fix Varghese, Vipin
  2 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2025-06-13  0:34 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Reshma Pattan, Harry van Haaren, Remy Horton

The receive callback was not safe with multiple queues.
If one receive queue callback decides to take a sample it
needs to add that sample and do atomic update to the previous
TSC sample value. Add a new lock for that.

Also, add code to handle TSC wraparound in comparison.
Perhaps this should move to rte_cycles.h?

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes: 5cd3cac9ed22 ("latency: added new library for latency stats")
Cc: stable@dpdk.org
---
 lib/latencystats/rte_latencystats.c | 47 ++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/lib/latencystats/rte_latencystats.c b/lib/latencystats/rte_latencystats.c
index 6873a44a92..181c53dd0e 100644
--- a/lib/latencystats/rte_latencystats.c
+++ b/lib/latencystats/rte_latencystats.c
@@ -45,10 +45,19 @@ timestamp_dynfield(struct rte_mbuf *mbuf)
 			timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
 }
 
+/* Compare two 64 bit timer counter but deal with wraparound correctly. */
+static inline bool tsc_after(uint64_t t0, uint64_t t1)
+{
+	return (int64_t)(t1 - t0) < 0;
+}
+
+#define tsc_before(a, b) tsc_after(b, a)
+
 static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
 static int latency_stats_index;
+
+static rte_spinlock_t sample_lock = RTE_SPINLOCK_INITIALIZER;
 static uint64_t samp_intvl;
-static uint64_t timer_tsc;
 static uint64_t prev_tsc;
 
 #define LATENCY_AVG_SCALE     4
@@ -147,25 +156,27 @@ add_time_stamps(uint16_t pid __rte_unused,
 		void *user_cb __rte_unused)
 {
 	unsigned int i;
-	uint64_t diff_tsc, now;
+	uint64_t now = rte_rdtsc();
 
-	/*
-	 * For every sample interval,
-	 * time stamp is marked on one received packet.
-	 */
-	now = rte_rdtsc();
-	for (i = 0; i < nb_pkts; i++) {
-		diff_tsc = now - prev_tsc;
-		timer_tsc += diff_tsc;
-
-		if ((pkts[i]->ol_flags & timestamp_dynflag) == 0
-				&& (timer_tsc >= samp_intvl)) {
-			*timestamp_dynfield(pkts[i]) = now;
-			pkts[i]->ol_flags |= timestamp_dynflag;
-			timer_tsc = 0;
+	/* Check without locking */
+	if (likely(tsc_before(now, prev_tsc + samp_intvl)))
+		return nb_pkts;
+
+	/* Try and get sample, skip if sample is being done by other core. */
+	if (likely(rte_spinlock_trylock(&sample_lock))) {
+		for (i = 0; i < nb_pkts; i++) {
+			struct rte_mbuf *m = pkts[i];
+
+			/* skip if already timestamped */
+			if (unlikely(m->ol_flags & timestamp_dynflag))
+				continue;
+
+			m->ol_flags |= timestamp_dynflag;
+			*timestamp_dynfield(m) = now;
+			prev_tsc = now;
+			break;
 		}
-		prev_tsc = now;
-		now = rte_rdtsc();
+		rte_spinlock_unlock(&sample_lock);
 	}
 
 	return nb_pkts;
-- 
2.47.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] latencystats: optimize locking on transmit
  2025-06-13  0:34 [PATCH 0/2] Latencystat optimization and fix Stephen Hemminger
  2025-06-13  0:34 ` [PATCH 1/2] latencystats: fix receive sample MP issues Stephen Hemminger
@ 2025-06-13  0:34 ` Stephen Hemminger
  2025-06-13 11:13 ` [PATCH 0/2] Latencystat optimization and fix Varghese, Vipin
  2 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2025-06-13  0:34 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

If transmit callback is called, and there are no packets
in the burst with timestamp set, then the expensive operations
of locking and accessing TSC can be skipped.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/latencystats/rte_latencystats.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/lib/latencystats/rte_latencystats.c b/lib/latencystats/rte_latencystats.c
index 181c53dd0e..dc415c4682 100644
--- a/lib/latencystats/rte_latencystats.c
+++ b/lib/latencystats/rte_latencystats.c
@@ -190,17 +190,26 @@ calc_latency(uint16_t pid __rte_unused,
 		void *_ __rte_unused)
 {
 	unsigned int i;
-	uint64_t now, latency;
+	uint64_t ts_flags = 0;
 	static uint64_t prev_latency;
 
-	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++)
+		ts_flags |= (pkts[i]->ol_flags & timestamp_dynflag);
 
+	/* no samples in this burst, skip locking */
+	if (likely(ts_flags == 0))
+		return nb_pkts;
+
+	uint64_t now = rte_rdtsc();
 	rte_spinlock_lock(&glob_stats->lock);
 	for (i = 0; i < nb_pkts; i++) {
-		if (!(pkts[i]->ol_flags & timestamp_dynflag))
+		struct rte_mbuf *m = pkts[i];
+		uint64_t latency;
+
+		if (!(m->ol_flags & timestamp_dynflag))
 			continue;
 
-		latency = now - *timestamp_dynfield(pkts[i]);
+		latency = now - *timestamp_dynfield(m);
 
 		if (glob_stats->samples++ == 0) {
 			glob_stats->min_latency = latency;
-- 
2.47.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 0/2] Latencystat optimization and fix
  2025-06-13  0:34 [PATCH 0/2] Latencystat optimization and fix Stephen Hemminger
  2025-06-13  0:34 ` [PATCH 1/2] latencystats: fix receive sample MP issues Stephen Hemminger
  2025-06-13  0:34 ` [PATCH 2/2] latencystats: optimize locking on transmit Stephen Hemminger
@ 2025-06-13 11:13 ` Varghese, Vipin
  2 siblings, 0 replies; 4+ messages in thread
From: Varghese, Vipin @ 2025-06-13 11:13 UTC (permalink / raw)
  To: Stephen Hemminger, dev

[Public]

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, June 13, 2025 6:04 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Subject: [PATCH 0/2] Latencystat optimization and fix
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> After investigating latencystat test failures, discovered it poorly designed to handle
> bursts and multiple queues.

Acked-by: Vipin.Varghese@amd.com
Reviewed-by: Vipin.Varghese@amd.com
Tested-by: Thiyagarajan.P@amd.com

We tested the changes using DPDK-TestPmd, following is the summary of the test shared below

1. before vs after patch with 1Q: min: -10ns, max: -5000ns, avg: -10ns, jitter: +5ns
2. before vs after patch with 4Q: min: -10ns, max: -7000ns, avg: +5ns, jitter: +80ns

Since we see a greater variation with multiple queues compared to single queue, we also collect HW Traffic Generator before vs after.
Values we got are ` min: -50000ns, max: -62000ns, avg: -5300ns, jitter: 6ns `.

NIC: Intel E810 2CQ-DA2 (used 1 100Gbps port), firmware: 4.7, DDP: default
Processor: AMD EPYC 8534P 64-Core Processor
Traffic Rate: 100Gbps

Application CMD:
1. DPDK: ` sudo build/app/dpdk-testpmd -l 11,16,17,18,19 -a 42:00.0 --file-prefix tr01 --proc-type=primary --force-max-simd-bitwidth=512 -- --nb-cores=4 --nb-ports=1 --txq=4 --rxq=4 --txd=2048 --rxd=2048 --burst=64 --rss-udp --latencystats=16  -i -a `
2. Proc-Info: ` sudo build/app/dpdk-proc-info --file-prefix=tr01 --proc-type=secondary -a 42:00.0 -- --metrics `

>
> Stephen Hemminger (2):
>   latencystats: fix receive sample MP issues
>   latencystats: optimize locking on transmit
>
>  lib/latencystats/rte_latencystats.c | 64 +++++++++++++++++++----------
>  1 file changed, 42 insertions(+), 22 deletions(-)
>
> --
> 2.47.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-06-13 11:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-13  0:34 [PATCH 0/2] Latencystat optimization and fix Stephen Hemminger
2025-06-13  0:34 ` [PATCH 1/2] latencystats: fix receive sample MP issues Stephen Hemminger
2025-06-13  0:34 ` [PATCH 2/2] latencystats: optimize locking on transmit Stephen Hemminger
2025-06-13 11:13 ` [PATCH 0/2] Latencystat optimization and fix Varghese, Vipin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).