From: "Varghese, Vipin" <Vipin.Varghese@amd.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	"dev@dpdk.org" <dev@dpdk.org>,
	David Marchand <david.marchand@redhat.com>
Cc: "stable@dpdk.org" <stable@dpdk.org>
Subject: RE: [PATCH v3 1/2] latencystats: fix receive sample MP issues
Date: Wed, 25 Jun 2025 11:31:43 +0000
Message-ID: <PH7PR12MB85960D943C91FE608DBA4C2F827BA@PH7PR12MB8596.namprd12.prod.outlook.com>
In-Reply-To: <20250617150252.814215-2-stephen@networkplumber.org>

Hi David & Stephen,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, June 17, 2025 8:30 PM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; stable@dpdk.org
> Subject: [PATCH v3 1/2] latencystats: fix receive sample MP issues
>
> The receive callback was not safe with multiple queues.
> If one receive queue callback decides to take a sample it needs to add that sample
> and do atomic update to the previous TSC sample value. Add a new lock for that.
>
> Optimize the check for when to take sample so that it only needs to lock when likely
> to need a sample.
>
> Also, add code to handle TSC wraparound in comparison.
> Perhaps this should move to rte_cycles.h?
>
> Bugzilla ID: 1723
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Fixes: 5cd3cac9ed22 ("latency: added new library for latency stats")
> Cc: stable@dpdk.org
> ---
>  lib/latencystats/rte_latencystats.c | 55 ++++++++++++++++++-----------
>  1 file changed, 35 insertions(+), 20 deletions(-)
>
> diff --git a/lib/latencystats/rte_latencystats.c b/lib/latencystats/rte_latencystats.c
> index 6873a44a92..72a58d78d1 100644
> --- a/lib/latencystats/rte_latencystats.c
> +++ b/lib/latencystats/rte_latencystats.c
> @@ -22,6 +22,7 @@
>  #include <rte_metrics.h>
>  #include <rte_spinlock.h>
>  #include <rte_string_fns.h>
> +#include <rte_stdatomic.h>
>
>  #include "rte_latencystats.h"
>
> @@ -45,11 +46,20 @@ timestamp_dynfield(struct rte_mbuf *mbuf)
>                         timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
>  }
>
> +/* Compare two 64 bit timer counter but deal with wraparound correctly. */
> +static inline bool tsc_after(uint64_t t0, uint64_t t1)
> +{
> +       return (int64_t)(t1 - t0) < 0;
> +}
> +
> +#define tsc_before(a, b) tsc_after(b, a)
> +
>  static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
>  static int latency_stats_index;
> +
> +static rte_spinlock_t sample_lock = RTE_SPINLOCK_INITIALIZER;
>  static uint64_t samp_intvl;
> -static uint64_t timer_tsc;
> -static uint64_t prev_tsc;
> +static RTE_ATOMIC(uint64_t) next_tsc;
>
>  #define LATENCY_AVG_SCALE     4
>  #define LATENCY_JITTER_SCALE 16
> @@ -147,25 +157,29 @@ add_time_stamps(uint16_t pid __rte_unused,
>                 void *user_cb __rte_unused)
>  {
>         unsigned int i;
> -       uint64_t diff_tsc, now;
> -
> -       /*
> -        * For every sample interval,
> -        * time stamp is marked on one received packet.
> -        */
> -       now = rte_rdtsc();
> -       for (i = 0; i < nb_pkts; i++) {
> -               diff_tsc = now - prev_tsc;
> -               timer_tsc += diff_tsc;
> -
> -               if ((pkts[i]->ol_flags & timestamp_dynflag) == 0
> -                               && (timer_tsc >= samp_intvl)) {
> -                       *timestamp_dynfield(pkts[i]) = now;
> -                       pkts[i]->ol_flags |= timestamp_dynflag;
> -                       timer_tsc = 0;
> +       uint64_t now = rte_rdtsc();
> +
> +       /* Check without locking */
> +       if (likely(tsc_before(now, rte_atomic_load_explicit(&next_tsc,
> +                                                           rte_memory_order_relaxed))))
> +               return nb_pkts;
> +
> +       /* Try and get sample, skip if sample is being done by other core. */
> +       if (likely(rte_spinlock_trylock(&sample_lock))) {
> +               for (i = 0; i < nb_pkts; i++) {
> +                       struct rte_mbuf *m = pkts[i];
> +
> +                       /* skip if already timestamped */
> +                       if (unlikely(m->ol_flags & timestamp_dynflag))
> +                               continue;
> +
> +                       m->ol_flags |= timestamp_dynflag;
> +                       *timestamp_dynfield(m) = now;
> +                       rte_atomic_store_explicit(&next_tsc, now + samp_intvl,
> +                                                 rte_memory_order_relaxed);
> +                       break;
>                 }
> -               prev_tsc = now;
> -               now = rte_rdtsc();
> +               rte_spinlock_unlock(&sample_lock);
>         }
>
>         return nb_pkts;
> @@ -270,6 +284,7 @@ rte_latencystats_init(uint64_t app_samp_intvl,
>         glob_stats = mz->addr;
>         rte_spinlock_init(&glob_stats->lock);
>         samp_intvl = (uint64_t)(app_samp_intvl * cycles_per_ns);
> +       next_tsc = rte_rdtsc();
>
>         /** Register latency stats with stats library */
>         for (i = 0; i < NUM_LATENCY_STATS; i++)
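
For reference, the wraparound-safe comparison added by the patch can be exercised on its own. The following is a minimal sketch, assuming plain C11 with no DPDK headers; tsc_before() is written as a function rather than the macro used in the patch, and tsc_compare_selftest() is a hypothetical helper that is not part of the patch. It relies on the usual two's-complement conversion from uint64_t to int64_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if t0 is later than t1, even if the counter wrapped in between. */
static inline bool tsc_after(uint64_t t0, uint64_t t1)
{
	return (int64_t)(t1 - t0) < 0;
}

/* True if a is earlier than b. */
static inline bool tsc_before(uint64_t a, uint64_t b)
{
	return tsc_after(b, a);
}

/* Hypothetical self-check, not part of the patch. */
static void tsc_compare_selftest(void)
{
	uint64_t near_wrap = UINT64_MAX - 10; /* just before the counter wraps */
	uint64_t wrapped = 5;                 /* 16 ticks later, after wrapping */

	assert(tsc_before(100, 200));
	assert(!tsc_before(200, 100));
	assert(!tsc_before(100, 100));
	/* A plain "<" would order these two the wrong way round. */
	assert(tsc_before(near_wrap, wrapped));
	assert(tsc_after(wrapped, near_wrap));
}
```

The signed difference only gives the right ordering while the two values are less than 2^63 ticks apart, which should hold for any realistic sample interval.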


Application: testpmd io mode with latency-stats enabled
CPU: AMD EPYC 7713 64-Core Processor (AVX2)
Huge pages: 1GB pages * 32

NIC: Intel E810 1CQ DA2, 1 * 100Gbps
+++++++++++++++++++++++++++++

Firmware: 3.20 0x8000d83e 1.3146.0
DDP: comms package 1.3.53

With no args, Before patch (min, max, avg, jitter)
- 1Q: 30ns, 27432ns, 94ns, 19
- 4Q: 30ns, 27722ns, 95ns, 20

With no args, After Patch (min, max, avg, jitter)
 - 1Q: 40ns, 19136ns, 47ns, 5
 - 4Q: 10ns, 18334ns, 194ns, 64

With args: rx_low_latency=1, Before patch (min, max, avg, jitter)
- 1Q: 30ns, 27432ns, 94ns, 19
- 4Q: 30ns, 27722ns, 95ns, 20

With args: rx_low_latency=1, After Patch (min, max, avg, jitter)
 - 1Q: 40ns, 21631ns, 74ns, 12
 - 4Q: 10ns, 23725ns, 116ns, 112

With Solarflare NIC:
+++++++++++++++

throughput profile; After Patch (min, max, avg, jitter)
 - 1Q: 10ns, 23115ns, 96ns, 65
 - 4Q: 10ns, 2981ns, 136ns, 140

low-latency profile; After Patch (min, max, avg, jitter)
 - 1Q: 10ns, 19399ns, 367ns, 238
 - 4Q: 10ns, 19970ns, 127ns, 100


Our understanding is as follows:

1. The increase in multi-queue latency is attributable to the spinlock.
2. The lower minimum latency with the patch for multi-queue arises because the lowest value across all queues is taken into account.
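
To make point 1 easier to discuss, the sampling gate from the patch can be modelled outside DPDK. This is only a sketch under stated assumptions: C11 atomics replace RTE_ATOMIC, an atomic_flag stands in for rte_spinlock_trylock(), the per-packet loop is left out so the deadline always advances, and try_take_sample() and sample_gate_selftest() are hypothetical names. It shows the point at which queues can meet on the lock: only once the sample interval has elapsed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t next_tsc;                  /* earliest time of the next sample */
static atomic_flag sample_busy = ATOMIC_FLAG_INIT; /* stand-in for the spinlock */

/* Returns true if the caller should timestamp a packet at time "now". */
static bool try_take_sample(uint64_t now, uint64_t interval)
{
	uint64_t next = atomic_load_explicit(&next_tsc, memory_order_relaxed);

	/* Fast path: not due yet, the lock is never touched. */
	if ((int64_t)(now - next) < 0)
		return false;

	/* Another queue is already sampling: skip instead of waiting. */
	if (atomic_flag_test_and_set_explicit(&sample_busy, memory_order_acquire))
		return false;

	atomic_store_explicit(&next_tsc, now + interval, memory_order_relaxed);
	atomic_flag_clear_explicit(&sample_busy, memory_order_release);
	return true;
}

/* Hypothetical single-threaded walk through the gate. */
static void sample_gate_selftest(void)
{
	atomic_store(&next_tsc, 1000);
	assert(!try_take_sample(999, 500));  /* before the deadline */
	assert(try_take_sample(1000, 500));  /* due: deadline moves to 1500 */
	assert(!try_take_sample(1200, 500)); /* a later caller inside the interval */
	assert(try_take_sample(1500, 500));  /* due again: deadline moves to 2000 */
}
```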

Question: will per-queue min, max, and avg stats be added in the future?
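
To illustrate what is being asked, per-queue tracking could look roughly like the sketch below. Everything in it is hypothetical: the struct, the function names and MAX_QUEUES are not an existing DPDK API, and it keeps a plain arithmetic mean instead of the scaled average the library computes. global_min_ns() shows how one shared minimum ends up reflecting the lowest value seen on any queue.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 4 /* hypothetical limit for this sketch */

struct queue_latency {
	uint64_t min_ns;
	uint64_t max_ns;
	uint64_t sum_ns;
	uint64_t samples;
};

static struct queue_latency per_queue[MAX_QUEUES];

/* Fold one latency sample into the stats of queue "qid". */
static void record_latency(uint16_t qid, uint64_t ns)
{
	assert(qid < MAX_QUEUES);
	struct queue_latency *q = &per_queue[qid];

	if (q->samples == 0 || ns < q->min_ns)
		q->min_ns = ns;
	if (ns > q->max_ns)
		q->max_ns = ns;
	q->sum_ns += ns;
	q->samples++;
}

/* Plain arithmetic mean; 0 when the queue has no samples yet. */
static uint64_t queue_avg_ns(uint16_t qid)
{
	const struct queue_latency *q = &per_queue[qid];

	return q->samples ? q->sum_ns / q->samples : 0;
}

/* Minimum over all queues with samples; UINT64_MAX if there are none. */
static uint64_t global_min_ns(void)
{
	uint64_t min = UINT64_MAX;

	for (unsigned int i = 0; i < MAX_QUEUES; i++)
		if (per_queue[i].samples && per_queue[i].min_ns < min)
			min = per_queue[i].min_ns;
	return min;
}
```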

Tested-by: Thiyagarajan P <Thiyagarajan.P@amd.com>
Reviewed-by: Vipin Varghese <Vipin.Varghese@amd.com>


> --
> 2.47.2

