From: Jerin Jacob
Date: Mon, 16 Sep 2024 13:42:02 +0530
Subject: Re: [PATCH v3 3/7] eal: add lcore variable performance test
To: Morten Brørup
Cc: Mattias Rönnblom, Mattias Rönnblom, dev@dpdk.org, Stephen Hemminger, Konstantin Ananyev, David Marchand, Jerin Jacob
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F6E2@smartserver.smartshare.dk>

On Fri, Sep 13, 2024 at 8:10 PM Morten Brørup wrote:
>
> > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > Sent: Friday, 13 September 2024 13.24
> >
> > On Fri, Sep 13, 2024 at 12:17 PM Mattias Rönnblom wrote:
> > >
> > > On 2024-09-12 17:11, Jerin Jacob wrote:
> > > > On Thu, Sep 12, 2024 at 6:50 PM Mattias Rönnblom wrote:
> > > >>
> > > >> On 2024-09-12 15:09, Jerin Jacob wrote:
> > > >>> On Thu, Sep 12, 2024 at 2:34 PM Mattias Rönnblom wrote:
> > > >>>> +static double
> > > >>>> +benchmark_access_method(void (*init_fun)(void), void (*update_fun)(void))
> > > >>>> +{
> > > >>>> +       uint64_t i;
> > > >>>> +       uint64_t start;
> > > >>>> +       uint64_t end;
> > > >>>> +       double latency;
> > > >>>> +
> > > >>>> +       init_fun();
> > > >>>> +
> > > >>>> +       start = rte_get_timer_cycles();
> > > >>>> +
> > > >>>> +       for (i = 0; i < ITERATIONS; i++)
> > > >>>> +               update_fun();
> > > >>>> +
> > > >>>> +       end = rte_get_timer_cycles();
> > > >>>
> > > >>> Use precise variant. rte_rdtsc_precise() or so to be accurate
> > > >>
> > > >> With 1e7 iterations, do you need rte_rdtsc_precise()? I suspect not.
> > > >
> > > > I was thinking in another way, with 1e7 iteration, the additional
> > > > barrier on precise will be amortized, and we get more _deterministic_
> > > > behavior e.s.p in case if we print cycles and if we need to catch
> > > > regressions.
> > >
> > > If you time a section of code which spends ~40000000 cycles, it doesn't
> > > matter if you add or remove a few cycles at the beginning and the end.
> > >
> > > The rte_rdtsc_precise() is both better (more precise in the sense of
> > > more serialization), and worse (because it's more costly, and thus more
> > > intrusive).
> >
> > We can calibrate the overhead to remove the cost.
> >
> > >
> > > You can use rte_rdtsc_precise(), rte_rdtsc(), or gettimeofday(). It
> > > doesn't matter.
> >
> > Yes. In this setup and it is pretty inaccurate PER iteration. Please
> > refer to the below patch to see the difference.
>
> No, Mattias is right. The time is sampled once before the loop, then the function is executed 10 million (ITERATIONS) times in the loop, and then the time is sampled once again.
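
For scale, the amortization argument quoted above can be put into numbers. The sketch below is plain C, not DPDK code; the function names are made up for illustration, and the 100-cycle timer overhead used in the note after it is an assumed figure, not a measurement of rte_get_timer_cycles() or rte_rdtsc_precise().

```c
#include <stdint.h>

/*
 * Error added to the reported per-iteration latency when the whole loop
 * is timed once, i.e. the cost of the timer reads spread over all
 * iterations. timer_overhead_cycles is a hypothetical input.
 */
static double
amortized_timer_error(uint64_t timer_overhead_cycles, uint64_t iterations)
{
	return (double)timer_overhead_cycles / (double)iterations;
}

/* The same overhead expressed as a fraction of the total measured time. */
static double
relative_timer_error(uint64_t timer_overhead_cycles, uint64_t total_cycles)
{
	return (double)timer_overhead_cycles / (double)total_cycles;
}
```

With the figures from the thread (1e7 iterations, ~40000000 cycles in total) and the assumed 100-cycle overhead, amortized_timer_error(100, 10000000) is about 1e-5 cycles per update, and relative_timer_error(100, 40000000) is about 2.5e-6, i.e. 0.00025% of the measured time.
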

No. I am not disagreeing. That's why I said, “Yes. In this setup”. All I am
saying is that there is a more accurate way of doing the measurement for this
test, along with “data”, at
https://mails.dpdk.org/archives/dev/2024-September/301227.html

>
> So the overhead and accuracy of the timing function is amortized across the 10 million calls to the function being measured, and becomes insignificant.
>
> Other perf tests also do it this way, and also use rte_get_timer_cycles(). E.g. the mempool_perf test.
>
> Another detail: The for loop itself may cost a few cycles, which may not be irrelevant when measuring a function using very few cycles. If the compiler doesn't unroll the loop, it should be done manually:
>
> for (i = 0; i < ITERATIONS / 100; i++) {
>         update_fun();
>         update_fun();
>         ... repeated 100 times

I have used a similar scheme for the inline function test in trace perf at
https://github.com/DPDK/dpdk/blob/main/app/test/test_trace_perf.c#L30

Either the above scheme or the below scheme needs to be used, as
mentioned in https://mails.dpdk.org/archives/dev/2024-September/301227.html

+       for (i = 0; i < ITERATIONS; i++) {
+               start = rte_rdtsc_precise();
                update_fun();
+               end = rte_rdtsc_precise();
+               latency += (end - start) - tsc_latency;
+       }

> }
> >
> >
> > Patch 1: Make nanoseconds to cycles per iteration
> > ------------------------------------------------------------------
> >
> > diff --git a/app/test/test_lcore_var_perf.c b/app/test/test_lcore_var_perf.c
> > index ea1d7ba90b52..b8d25400f593 100644
> > --- a/app/test/test_lcore_var_perf.c
> > +++ b/app/test/test_lcore_var_perf.c
> > @@ -110,7 +110,7 @@ benchmark_access_method(void (*init_fun)(void), void (*update_fun)(void))
> >
> >         end = rte_get_timer_cycles();
> >
> > -       latency = ((end - start) / (double)rte_get_timer_hz()) / ITERATIONS;
> > +       latency = ((end - start)) / ITERATIONS;
>
> This calculation uses integer arithmetic, which will round down the resulting latency.
> Please use floating point arithmetic: latency = (end - start) / (double)ITERATIONS;

Yup. It is in patch 2:
https://mails.dpdk.org/archives/dev/2024-September/301227.html

>
> >
> >         return latency;
> > }
> > @@ -137,8 +137,7 @@ test_lcore_var_access(void)
> >
> > -       printf("Latencies [ns/update]\n");
> > +       printf("Latencies [cycles/update]\n");
> >         printf("Thread-local storage Static array Lcore variables\n");
> > -       printf("%20.1f %13.1f %16.1f\n", tls_latency * 1e9,
> > -              sarray_latency * 1e9, lvar_latency * 1e9);
> > +       printf("%20.1f %13.1f %16.1f\n", tls_latency, sarray_latency, lvar_latency);
> >
> >         return TEST_SUCCESS;
> > }
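
To make the rounding point about the Patch 1 latency calculation concrete, here is a minimal sketch in plain C with no DPDK dependency. The helper names are hypothetical and the cycle counts in the note below are made-up inputs; it only contrasts the integer division from Patch 1 with the floating-point form suggested in the review.

```c
#include <stdint.h>

/*
 * Integer form, as in Patch 1: uint64_t / uint64_t truncates, so any
 * fractional cycles per update are lost before the result is stored.
 */
static double
cycles_per_update_int(uint64_t start, uint64_t end, uint64_t iterations)
{
	return (end - start) / iterations;
}

/*
 * Floating-point form, as suggested in the review: the cast makes the
 * division happen in double, so the fractional part is kept.
 */
static double
cycles_per_update_fp(uint64_t start, uint64_t end, uint64_t iterations)
{
	return (end - start) / (double)iterations;
}
```

For an assumed run of 45000000 cycles over 10000000 iterations, the integer form reports 4.0 cycles/update while the floating-point form reports 4.5, which is a difference the "%.1f" format in the test output would otherwise hide.
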