DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: Ravi Kumar Iyer <Ravi.Iyer@aricent.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] cost of reading tsc register
Date: Mon, 20 Apr 2015 08:37:54 -0700	[thread overview]
Message-ID: <20150420083754.5baaf48f@urahara> (raw)
In-Reply-To: <115e8a38d223487488d22a99f53cc926@GURMBXV03.AD.ARICENT.COM>

On Mon, 20 Apr 2015 14:37:53 +0000
Ravi Kumar Iyer <Ravi.Iyer@aricent.com> wrote:

> Hi,
> We were doing some code optimization on DPDK-based applications and chanced upon the rte_rdtsc() function [which reads the TSC timestamp register] consuming on the order of 100 clock cycles, with a delta of up to 40 cycles between readings [60-140 cycles].
> 
> We are building a CPU-intensive application that is also very clock-cycle sensitive, and this is impacting our implementation.
> 
> To validate this, we wrote a small vanilla application and tested it on a single core.
> Has anyone else faced a similar issue, or are we doing something really atrocious here?
> 
> Below is the pseudo snip of the same:
> 
> 
> <snip start>
> uint64_t g_tsc_cost[8] __rte_cache_aligned;
> 
> void test_tsc_cost(void)
> {
>     uint8_t i;
> 
>     for (i = 0; i < 8; i++)
>         g_tsc_cost[i] = rte_rdtsc();
> }
> 
> int
> main(int argc, char **argv)
> {
>     int ret;
>     uint8_t i;
> 
>     ret = rte_eal_init(argc, argv);
>     if (ret < 0)
>         rte_panic("Cannot init EAL\n");
> 
>     memset(g_tsc_cost, 0, sizeof(g_tsc_cost)); /* warm the cache */
> 
>     uint64_t sc = rte_rdtsc(); /* start count */
>     test_tsc_cost();
>     uint64_t ec = rte_rdtsc(); /* end count */
> 
>     printf("\n Total cost = %lu\n", (ec - sc));
> 
>     for (i = 0; i < 8; i++) {
>         printf("\n g_tsc_cost[%d]=%lu", i, g_tsc_cost[i]);
>         /* here the values printed are 60-140 units apart */
>     }
>     return 0;
> }
> <snip end>
> 
> Just to compare: on a few bare-metal implementations on non-Intel processors, we see similar code print values with a delta of only 3-4 cycles, which makes this a bit difficult to digest. Grateful for any help/guidance here.

The TSC instruction has its quirks. As far as I can tell:
 1. It kills instruction pipelining.
 2. It is as expensive as a cache miss.
 3. Counter values are not stable on some CPUs.

In general, it is best to avoid becoming dependent on it in real code.
Intel seems to test only on current-generation Intel CPUs in their
lab and on bare metal. Don't read too much into the demo applications.

To get reasonable performance, I gave up on TSC and used approximate
loop cycles for tuning.

  reply	other threads:[~2015-04-20 15:37 UTC|newest]

Thread overview: 4+ messages
2015-04-20 14:37 Ravi Kumar Iyer
2015-04-20 15:37 ` Stephen Hemminger [this message]
2015-04-20 16:21 ` Matthew Hall
2015-04-22  7:53 ` Pawel Wodkowski
