Date: Mon, 20 Apr 2015 08:37:54 -0700
From: Stephen Hemminger
To: Ravi Kumar Iyer
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] cost of reading tsc register
Message-ID: <20150420083754.5baaf48f@urahara>
In-Reply-To: <115e8a38d223487488d22a99f53cc926@GURMBXV03.AD.ARICENT.COM>
References: <115e8a38d223487488d22a99f53cc926@GURMBXV03.AD.ARICENT.COM>

On Mon, 20 Apr 2015 14:37:53 +0000
Ravi Kumar Iyer wrote:

> Hi,
> We were doing some code optimizations , running DPDK based applications, and chanced upon the rte_rdtsc function [ to read tsc timestamp register value ] consuming cpu cycles of the order of 100clock cycles with a delta of upto 40cycles at times [ 60-140 cycles]
>
> We are actually building up a cpu intensive application which is also very clock cycle sensitive and this is impacting our implementation.
>
> To validate the same using a small/vanilla application we wrote a small code and tested on a single core.
> Has anyone else faced a similar issue or are we doing something really atrocious here.
>
> Below is the pseudo snip of the same:
>
>
>
> uint64_t g_tsc_cost[8] __rte_cache_aligned;
>
> void test_tsc_cost()
> {
>     uint8_t i = 0;
>     for (i = 0; i < 8 ; i++)
>     {
>         g_tsc_cost[i] = rte_rdtsc();
>     }
> }
> int
> main(int argc, char **argv)
> {
>
>     int ret;
>     unsigned lcore_id;
>
>     ret = rte_eal_init(argc, argv);
>     if (ret < 0)
>         rte_panic("Cannot init EAL\n");
>
>     memset(g_tsc_cost,0,64); /* warm the cache */
>
>     uint64_t sc = rte_rdtsc(); /* start count */
>     test_tsc_cost();
>     uint64_t ec = rte_rdtsc(); /* end count */
>
>     printf("\n Total cost = %lu\n",(ec-sc));
>
>     uint8_t i = 0;
>
>     for (i = 0; i < 8 ; i++)
>     {
>         printf("\n g_tsc_cost[%d]=%lu",i,g_tsc_cost[i]);
>         /* here the values printed are 60-140 units apart */
>
>     }
>     return 0;
> }
>
>
> Just to compare, On few bare metal implementations of non-intel processors, we are seeing the similar code print values with a delta of 3-4 cycles and thus its becoming a bit difficult to digest as well. Grateful for any help/guidance here.

The TSC instruction has its quirks. As far as I can tell:
  1. It kills instruction pipelining
  2. It is as expensive as a cache miss
  3. Counter values are not stable on some CPUs

In general, it is best to avoid depending on it in real code.

Intel seems to test only on current-generation Intel CPUs in their lab,
and on bare metal. Don't read too much into the demo applications.

To get reasonable performance, I gave up on TSC and used approximate
loop cycles for tuning.
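
For anyone who wants to check the numbers outside of DPDK, here is a rough
sketch of how the per-read cost could be estimated. It is not DPDK code and
makes some assumptions: the names read_counter, min_read_delta and
avg_read_cost are made up for illustration, it assumes GCC or clang so that
__rdtsc() from <x86intrin.h> is available on x86, and on other architectures
it falls back to clock_gettime(), which then reports nanoseconds rather than
cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Raw TSC read; assumed to be comparable to rte_rdtsc() on x86. */
static inline uint64_t read_counter(void)
{
    return __rdtsc();
}
#else
/* Assumption: no TSC here, so use the monotonic clock (nanoseconds). */
static inline uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * Smallest difference seen between two back-to-back reads.
 * Returns UINT64_MAX if no usable sample was taken (samples == 0,
 * or the counter went backwards every time).
 */
uint64_t min_read_delta(unsigned samples)
{
    uint64_t best = UINT64_MAX;

    for (unsigned i = 0; i < samples; i++) {
        uint64_t a = read_counter();
        uint64_t b = read_counter();

        /* Skip samples where the counter appears to run backwards. */
        if (b >= a && b - a < best)
            best = b - a;
    }
    return best;
}

/*
 * Average ticks per read over 'reads' consecutive reads.
 * The figure includes the loop and store overhead.
 */
double avg_read_cost(unsigned reads)
{
    volatile uint64_t sink = 0; /* keeps the reads from being optimized out */

    if (reads == 0)
        return 0.0;

    uint64_t start = read_counter();
    for (unsigned i = 0; i < reads; i++)
        sink = read_counter();
    uint64_t end = read_counter();

    (void)sink;
    return (double)(end - start) / (double)reads;
}
```

Calling min_read_delta(1000) and avg_read_cost(100000) from a thread pinned
to one core and printing both gives two views of the same cost. The minimum
filters out interrupts and other one-off stalls, while the average shows
what a loop full of reads pays in practice. A large gap between the two
would suggest the spread in the original test is noise rather than the cost
of the instruction itself.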