DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Stephen Hemminger <shemming@brocade.com>
Subject: Re: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock
Date: Tue, 17 Feb 2015 11:05:00 -0500
Message-ID: <20150217110500.41ed8a18@uryu.home.lan>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D8912632318070@IRSMSX108.ger.corp.intel.com>

On Mon, 16 Feb 2015 22:44:31 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> Hi Stephen,
> 
> Sorry, NACK.
> 
> 1. Overflow issue
> As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the line of code below results in overflow:
> 	port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate;
> Therefore, there is most likely a significant accuracy loss, which might result in more packets allowed to go out than it should.

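The shifted TSC value is still 64 bits wide, and rate is a 32-bit value
in bytes/sec, so the shift itself does not overflow.

I chose the scale such that, with a 3 GHz clock, the minimum rate is
715 bytes/sec = 5722 bits/sec. For any rate above that, the quotient
fits in the 32-bit cycles_per_byte.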
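
The arithmetic behind those numbers can be sketched as below. This is only
an illustration: TIME_SHIFT = 10 is an assumption (it is the value that
reproduces the 715 bytes/sec figure, and RTE_SCHED_TIME_SHIFT in the patch
may differ), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_SHIFT 10 /* assumed stand-in for RTE_SCHED_TIME_SHIFT */

/* Scaled cycles per byte, computed entirely in 64 bits before any
 * narrowing to a 32-bit field. */
static uint64_t scaled_cycles_per_byte(uint64_t tsc_hz, uint32_t rate)
{
	assert(rate != 0);
	return (tsc_hz << TIME_SHIFT) / rate;
}

/* Rates strictly above this bound (bytes/sec) give a quotient that
 * fits in uint32_t; rates at or below it do not. */
static uint64_t overflow_bound(uint64_t tsc_hz)
{
	return (tsc_hz << TIME_SHIFT) >> 32;
}
```

With tsc_hz = 3000000000, overflow_bound() evaluates to 715, so a port
configured at 716 bytes/sec or faster stays inside 32 bits.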

> 2. Integer division has a higher cost than floating point division
> My understanding is we are considering a performance improvement by replacing the double precision floating point division in:
> 	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> with an integer division:
> 	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte;
> I don't think this is going to have the claimed benefit, as according to "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Appendix C), the latency of the integer division instruction is significantly higher than the latency of floating point division:
> 	Instruction FDIV double precision: latency = 38-40 cycles
> 	Instruction IDIV: latency = 56 - 80 cycles

I observed that performance went from 5 Gbit/sec to 10 Gbit/sec with the
integer version, mostly because the floating point version engages more
execution units and does not pipeline. Cycle count is not everything.
This was on an Ivy Bridge processor.


> 3. Alternative
> I do hear your suggestion about replacing the floating point division with a faster construction. One option would be to replace it with an integer multiplication followed by a right shift, probably using a uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_byte). I need to prototype this code myself. Would you be OK to look into providing an alternative implementation?
>

I looked into the multiplicative integer method and will do it in the
future. But it has more scaling issues, since it would require that both
values be 32 bits.
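
A rough sketch of what the multiply-and-shift form might look like, to show
where the 32-bit restriction comes from. Nothing here is from the patch:
INV_SHIFT = 32 and both function names are assumptions chosen for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define INV_SHIFT 32 /* assumed fixed-point scale, not taken from the patch */

/* Configuration time: bytes per cycle scaled up by 2^INV_SHIFT.
 * The result only fits in 32 bits while rate < tsc_hz. */
static uint32_t bytes_per_cycle_scaled(uint64_t tsc_hz, uint32_t rate)
{
	uint64_t inv;

	assert(tsc_hz != 0);
	inv = ((uint64_t)rate << INV_SHIFT) / tsc_hz;
	assert(inv <= UINT32_MAX);
	return (uint32_t)inv;
}

/* Fast path: one multiply and one shift, no division.
 * Both factors are 32 bits so the product cannot overflow 64 bits. */
static uint64_t cycles_to_bytes(uint32_t cycles_diff, uint32_t inv)
{
	return ((uint64_t)cycles_diff * inv) >> INV_SHIFT;
}
```

Because the inverse is truncated, the result can come out one byte low
compared with the exact division, and cycles_diff has to be clamped to
32 bits before the multiply. Those are the scaling issues I mean.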

