DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: Stephen Hemminger <shemming@brocade.com>
Subject: Re: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in	calculating byte clock
Date: Mon, 16 Feb 2015 22:44:31 +0000	[thread overview]
Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912632318070@IRSMSX108.ger.corp.intel.com> (raw)
In-Reply-To: <1423116841-19799-6-git-send-email-stephen@networkplumber.org>

Hi Stephen,

Sorry, NACK.

1. Overflow issue
As you declare cycles_per_byte as uint32_t, for a CPU frequency of 2-3 GHz, the line of code below results in overflow:
	port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) / params->rate;
Therefore, there is most likely a significant accuracy loss, which might result in more packets allowed to go out than it should.

2. Integer division has a higher cost than floating point division
My understanding is we are considering a performance improvement by replacing the double precision floating point division in:
	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
with an integer division:
	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) / port->cycles_per_byte;
I don't think this is going to have the claimed benefit, as acording to "Intel 64 and IA-32 Architectures Optimization  Reference Manual" (Appendix C), the latency of the integer division instruction is significantly bigger than the latency of integer division:
	Instruction FDIV double precision: latency = 38-40 cycles
	Instruction IDIV: latency = 56 - 80 cycles

3. Alternative
I hear though your suggestion about replacing the floating point division with a more performant construction. One suggestion would be to replace it with an integer multiplication followed by a shift right, probably by using a uint64_t bytes_per_cycle_scaled_up (the inverse of cycles_per_bytes). I need to prototype this code myself. Would you be OK to look into providing an alternative implementation?

Thanks,
Cristian


-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Thursday, February 5, 2015 6:14 AM
To: dev@dpdk.org
Cc: Stephen Hemminger
Subject: [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock

From: Stephen Hemminger <shemming@brocade.com>

The old code was doing a floating point divide for each rte_dequeue()
which is very expensive. Change to using fixed point scaled math instead.
This improved performance from 5Gbit/sec to 10 Gbit/sec

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_sched/rte_sched.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 55fbc14..3023457 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -102,6 +102,9 @@
 
 #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
 
+/* For cycles_per_byte calculation */
+#define RTE_SCHED_TIME_SHIFT		      20
+
 struct rte_sched_subport {
 	/* Token bucket (TB) */
 	uint64_t tb_time; /* time of last update */
@@ -239,7 +242,7 @@ struct rte_sched_port {
 	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU cyles */
 	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes */
 	uint64_t time;                /* Current NIC TX time measured in bytes */
-	double cycles_per_byte;       /* CPU cycles per byte */
+	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
 
 	/* Scheduling loop detection */
 	uint32_t pipe_loop;
@@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params)
 	port->time_cpu_cycles = rte_get_tsc_cycles();
 	port->time_cpu_bytes = 0;
 	port->time = 0;
-	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) params->rate);
+
+	port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
+		/ params->rate;
 
 	/* Scheduling loop detection */
 	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
@@ -2156,11 +2161,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port)
 {
 	uint64_t cycles = rte_get_tsc_cycles();
 	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
-	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
+	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
+		/ port->cycles_per_byte;
 
 	/* Advance port time */
 	port->time_cpu_cycles = cycles;
-	port->time_cpu_bytes += (uint64_t) bytes_diff;
+	port->time_cpu_bytes += bytes_diff;
 	if (port->time < port->time_cpu_bytes) {
 		port->time = port->time_cpu_bytes;
 	}
-- 
2.1.4

--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

  reply	other threads:[~2015-02-16 22:44 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-05  6:13 [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read Stephen Hemminger
2015-02-05  6:13 ` [dpdk-dev] [PATCH v2 5/7] rte_sched: don't put tabs in log messages Stephen Hemminger
2015-02-20 18:33   ` Dumitrescu, Cristian
2015-02-05  6:14 ` [dpdk-dev] [PATCH v2 6/7] rte_sched: eliminate floating point in calculating byte clock Stephen Hemminger
2015-02-16 22:44   ` Dumitrescu, Cristian [this message]
2015-02-17 16:05     ` Stephen Hemminger
2015-02-05  6:14 ` [dpdk-dev] [PATCH v2 7/7] rte_sched: rearrange data structures Stephen Hemminger
2015-02-20 18:43   ` Dumitrescu, Cristian
2015-02-05 12:43 ` [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read Neil Horman
2015-02-23 23:51   ` Thomas Monjalon
     [not found]   ` <fa17ab0c3bc041b88e18d3d76a255f13@HQ1WP-EXMB11.corp.brocade.com>
2015-02-24 19:18     ` Stephen Hemminger
2015-02-24 20:06       ` Thomas Monjalon
2015-02-25 17:29         ` Dumitrescu, Cristian
2015-03-10 13:55         ` Thomas Monjalon
2015-02-09 22:48 ` Dumitrescu, Cristian
2015-02-09 22:55   ` Stephen Hemminger
2015-02-20 18:32     ` Dumitrescu, Cristian
2015-02-20 19:52       ` Stephen Hemminger
2015-02-20 20:23         ` Dumitrescu, Cristian
2015-02-20 21:01           ` Thomas Monjalon
2015-02-20 21:28             ` Dumitrescu, Cristian
2015-02-21  1:53               ` Stephen Hemminger
2015-02-23 12:06                 ` Dumitrescu, Cristian
2015-02-09 23:46   ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3EB4FA525960D640B5BDFFD6A3D8912632318070@IRSMSX108.ger.corp.intel.com \
    --to=cristian.dumitrescu@intel.com \
    --cc=dev@dpdk.org \
    --cc=shemming@brocade.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).