From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <stephen@networkplumber.org>
Received: from mail-la0-f47.google.com (mail-la0-f47.google.com
 [209.85.215.47]) by dpdk.org (Postfix) with ESMTP id 6A5D23208
 for <dev@dpdk.org>; Sun,  1 Feb 2015 11:04:14 +0100 (CET)
Received: by mail-la0-f47.google.com with SMTP id hz20so32726960lab.6
 for <dev@dpdk.org>; Sun, 01 Feb 2015 02:04:14 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=iFIrImkxJq7G1PYW+yMIsbWmxquBv2YYAvdCOkI/o0M=;
 b=YQ5o7My8ObxDf2oHEHH+hSL01OO9wHlKuOmaABGoiPhOw4VEl8Hy6fC0/pQPlWMmdV
 4n+KjTBGujf9X2NDw6Sdl8FML8mDqROZV72IjnMEm2b7oEQBcx+BMOuGO0Xs9IuMLVP2
 I6m+K61t91cXfTl1UKK8KANNWPRjMIkA0i+WZ2d/gGW5Hqssb+/TA4WxrWWA0PzwvkSp
 cdqd5PKzSv3pUquWKDLAIFctlwngPlAHCDA6qNlyXCHcOibkkS0HJGNCi8wBL2RZQQb4
 wBYYgnJnRU7vo0BMAz1o57jM98rlOBlI8zg7pWTJtwBaNsADtDNmM6ZWGNxTr6hJ+p/j
 NhEg==
X-Gm-Message-State: ALoCoQm741MBJSON/CRaoOQCX+dUcfzz2CSPjUwIZBOZ0plZYTcAflwxH8VURCukrXNJKkgi1r7O
X-Received: by 10.152.26.98 with SMTP id k2mr14499555lag.53.1422785054173;
 Sun, 01 Feb 2015 02:04:14 -0800 (PST)
Received: from uryu.fosdem.net. ([2001:67c:1810:f0ff:c685:8ff:feca:841f])
 by mx.google.com with ESMTPSA id c4sm1608100lbp.32.2015.02.01.02.04.12
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Sun, 01 Feb 2015 02:04:13 -0800 (PST)
From: Stephen Hemminger <stephen@networkplumber.org>
To: dev@dpdk.org
Date: Sun,  1 Feb 2015 10:03:50 +0000
Message-Id: <1422785031-11494-6-git-send-email-stephen@networkplumber.org>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1422785031-11494-1-git-send-email-stephen@networkplumber.org>
References: <1422785031-11494-1-git-send-email-stephen@networkplumber.org>
Cc: Stephen Hemminger <shemming@brocade.com>
Subject: [dpdk-dev] [PATCH 6/7] rte_sched: eliminate floating point in
	calculating byte clock
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Sun, 01 Feb 2015 10:04:14 -0000

From: Stephen Hemminger <shemming@brocade.com>

The old code did a floating point divide in rte_sched_port_time_resync()
on every dequeue, which is very expensive. Use fixed point scaled math
instead. This improved performance from 5 Gbit/sec to 10 Gbit/sec.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_sched/rte_sched.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
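
For reviewers, a minimal standalone sketch of the scaled arithmetic used
here. It is not the patch itself: TIME_SHIFT mirrors RTE_SCHED_TIME_SHIFT,
the helper names scaled_cycles_per_byte() and cycles_to_bytes() are
hypothetical, and the sketch keeps the scaled value in 64 bits and asserts
on shift overflow, which the patch does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same shift as RTE_SCHED_TIME_SHIFT in this patch. */
#define TIME_SHIFT 20

/*
 * Setup path (hypothetical helper): CPU cycles per byte, scaled by
 * 2^TIME_SHIFT so the fractional part survives integer division.
 * tsc_hz is in cycles/sec, rate is in bytes/sec.
 */
static inline uint64_t
scaled_cycles_per_byte(uint64_t tsc_hz, uint64_t rate)
{
	assert(rate != 0);
	assert(tsc_hz <= (UINT64_MAX >> TIME_SHIFT)); /* shift must not overflow */
	return (tsc_hz << TIME_SHIFT) / rate;
}

/*
 * Fast path (hypothetical helper): convert elapsed cycles to bytes.
 * Scaling the numerator by the same factor cancels the scale of the
 * divisor, so the result is in plain bytes, rounded down.
 */
static inline uint64_t
cycles_to_bytes(uint64_t cycles_diff, uint64_t cycles_per_byte_scaled)
{
	assert(cycles_per_byte_scaled != 0);
	assert(cycles_diff <= (UINT64_MAX >> TIME_SHIFT));
	return (cycles_diff << TIME_SHIFT) / cycles_per_byte_scaled;
}
```

With tsc_hz = 2^31 and rate = 2^30 bytes/sec the scaled value is 2 << 20,
and 2000 elapsed cycles convert to 1000 bytes. The uint32_t field in the
patch can hold the scaled value as long as tsc_hz / rate stays below 4096.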

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 55fbc14..3023457 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -102,6 +102,9 @@
 
 #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
 
+/* For cycles_per_byte calculation */
+#define RTE_SCHED_TIME_SHIFT		      20
+
 struct rte_sched_subport {
 	/* Token bucket (TB) */
 	uint64_t tb_time; /* time of last update */
@@ -239,7 +242,7 @@ struct rte_sched_port {
 	uint64_t time_cpu_cycles;     /* Current CPU time measured in CPU cyles */
 	uint64_t time_cpu_bytes;      /* Current CPU time measured in bytes */
 	uint64_t time;                /* Current NIC TX time measured in bytes */
-	double cycles_per_byte;       /* CPU cycles per byte */
+	uint32_t cycles_per_byte;       /* CPU cycles per byte (scaled) */
 
 	/* Scheduling loop detection */
 	uint32_t pipe_loop;
@@ -657,7 +660,9 @@ rte_sched_port_config(struct rte_sched_port_params *params)
 	port->time_cpu_cycles = rte_get_tsc_cycles();
 	port->time_cpu_bytes = 0;
 	port->time = 0;
-	port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) params->rate);
+
+	port->cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
+		/ params->rate;
 
 	/* Scheduling loop detection */
 	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
@@ -2156,11 +2161,12 @@ rte_sched_port_time_resync(struct rte_sched_port *port)
 {
 	uint64_t cycles = rte_get_tsc_cycles();
 	uint64_t cycles_diff = cycles - port->time_cpu_cycles;
-	double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
+	uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
+		/ port->cycles_per_byte;
 
 	/* Advance port time */
 	port->time_cpu_cycles = cycles;
-	port->time_cpu_bytes += (uint64_t) bytes_diff;
+	port->time_cpu_bytes += bytes_diff;
 	if (port->time < port->time_cpu_bytes) {
 		port->time = port->time_cpu_bytes;
 	}
-- 
2.1.4