Date: Wed, 2 Dec 2015 14:08:19 -0800
From: Stephen Hemminger
To: "Dumitrescu, Cristian"
Message-ID: <20151202140819.5d268f62@xeon-e3>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D8912647925BD2@IRSMSX108.ger.corp.intel.com>
References: <1448822809-8350-1-git-send-email-stephen@networkplumber.org>
 <1448822809-8350-4-git-send-email-stephen@networkplumber.org>
 <3EB4FA525960D640B5BDFFD6A3D8912647925BD2@IRSMSX108.ger.corp.intel.com>
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH 3/3] rte_sched: eliminate floating point in calculating byte clock
List-Id: patches and discussions about DPDK

On Wed, 2 Dec 2015 16:48:17 +0000
"Dumitrescu, Cristian" wrote:

>
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Sunday, November 29, 2015 8:47 PM
> > To: Dumitrescu, Cristian
> > Cc: dev@dpdk.org; Stephen Hemminger
> > Subject: [PATCH 3/3] rte_sched: eliminate floating point in calculating byte
> > clock
> >
> > The old code was doing a floating point divide for each rte_dequeue()
> > which is very expensive. Change to using fixed point scaled inverse
> > multiply. To maintain equivalent precision, scaled math is used.
> > The application ABI is the same.
> >
> > This improved performance from 5Gbit/sec to 10 Gbit/sec when configured
> > for 10 Gbit/sec rate.
> >
> > There was some feedback from Cristian that he wanted a better
> > solution and was going to give one, but none was provided.
> > For 2.2 this is a better solution than existing code, if someone
> > has a better version I would love to see it.
> >
> > Signed-off-by: Stephen Hemminger
> > ---
> >  lib/librte_sched/rte_sched.c | 23 ++++++++++++++++++-----
> >  1 file changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index 16acd6b..cfae136 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -47,6 +47,7 @@
> >  #include "rte_bitmap.h"
> >  #include "rte_sched_common.h"
> >  #include "rte_approx.h"
> > +#include "rte_reciprocal.h"
> >
> >  #ifdef __INTEL_COMPILER
> >  #pragma warning(disable:2259) /* conversion may lose significant bits */
> > @@ -62,6 +63,11 @@
> >  #define RTE_SCHED_PIPE_INVALID UINT32_MAX
> >  #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX
> >
> > +/* Scaling for cycles_per_byte calculation
> > + * Chosen so that minimum rate is 480 bit/sec
> > + */
> > +#define RTE_SCHED_TIME_SHIFT 8
>
> Stephen, can you please elaborate why we need to shift the dividend at all and why the shift value was picked as 8? Is 8 a hard constraint? How does this affect the scheduling precision/accuracy?

The shift value is a tradeoff for scaled math. The bigger the shift, the
finer the resolution, but the greater the risk of overflow in
cycles_per_byte. The value of 8 was chosen as a tradeoff based on current
CPU clock rates (TSC) and the minimum rate.
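
To make the tradeoff concrete, here is a minimal sketch of the scaled arithmetic. It is not the patch itself: the helper names are hypothetical, plain 64-bit division stands in for rte_reciprocal_value()/rte_reciprocal_divide(), the rate is assumed to be in bytes per second, and the 1 GHz TSC used in the note below is an assumed figure, not one stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_SHIFT 8 /* mirrors RTE_SCHED_TIME_SHIFT from the patch */

/*
 * Scaled cycles per byte, or 0 if the value is unusable
 * (it overflows 32 bits or rounds down to zero).
 * Hypothetical helper; rate is assumed to be in bytes/sec.
 */
static uint32_t
scaled_cycles_per_byte(uint64_t tsc_hz, uint32_t rate)
{
	uint64_t scaled;

	assert(rate != 0);
	scaled = (tsc_hz << TIME_SHIFT) / rate;
	return scaled > UINT32_MAX ? 0 : (uint32_t)scaled;
}

/* Smallest rate (bytes/sec) whose scaled divisor still fits in 32 bits. */
static uint64_t
min_rate_bytes_per_sec(uint64_t tsc_hz)
{
	return ((tsc_hz << TIME_SHIFT) >> 32) + 1;
}

/*
 * Elapsed bytes for a cycle delta. Plain division is used here as a
 * stand-in for the reciprocal multiply in the patch.
 */
static uint64_t
elapsed_bytes(uint64_t cycles_diff, uint32_t scaled_cpb)
{
	assert(scaled_cpb != 0);
	return (cycles_diff << TIME_SHIFT) / scaled_cpb;
}
```

Under the assumed 1 GHz TSC, min_rate_bytes_per_sec(1000000000) gives 60 bytes/sec, i.e. 480 bit/sec, which is consistent with the comment above the define. At 10 Gbit/sec (1250000000 bytes/sec) the unscaled quotient would be 0.8 cycles per byte, which an integer cannot hold; with the shift it becomes 204 (truncated from 204.8), so the shift buys resolution at high rates and costs headroom at low rates.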
> > +
> >  struct rte_sched_subport {
> >      /* Token bucket (TB) */
> >      uint64_t tb_time; /* time of last update */
> > @@ -215,7 +221,7 @@ struct rte_sched_port {
> >      uint64_t time_cpu_cycles; /* Current CPU time measured in CPU
> > cyles */
> >      uint64_t time_cpu_bytes; /* Current CPU time measured in bytes
> > */
> >      uint64_t time; /* Current NIC TX time measured in bytes */
> > -    double cycles_per_byte; /* CPU cycles per byte */
> > +    struct rte_reciprocal inv_cycles_per_byte; /* CPU cycles per byte */
> >
> >      /* Scheduling loop detection */
> >      uint32_t pipe_loop;
> > @@ -610,7 +616,7 @@ struct rte_sched_port *
> >  rte_sched_port_config(struct rte_sched_port_params *params)
> >  {
> >      struct rte_sched_port *port = NULL;
> > -    uint32_t mem_size, bmp_mem_size, n_queues_per_port, i;
> > +    uint32_t mem_size, bmp_mem_size, n_queues_per_port, i,
> > cycles_per_byte;
> >
> >      /* Check user parameters. Determine the amount of memory to
> > allocate */
> >      mem_size = rte_sched_port_get_memory_footprint(params);
> > @@ -661,7 +667,10 @@ rte_sched_port_config(struct
> > rte_sched_port_params *params)
> >      port->time_cpu_cycles = rte_get_tsc_cycles();
> >      port->time_cpu_bytes = 0;
> >      port->time = 0;
> > -    port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> > params->rate);
> > +
> > +    cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
> > +        / params->rate;
> > +    port->inv_cycles_per_byte = rte_reciprocal_value(cycles_per_byte);
> >
> >      /* Scheduling loop detection */
> >      port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> > @@ -2088,11 +2097,15 @@ rte_sched_port_time_resync(struct
> > rte_sched_port *port)
> >  {
> >      uint64_t cycles = rte_get_tsc_cycles();
> >      uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> > -    double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> > +    uint64_t bytes_diff;
> > +
> > +    /* Compute elapsed time in bytes */
> > +    bytes_diff = rte_reciprocal_divide(cycles_diff <<
> > RTE_SCHED_TIME_SHIFT,
> > +                                       port->inv_cycles_per_byte);
> >
> >      /* Advance port time */
> >      port->time_cpu_cycles = cycles;
> > -    port->time_cpu_bytes += (uint64_t) bytes_diff;
> > +    port->time_cpu_bytes += bytes_diff;
> >      if (port->time < port->time_cpu_bytes)
> >          port->time = port->time_cpu_bytes;
> >
> > --
> > 2.1.4
>
> Can you provide some insight into how you tested this code and the test vectors you used?

We tested with a 10 Gbit link and a range of rates from 10 kbit/sec up to
10 Gbit/sec.
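
For anyone who wants to check accuracy offline, a sweep over the same range of rates could look roughly like the sketch below. These are not the test vectors used above: the helper names are hypothetical, the rates are assumed to be in bytes per second, plain 64-bit division stands in for the reciprocal multiply, and the floating-point path only mimics the code the patch removes.

```c
#include <assert.h>
#include <stdint.h>

#define TIME_SHIFT 8 /* mirrors RTE_SCHED_TIME_SHIFT from the patch */

/* Fixed-point path: scaled divisor, then scaled dividend (hypothetical helper). */
static uint64_t
bytes_fixed(uint64_t tsc_hz, uint64_t rate, uint64_t cycles_diff)
{
	uint64_t scaled = (tsc_hz << TIME_SHIFT) / rate;

	/* The divisor must be non-zero and fit the 32-bit field. */
	assert(scaled != 0 && scaled <= UINT32_MAX);
	return (cycles_diff << TIME_SHIFT) / scaled;
}

/* Floating-point reference, modelled on the code the patch replaces. */
static uint64_t
bytes_float(uint64_t tsc_hz, uint64_t rate, uint64_t cycles_diff)
{
	double cycles_per_byte = (double)tsc_hz / (double)rate;

	return (uint64_t)((double)cycles_diff / cycles_per_byte);
}

/*
 * Worst relative difference (parts per million) between the two paths
 * over rates from 1250 bytes/sec (10 kbit/sec) to 1250000000 bytes/sec
 * (10 Gbit/sec), in decade steps.
 */
static double
worst_error_ppm(uint64_t tsc_hz, uint64_t cycles_diff)
{
	double worst = 0.0;
	uint64_t rate;

	for (rate = 1250; rate <= 1250000000ULL; rate *= 10) {
		double f = (double)bytes_float(tsc_hz, rate, cycles_diff);
		double x = (double)bytes_fixed(tsc_hz, rate, cycles_diff);
		double err = (x > f ? x - f : f - x) / f * 1e6;

		if (err > worst)
			worst = err;
	}
	return worst;
}
```

Calling worst_error_ppm() with the TSC frequency of the target machine and a cycle delta of one second shows where truncation of the scaled divisor matters most. The result depends strongly on the TSC frequency, so numbers from one machine should not be read as a property of the patch.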