From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by dpdk.org (Postfix) with ESMTP id 791545921 for ; Fri, 13 Mar 2015 00:09:31 +0100 (CET) Received: by padfa1 with SMTP id fa1so24232152pad.9 for ; Thu, 12 Mar 2015 16:09:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-type:content-transfer-encoding; bh=B4K7ERJQHnKyPQUiWQAOAAG9ra0Ok9n1wRWtyNrQqOI=; b=K7hKboAOLnlLVjQKZp//dSg4aryt2fY2FUsOauWlDiN3ShTaW8e905Agmw1bYwfLzU KYTIBnWD+HIc3UmRXgU2OvM1RVLX2UEt7/VFbJQNJyqSRHiMnrygXt54yecrtgKhgYhg 66hBe3UnYGBeLiYMzW03wRDFGX3DvXYa9nHvcrJ4hk/aLZN5lQUyI/B4HgxzcpLb7LaY BZRzYqaXrcqfAcA+Qw1tYg7Ni5M/UVgKIlLu6s/8kA93yjEQu+kV9VeKCUP8duPzb33u gibFgAtMGhj8GYmlsuR0y9pGaMlLYhC+CRbxFuLH16NGk3NaYzAvFooVtIZ/NAKADtYe QsAg== X-Gm-Message-State: ALoCoQmh/R4df2ZX6S/pQC013osGt/FkHqvdWkiYr/j3rkxGkTTVLqpOk7eVFdLXN1w5oEeLjY3I X-Received: by 10.70.93.36 with SMTP id cr4mr93670714pdb.168.1426201770744; Thu, 12 Mar 2015 16:09:30 -0700 (PDT) Received: from urahara (static-50-53-82-155.bvtn.or.frontiernet.net. [50.53.82.155]) by mx.google.com with ESMTPSA id ae7sm239319pac.19.2015.03.12.16.09.30 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 12 Mar 2015 16:09:30 -0700 (PDT) Date: Thu, 12 Mar 2015 16:09:26 -0700 From: Stephen Hemminger To: "Dumitrescu, Cristian" Message-ID: <20150312160926.5d9af5d7@urahara> In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D89126323225C5@IRSMSX108.ger.corp.intel.com> References: <1426004018-25948-1-git-send-email-stephen@networkplumber.org> <1426004018-25948-7-git-send-email-stephen@networkplumber.org> <3EB4FA525960D640B5BDFFD6A3D89126323225C5@IRSMSX108.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: "dev@dpdk.org" , Stephen Hemminger Subject: Re: [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Mar 2015 23:09:32 -0000 On Thu, 12 Mar 2015 19:06:39 +0000 "Dumitrescu, Cristian" wrote: > > > > -----Original Message----- > > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > > Sent: Tuesday, March 10, 2015 4:14 PM > > To: Dumitrescu, Cristian > > Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger > > Subject: [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte > > clock > > > > From: Stephen Hemminger > > > > The old code was doing a floating point divide for each rte_dequeue() > > which is very expensive. Change to using fixed point scaled math instead. > > This improved performance from 5Gbit/sec to 10 Gbit/sec > > > > Signed-off-by: Stephen Hemminger > > --- > > v2 -- no changes > > despite objections, the performance observation is real > > on Intel(R) Core(TM) i7-3770 CPU > > > > lib/librte_sched/rte_sched.c | 14 ++++++++++---- > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > > index 74d0e0a..522a647 100644 > > --- a/lib/librte_sched/rte_sched.c > > +++ b/lib/librte_sched/rte_sched.c > > @@ -102,6 +102,9 @@ > > > > #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX > > > > +/* For cycles_per_byte calculation */ > > +#define RTE_SCHED_TIME_SHIFT 20 > > + > > struct rte_sched_subport { > > /* Token bucket (TB) */ > > uint64_t tb_time; /* time of last update */ > > @@ -239,7 +242,7 @@ struct rte_sched_port { > > uint64_t time_cpu_cycles; /* Current CPU time measured in CPU > > cyles */ > > uint64_t time_cpu_bytes; /* Current CPU time measured in bytes > > */ > > uint64_t time; /* Current NIC TX time measured in bytes */ > > - double cycles_per_byte; /* CPU cycles per byte */ > > + uint32_t cycles_per_byte; /* CPU cycles per byte (scaled) */ > > > > /* Scheduling loop detection */ > > uint32_t pipe_loop; > > @@ -657,7 +660,9 @@ rte_sched_port_config(struct > > rte_sched_port_params *params) > > port->time_cpu_cycles = rte_get_tsc_cycles(); > > port->time_cpu_bytes = 0; > > port->time = 0; > > - port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double) > > params->rate); > > + > > + port->cycles_per_byte = (rte_get_tsc_hz() << > > RTE_SCHED_TIME_SHIFT) > > + / params->rate; > > > > /* Scheduling loop detection */ > > port->pipe_loop = RTE_SCHED_PIPE_INVALID; > > @@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct > > rte_sched_port *port) > > { > > uint64_t cycles = rte_get_tsc_cycles(); > > uint64_t cycles_diff = cycles - port->time_cpu_cycles; > > - double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte; > > + uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT) > > + / port->cycles_per_byte; > > > > /* Advance port time */ > > port->time_cpu_cycles = cycles; > > - port->time_cpu_bytes += (uint64_t) bytes_diff; > > + port->time_cpu_bytes += bytes_diff; > > if (port->time < port->time_cpu_bytes) { > > port->time = port->time_cpu_bytes; > > } > > -- > > 2.1.4 > > Stephen, > > We agreed during the previous round to look at 64-bit multiplication option, but looks like this patch is identical to the previous one. Did you meet any issues in implementing this approach? As stated before, I do not think this is the best solution for the reasons listed previously, and this part of the code is too sensitive to take the risk. > > Since Thomas indicated these patches will be considered for 2.1 rather than 2.0 release, it looks like we have some time to refine this work. I would reiterate the same proposal that Thomas made: re-submit the patches where we have consensus, and keep this one out for the moment; you and me can sync up offline and come back with an implementation proposal that would hopefully address the previous concerns for 2.1 release. Would this work for you? > > Thank you for your work and for your understanding! > > Regards, > Cristian > I haven't had time to do the analysis to figure out how to keep the maths in range so that 64 bit multiplicative divide would work. When I first tried it the issue was that the cycles_diff and bytes_diff would both have to scaled so that they are 32 bit values. Should be possible, might get back to it (or someone else might take up the cause).