From: "Dumitrescu, Cristian"
To: Stephen Hemminger
Cc: "dev@dpdk.org", Stephen Hemminger
Date: Thu, 12 Mar 2015 19:06:39 +0000
Subject: Re: [dpdk-dev] [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte clock
Message-ID: <3EB4FA525960D640B5BDFFD6A3D89126323225C5@IRSMSX108.ger.corp.intel.com>
References: <1426004018-25948-1-git-send-email-stephen@networkplumber.org> <1426004018-25948-7-git-send-email-stephen@networkplumber.org>
In-Reply-To: <1426004018-25948-7-git-send-email-stephen@networkplumber.org>

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, March 10, 2015 4:14 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Stephen Hemminger; Stephen Hemminger
> Subject: [PATCH v2 6/6] rte_sched: eliminate floating point in calculating byte
> clock
>
> From: Stephen Hemminger
>
> The old code was doing a floating point divide for each rte_dequeue()
> which is very expensive. Change to using fixed point scaled math instead.
> This improved performance from 5Gbit/sec to 10 Gbit/sec
>
> Signed-off-by: Stephen Hemminger
> ---
> v2 -- no changes
>       despite objections, the performance observation is real
>       on Intel(R) Core(TM) i7-3770 CPU
>
>  lib/librte_sched/rte_sched.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 74d0e0a..522a647 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -102,6 +102,9 @@
>
>  #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX
>
> +/* For cycles_per_byte calculation */
> +#define RTE_SCHED_TIME_SHIFT 20
> +
>  struct rte_sched_subport {
>      /* Token bucket (TB) */
>      uint64_t tb_time; /* time of last update */
> @@ -239,7 +242,7 @@ struct rte_sched_port {
>      uint64_t time_cpu_cycles; /* Current CPU time measured in CPU
> cyles */
>      uint64_t time_cpu_bytes; /* Current CPU time measured in bytes
> */
>      uint64_t time; /* Current NIC TX time measured in bytes */
> -    double cycles_per_byte; /* CPU cycles per byte */
> +    uint32_t cycles_per_byte; /* CPU cycles per byte (scaled) */
>
>      /* Scheduling loop detection */
>      uint32_t pipe_loop;
> @@ -657,7 +660,9 @@ rte_sched_port_config(struct
> rte_sched_port_params *params)
>      port->time_cpu_cycles = rte_get_tsc_cycles();
>      port->time_cpu_bytes = 0;
>      port->time = 0;
> -    port->cycles_per_byte = ((double) rte_get_tsc_hz()) / ((double)
> params->rate);
> +
> +    port->cycles_per_byte = (rte_get_tsc_hz() <<
> RTE_SCHED_TIME_SHIFT)
> +        / params->rate;
>
>      /* Scheduling loop detection */
>      port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> @@ -2126,11 +2131,12 @@ rte_sched_port_time_resync(struct
> rte_sched_port *port)
>  {
>      uint64_t cycles = rte_get_tsc_cycles();
>      uint64_t cycles_diff = cycles - port->time_cpu_cycles;
> -    double bytes_diff = ((double) cycles_diff) / port->cycles_per_byte;
> +    uint64_t bytes_diff = (cycles_diff << RTE_SCHED_TIME_SHIFT)
> +        / port->cycles_per_byte;
>
>      /* Advance port time */
>      port->time_cpu_cycles = cycles;
> -    port->time_cpu_bytes += (uint64_t) bytes_diff;
> +    port->time_cpu_bytes += bytes_diff;
>      if (port->time < port->time_cpu_bytes) {
>          port->time = port->time_cpu_bytes;
>      }
> --
> 2.1.4

Stephen,

We agreed during the previous round to look at the 64-bit multiplication
option, but it looks like this patch is identical to the previous one. Did
you run into any issues implementing that approach? As stated before, I do
not think this is the best solution, for the reasons listed previously, and
this part of the code is too sensitive to take the risk.

Since Thomas indicated these patches will be considered for the 2.1 release
rather than 2.0, it looks like we have some time to refine this work. I
would reiterate the proposal that Thomas made: re-submit the patches where
we have consensus, and keep this one out for the moment. You and I can sync
up offline and come back with an implementation proposal for the 2.1 release
that would hopefully address the previous concerns.

Would this work for you?

Thank you for your work and for your understanding!

Regards,
Cristian
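
For reference, the arithmetic under discussion can be sketched in isolation. This is only a minimal illustration and not code from either proposal: the helper names and TIME_SHIFT are hypothetical (TIME_SHIFT mirrors RTE_SCHED_TIME_SHIFT from the quoted patch), and the multiply-and-shift variant is an assumption about what a 64-bit multiplication approach might look like, since this thread does not spell it out. The first pair of helpers follows the patch (scaled cycles per byte, one integer divide per resync); the second pair stores the scaled inverse so the per-resync step needs no divide.

```c
#include <stdint.h>

#define TIME_SHIFT 20 /* hypothetical; same value as RTE_SCHED_TIME_SHIFT */

/* Patch-style setup: CPU cycles per byte, scaled by 2^TIME_SHIFT. */
static uint64_t scaled_cycles_per_byte(uint64_t tsc_hz, uint64_t rate)
{
	return (tsc_hz << TIME_SHIFT) / rate;
}

/* Patch-style resync step: one 64-bit integer divide per call. */
static uint64_t bytes_by_divide(uint64_t cycles_diff, uint64_t cpb_scaled)
{
	return (cycles_diff << TIME_SHIFT) / cpb_scaled;
}

/* Assumed alternative setup: bytes per CPU cycle, scaled by 2^TIME_SHIFT. */
static uint64_t scaled_bytes_per_cycle(uint64_t tsc_hz, uint64_t rate)
{
	return (rate << TIME_SHIFT) / tsc_hz;
}

/* Assumed alternative resync step: multiply and shift, no divide. */
static uint64_t bytes_by_multiply(uint64_t cycles_diff, uint64_t bpc_scaled)
{
	return (cycles_diff * bpc_scaled) >> TIME_SHIFT;
}
```

With an assumed 3.4 GHz TSC and a rate of 1,250,000,000 bytes/s (10 Gbit/s), 3400 elapsed cycles map to 1250 bytes in the divide form and 1249 bytes in the multiply form; the difference comes from truncating the scaled inverse. Both forms also have range limits worth checking: the shifted operands must fit in 64 bits, and in the patch the scaled cycles_per_byte is stored in a uint32_t, which low rates can exceed.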