From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 81BF15913 for ; Wed, 2 Dec 2015 17:48:21 +0100 (CET) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga102.jf.intel.com with ESMTP; 02 Dec 2015 08:48:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,373,1444719600"; d="scan'208";a="698867302" Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59]) by orsmga003.jf.intel.com with ESMTP; 02 Dec 2015 08:48:18 -0800 Received: from irsmsx108.ger.corp.intel.com ([169.254.11.23]) by IRSMSX151.ger.corp.intel.com ([169.254.4.95]) with mapi id 14.03.0248.002; Wed, 2 Dec 2015 16:48:17 +0000 From: "Dumitrescu, Cristian" To: Stephen Hemminger Thread-Topic: [PATCH 3/3] rte_sched: eliminate floating point in calculating byte clock Thread-Index: AQHRKtZSchh4r7nFGEq5a2WmoKzUP5636/HQ Date: Wed, 2 Dec 2015 16:48:17 +0000 Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912647925BD2@IRSMSX108.ger.corp.intel.com> References: <1448822809-8350-1-git-send-email-stephen@networkplumber.org> <1448822809-8350-4-git-send-email-stephen@networkplumber.org> In-Reply-To: <1448822809-8350-4-git-send-email-stephen@networkplumber.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-inteldataclassification: CTP_IC x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsIiwiaWQiOiI0MjYxNjgxNS0xOTBjLTQ2NzgtOTU0Yy05YjIyMzYzMmFlNDMiLCJwcm9wcyI6W3sibiI6IkludGVsRGF0YUNsYXNzaWZpY2F0aW9uIiwidmFscyI6W3sidmFsdWUiOiJDVFBfSUMifV19XX0sIlN1YmplY3RMYWJlbHMiOltdLCJUTUNWZXJzaW9uIjoiMTUuNC4xMC4xOSIsIlRydXN0ZWRMYWJlbEhhc2giOiJmQXFXYkNsMDlubERHb2FaZSszRklCSEsxRGw1b2QwWEtmcmxkTjdScWlZPSJ9 x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH 3/3] rte_sched: eliminate floating point in calculating byte clock X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 16:48:22 -0000 > -----Original Message----- > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Sunday, November 29, 2015 8:47 PM > To: Dumitrescu, Cristian > Cc: dev@dpdk.org; Stephen Hemminger > Subject: [PATCH 3/3] rte_sched: eliminate floating point in calculating b= yte > clock >=20 > The old code was doing a floating point divide for each rte_dequeue() > which is very expensive. Change to using fixed point scaled inverse > multiply. To maintain equivalent precision, scaled math is used. > The application ABI is the same. >=20 > This improved performance from 5Gbit/sec to 10 Gbit/sec when configured > for 10 Gbit/sec rate. >=20 > There was some feedback from Cristian that he wanted a better > solution and was going to give one, but none was provided. > For 2.2 this is a better solution than existing code, if someone > has a better version I would love to see it. >=20 > Signed-off-by: Stephen Hemminger > --- > lib/librte_sched/rte_sched.c | 23 ++++++++++++++++++----- > 1 file changed, 18 insertions(+), 5 deletions(-) >=20 > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c > index 16acd6b..cfae136 100644 > --- a/lib/librte_sched/rte_sched.c > +++ b/lib/librte_sched/rte_sched.c > @@ -47,6 +47,7 @@ > #include "rte_bitmap.h" > #include "rte_sched_common.h" > #include "rte_approx.h" > +#include "rte_reciprocal.h" >=20 > #ifdef __INTEL_COMPILER > #pragma warning(disable:2259) /* conversion may lose significant bits */ > @@ -62,6 +63,11 @@ > #define RTE_SCHED_PIPE_INVALID UINT32_MAX > #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX >=20 > +/* Scaling for cycles_per_byte calculation > + * Chosen so that minimum rate is 480 bit/sec > + */ > +#define RTE_SCHED_TIME_SHIFT 8 Stephen, can you please elaborate why we need to shift the dividend at all = and why the shift value was picked as 8? Is 8 a hard constraint? How does t= his affect the scheduling precision/accuracy? > + > struct rte_sched_subport { > /* Token bucket (TB) */ > uint64_t tb_time; /* time of last update */ > @@ -215,7 +221,7 @@ struct rte_sched_port { > uint64_t time_cpu_cycles; /* Current CPU time measured in CPU > cyles */ > uint64_t time_cpu_bytes; /* Current CPU time measured in bytes > */ > uint64_t time; /* Current NIC TX time measured in bytes = */ > - double cycles_per_byte; /* CPU cycles per byte */ > + struct rte_reciprocal inv_cycles_per_byte; /* CPU cycles per byte */ >=20 > /* Scheduling loop detection */ > uint32_t pipe_loop; > @@ -610,7 +616,7 @@ struct rte_sched_port * > rte_sched_port_config(struct rte_sched_port_params *params) > { > struct rte_sched_port *port =3D NULL; > - uint32_t mem_size, bmp_mem_size, n_queues_per_port, i; > + uint32_t mem_size, bmp_mem_size, n_queues_per_port, i, > cycles_per_byte; >=20 > /* Check user parameters. Determine the amount of memory to > allocate */ > mem_size =3D rte_sched_port_get_memory_footprint(params); > @@ -661,7 +667,10 @@ rte_sched_port_config(struct > rte_sched_port_params *params) > port->time_cpu_cycles =3D rte_get_tsc_cycles(); > port->time_cpu_bytes =3D 0; > port->time =3D 0; > - port->cycles_per_byte =3D ((double) rte_get_tsc_hz()) / ((double) > params->rate); > + > + cycles_per_byte =3D (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT) > + / params->rate; > + port->inv_cycles_per_byte =3D rte_reciprocal_value(cycles_per_byte); >=20 > /* Scheduling loop detection */ > port->pipe_loop =3D RTE_SCHED_PIPE_INVALID; > @@ -2088,11 +2097,15 @@ rte_sched_port_time_resync(struct > rte_sched_port *port) > { > uint64_t cycles =3D rte_get_tsc_cycles(); > uint64_t cycles_diff =3D cycles - port->time_cpu_cycles; > - double bytes_diff =3D ((double) cycles_diff) / port->cycles_per_byte; > + uint64_t bytes_diff; > + > + /* Compute elapsed time in bytes */ > + bytes_diff =3D rte_reciprocal_divide(cycles_diff << > RTE_SCHED_TIME_SHIFT, > + port->inv_cycles_per_byte); >=20 > /* Advance port time */ > port->time_cpu_cycles =3D cycles; > - port->time_cpu_bytes +=3D (uint64_t) bytes_diff; > + port->time_cpu_bytes +=3D bytes_diff; > if (port->time < port->time_cpu_bytes) > port->time =3D port->time_cpu_bytes; >=20 > -- > 2.1.4 Can you provide some insight into how you tested this code and the test vec= tors you used?