From: "Burakov, Anatoly"
To: Pavan Nikhilesh, "dev@dpdk.org"
CC: "Dumitrescu, Cristian", "stephen@networkplumber.org"
Date: Mon, 4 Sep 2017 14:34:49 +0000
Subject: Re: [dpdk-dev] [PATCH v3 2/3] eal: add u64 bit variant for reciprocal
In-Reply-To: <1504442189-4384-2-git-send-email-pbhagavatula@caviumnetworks.com>
References: <1504442189-4384-1-git-send-email-pbhagavatula@caviumnetworks.com>
 <1504442189-4384-2-git-send-email-pbhagavatula@caviumnetworks.com>
List-Id: DPDK patches and discussions

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pavan Nikhilesh
> Sent: Sunday, September 3, 2017 1:36 PM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian; stephen@networkplumber.org; Pavan Nikhilesh
> Subject: [dpdk-dev] [PATCH v3 2/3] eal: add u64 bit variant for reciprocal
>
> Currently, rte_reciprocal only supports unsigned 32bit divisors. This commit
> adds support for unsigned 64bit divisors.
>
> Rename unsigned 32bit specific functions appropriately and update
> librte_sched accordingly.
>
> Signed-off-by: Pavan Nikhilesh
> ---
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   3 +-
>  lib/librte_eal/common/include/rte_reciprocal.h  | 111 ++++++++++++++++++++--
>  lib/librte_eal/common/rte_reciprocal.c          | 120 +++++++++++++++++++++---
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   3 +-
>  lib/librte_sched/Makefile                       |   4 +-
>  lib/librte_sched/rte_sched.c                    |   9 +-
>  6 files changed, 222 insertions(+), 28 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index d0bda66..5fd6101 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -242,6 +242,7 @@ EXPERIMENTAL {
>  DPDK_17.11 {
>  	global:
>
> -	rte_reciprocal_value;
> +	rte_reciprocal_value_u32;
> +	rte_reciprocal_value_u64;
>
>  } DPDK_17.08;
> diff --git a/lib/librte_eal/common/include/rte_reciprocal.h b/lib/librte_eal/common/include/rte_reciprocal.h
> index b6d752f..801d1c8 100644
> --- a/lib/librte_eal/common/include/rte_reciprocal.h
> +++ b/lib/librte_eal/common/include/rte_reciprocal.h
> @@ -22,22 +22,117 @@
>  #ifndef _RTE_RECIPROCAL_H_
>  #define _RTE_RECIPROCAL_H_
>
> -#include
> +#include
>
> -struct rte_reciprocal {
> +/**
> + * Unsigned 32-bit divisor structure.
> + */
> +struct rte_reciprocal_u32 {
>  	uint32_t m;
>  	uint8_t sh1, sh2;
> -};
> +} __rte_cache_aligned;
> +
> +/**
> + * Unsigned 64-bit divisor structure.
> + */
> +struct rte_reciprocal_u64 {
> +	uint64_t m;
> +	uint8_t sh1;
> +} __rte_cache_aligned;
>
> +/**
> + * Divide given unsigned 32-bit integer with pre calculated divisor.
> + *
> + * @param a
> + *   The 32-bit dividend.
> + * @param R
> + *   The pointer to pre calculated divisor reciprocal structure.
> + *
> + * @return
> + *   The result of the division
> + */
>  static inline uint32_t
> -rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
> +rte_reciprocal_divide_u32(uint32_t a, struct rte_reciprocal_u32 *R)
> +{
> +	uint32_t t = (((uint64_t)a * R->m) >> 32);
> +
> +	return (t + ((a - t) >> R->sh1)) >> R->sh2;
> +}
> +
> +static inline uint64_t
> +mullhi_u64(uint64_t x, uint64_t y)
> +{
> +#ifdef __SIZEOF_INT128__
> +	__uint128_t xl = x;
> +	__uint128_t rl = xl * y;
> +
> +	return (rl >> 64);
> +#else
> +	uint64_t u0, u1, v0, v1, k, t;
> +	uint64_t w1, w2;
> +	uint64_t whi;
> +
> +	u1 = x >> 32; u0 = x & 0xFFFFFFFF;
> +	v1 = y >> 32; v0 = y & 0xFFFFFFFF;
> +
> +	t = u0*v0;
> +	k = t >> 32;
> +
> +	t = u1*v0 + k;
> +	w1 = t & 0xFFFFFFFF;
> +	w2 = t >> 32;
> +
> +	t = u0*v1 + w1;
> +	k = t >> 32;
> +
> +	whi = u1*v1 + w2 + k;
> +
> +	return whi;
> +#endif
> +}
> +
> +/**
> + * Divide given unsigned 64-bit integer with pre calculated divisor.
> + *
> + * @param a
> + *   The 64-bit dividend.
> + * @param R
> + *   The pointer to pre calculated divisor reciprocal structure.
> + *
> + * @return
> + *   The result of the division
> + */
> +static inline uint64_t
> +rte_reciprocal_divide_u64(uint64_t a, struct rte_reciprocal_u64 *R)
>  {
> -	uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);
> +	uint64_t q = mullhi_u64(R->m, a);
> +	uint64_t t = ((a - q) >> 1) + q;
>
> -	return (t + ((a - t) >> R.sh1)) >> R.sh2;
> +	return t >> R->sh1;
>  }
>
> -struct rte_reciprocal
> -rte_reciprocal_value(uint32_t d);
> +/**
> + * Generate pre calculated divisor structure.
> + *
> + * @param d
> + *   The unsigned 32-bit divisor.
> + *
> + * @return
> + *   Divisor structure.
> + */
> +struct rte_reciprocal_u32
> +rte_reciprocal_value_u32(uint32_t d);
> +
> +/**
> + * Generate pre calculated divisor structure.
> + *
> + * @param d
> + *   The unsigned 64-bit divisor.
> + *
> + * @return
> + *   Divisor structure.
> + */
> +struct rte_reciprocal_u64
> +rte_reciprocal_value_u64(uint64_t d);
>
>  #endif /* _RTE_RECIPROCAL_H_ */
> diff --git a/lib/librte_eal/common/rte_reciprocal.c b/lib/librte_eal/common/rte_reciprocal.c
> index 7ab99b4..5d7e367 100644
> --- a/lib/librte_eal/common/rte_reciprocal.c
> +++ b/lib/librte_eal/common/rte_reciprocal.c
> @@ -31,18 +31,13 @@
>   * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>
> -#include
> -#include
> -
> -#include
> -
> -#include "rte_reciprocal.h"
> +#include
>
>  /* find largest set bit.
>   * portable and slow but does not matter for this usage.
>   */
>  static inline int
> -fls(uint32_t x)
> +fls_u32(uint32_t x)
>  {
>  	int b;
>
> @@ -54,21 +49,120 @@ fls(uint32_t x)
>  	return 0;
>  }
>
> -struct rte_reciprocal
> -rte_reciprocal_value(uint32_t d)
> +struct rte_reciprocal_u32
> +rte_reciprocal_value_u32(uint32_t d)
>  {
> -	struct rte_reciprocal R;
> +	struct rte_reciprocal_u32 R;
>  	uint64_t m;
>  	int l;
>
> -	l = fls(d - 1);
> +	l = fls_u32(d - 1);
>  	m = ((1ULL << 32) * ((1ULL << l) - d));
>  	m /= d;
>
>  	++m;
>  	R.m = m;
> -	R.sh1 = RTE_MIN(l, 1);
> -	R.sh2 = RTE_MAX(l - 1, 0);
> +	R.sh1 = l > 1 ? 1 : l;
> +	R.sh2 = (l - 1 > 0) ? l - 1 : 0;

Is there a specific reason for changing from RTE_MIN/RTE_MAX to what you have?
R.sh2 seems identical to the previous version (so the reason for the change is
unclear), and R.sh1 changes behavior (R.sh1 can now potentially be zero, even
if in practice it won't) without any explanation given.

Thanks,
Anatoly

> +
> +	return R;
> +}
> +
> +/* Code taken from Hacker's Delight:
> + * http://www.hackersdelight.org/HDcode/divlu.c.
> + * License permits inclusion here per:
> + * http://www.hackersdelight.org/permissions.htm
> + */
> +static inline uint64_t
> +divide_128_div_64_to_64(uint64_t u1, uint64_t u0, uint64_t v, uint64_t *r)
> +{
> +	const uint64_t b = (1ULL << 32); /* Number base (16 bits). */
> +	uint64_t un1, un0,        /* Norm. dividend LSD's. */
> +		 vn1, vn0,        /* Norm. divisor digits. */
> +		 q1, q0,          /* Quotient digits. */
> +		 un64, un21, un10, /* Dividend digit pairs. */
> +		 rhat;            /* A remainder. */
> +	int s;                    /* Shift amount for norm. */
> +
> +	/* If overflow, set rem. to an impossible value. */
> +	if (u1 >= v) {
> +		if (r != NULL)
> +			*r = (uint64_t) -1;
> +		return (uint64_t) -1;
> +	}
> +
> +	/* Count leading zeros. */
> +	s = __builtin_clzll(v);
> +	if (s > 0) {
> +		v = v << s;
> +		un64 = (u1 << s) | ((u0 >> (64 - s)) & (-s >> 31));
> +		un10 = u0 << s;
> +	} else {
> +
> +		un64 = u1 | u0;
> +		un10 = u0;
> +	}
> +
> +	vn1 = v >> 32;
> +	vn0 = v & 0xFFFFFFFF;
> +
> +	un1 = un10 >> 32;
> +	un0 = un10 & 0xFFFFFFFF;
> +
> +	q1 = un64/vn1;
> +	rhat = un64 - q1*vn1;
> +again1:
> +	if (q1 >= b || q1*vn0 > b*rhat + un1) {
> +		q1 = q1 - 1;
> +		rhat = rhat + vn1;
> +		if (rhat < b)
> +			goto again1;
> +	}
> +
> +	un21 = un64*b + un1 - q1*v;
> +
> +	q0 = un21/vn1;
> +	rhat = un21 - q0*vn1;
> +again2:
> +	if (q0 >= b || q0*vn0 > b*rhat + un0) {
> +		q0 = q0 - 1;
> +		rhat = rhat + vn1;
> +		if (rhat < b)
> +			goto again2;
> +	}
> +
> +	if (r != NULL)
> +		*r = (un21*b + un0 - q0*v) >> s;
> +	return q1*b + q0;
> +}
> +
> +struct rte_reciprocal_u64
> +rte_reciprocal_value_u64(uint64_t d)
> +{
> +	struct rte_reciprocal_u64 R;
> +
> +	const uint32_t fld = 63 - __builtin_clzll(d);
> +
> +	if ((d & (d - 1)) == 0) {
> +		R.m = 0;
> +		R.sh1 = (fld - 1) | 0x40;
> +	} else {
> +		uint64_t rem;
> +		uint64_t multiplier;
> +		uint8_t more;
> +
> +		multiplier = divide_128_div_64_to_64(1ULL << fld, 0, d, &rem);
> +		multiplier += multiplier;
> +
> +		const uint64_t twice_rem = rem + rem;
> +		if (twice_rem >= d || twice_rem < rem)
> +			multiplier += 1;
> +		more = fld;
> +		R.m = 1 + multiplier;
> +		R.sh1 = more | 0x40;
> +	}
> +
> +	R.sh1 &= 0x3F;
>
>  	return R;
>  }
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 65117cb..63ff2b8 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -247,6 +247,7 @@ EXPERIMENTAL {
>  DPDK_17.11 {
>  	global:
>
> -	rte_reciprocal_value;
> +	rte_reciprocal_value_u32;
> +	rte_reciprocal_value_u64;
>
>  } DPDK_17.08;
> diff --git a/lib/librte_sched/Makefile b/lib/librte_sched/Makefile
> index 569656b..a2fd6f3 100644
> --- a/lib/librte_sched/Makefile
> +++ b/lib/librte_sched/Makefile
> @@ -54,6 +54,8 @@ LIBABIVER := 1
>  SRCS-$(CONFIG_RTE_LIBRTE_SCHED) += rte_sched.c rte_red.c rte_approx.c
>
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include := rte_sched.h rte_bitmap.h rte_sched_common.h rte_red.h rte_approx.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include := rte_sched.h rte_bitmap.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include += rte_sched_common.h rte_red.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_SCHED)-include += rte_approx.h
>
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 3b8ccaa..7bb6d51 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -228,7 +228,7 @@ struct rte_sched_port {
>  	uint64_t time_cpu_cycles; /* Current CPU time measured in CPU cyles */
>  	uint64_t time_cpu_bytes; /* Current CPU time measured in bytes */
>  	uint64_t time; /* Current NIC TX time measured in bytes */
> -	struct rte_reciprocal inv_cycles_per_byte; /* CPU cycles per byte */
> +	struct rte_reciprocal_u32 inv_cycles_per_byte; /* CPU cycles per byte */
>
>  	/* Scheduling loop detection */
>  	uint32_t pipe_loop;
> @@ -677,7 +677,7 @@ rte_sched_port_config(struct rte_sched_port_params *params)
>
>  	cycles_per_byte = (rte_get_tsc_hz() << RTE_SCHED_TIME_SHIFT)
>  		/ params->rate;
> -	port->inv_cycles_per_byte = rte_reciprocal_value(cycles_per_byte);
> +	port->inv_cycles_per_byte = rte_reciprocal_value_u32(cycles_per_byte);
>
>  	/* Scheduling loop detection */
>  	port->pipe_loop = RTE_SCHED_PIPE_INVALID;
> @@ -2147,8 +2147,9 @@ rte_sched_port_time_resync(struct rte_sched_port *port)
>  	uint64_t bytes_diff;
>
>  	/* Compute elapsed time in bytes */
> -	bytes_diff = rte_reciprocal_divide(cycles_diff << RTE_SCHED_TIME_SHIFT,
> -		port->inv_cycles_per_byte);
> +	bytes_diff = rte_reciprocal_divide_u32(
> +		cycles_diff << RTE_SCHED_TIME_SHIFT,
> +		&port->inv_cycles_per_byte);
>
>  	/* Advance port time */
>  	port->time_cpu_cycles = cycles;
> -- 
> 2.7.4