Date: Thu, 26 May 2022 13:19:50 -0700
From: Stephen Hemminger
To: Mattias Rönnblom
Cc: dev@dpdk.org, Mattias Rönnblom, Ray Kinsella
Subject: Re: [PATCH v4 1/3] random: add rte_drand() function

On Thu, 26 May 2022 15:20:29 +0200
Mattias Rönnblom wrote:

> On 2022-05-25 22:31, Stephen Hemminger wrote:
> > The PIE code and other applications can benefit from having a
> > fast way to get a random floating point value. This new function
> > is equivalent to drand48() in the C library.
> >
> > Signed-off-by: Stephen Hemminger
> > ---
> >  app/test/test_rand_perf.c              |  7 +++++
> >  doc/guides/rel_notes/release_22_07.rst |  5 ++++
> >  lib/eal/common/rte_random.c            | 41 ++++++++++++++++++++++++++
> >  lib/eal/include/rte_random.h           | 18 +++++++++++
> >  lib/eal/meson.build                    |  3 ++
> >  lib/eal/version.map                    |  1 +
> >  6 files changed, 75 insertions(+)
> >
> > diff --git a/app/test/test_rand_perf.c b/app/test/test_rand_perf.c
> > index fe797ebfa1ca..26fb1d9a586e 100644
> > --- a/app/test/test_rand_perf.c
> > +++ b/app/test/test_rand_perf.c
> > @@ -20,6 +20,7 @@ static volatile uint64_t vsum;
> >
> >  enum rand_type {
> >  	rand_type_64,
> > +	rand_type_float,
> >  	rand_type_bounded_best_case,
> >  	rand_type_bounded_worst_case
> >  };
> > @@ -30,6 +31,8 @@ rand_type_desc(enum rand_type rand_type)
> >  	switch (rand_type) {
> >  	case rand_type_64:
> >  		return "Full 64-bit [rte_rand()]";
> > +	case rand_type_float:
> > +		return "Floating point [rte_drand()]";
> >  	case rand_type_bounded_best_case:
> >  		return "Bounded average best-case [rte_rand_max()]";
> >  	case rand_type_bounded_worst_case:
> > @@ -55,6 +58,9 @@ test_rand_perf_type(enum rand_type rand_type)
> >  		case rand_type_64:
> >  			sum += rte_rand();
> >  			break;
> > +		case rand_type_float:
> > +			sum += 1000. * rte_drand();
>
> Including this floating point multiplication will lead to an
> overestimation of rte_drand() latency.
>
> You could refactor this function to be a macro, and pass the return type
> as a parameter to this macro. I did just that, and on both an AMD
> 5900X and a Cortex-A72 it didn't add more than ~5%, so I don't think
> it's necessary.
>
> > +			break;
> >  		case rand_type_bounded_best_case:
> >  			sum += rte_rand_max(BEST_CASE_BOUND);
> >  			break;
> > @@ -83,6 +89,7 @@ test_rand_perf(void)
> >  	printf("Pseudo-random number generation latencies:\n");
> >
> >  	test_rand_perf_type(rand_type_64);
> > +	test_rand_perf_type(rand_type_float);
> >  	test_rand_perf_type(rand_type_bounded_best_case);
> >  	test_rand_perf_type(rand_type_bounded_worst_case);
> >
> > diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
> > index e49cacecefd4..b131ea577226 100644
> > --- a/doc/guides/rel_notes/release_22_07.rst
> > +++ b/doc/guides/rel_notes/release_22_07.rst
> > @@ -104,6 +104,11 @@ New Features
> >      * ``RTE_EVENT_QUEUE_ATTR_WEIGHT``
> >      * ``RTE_EVENT_QUEUE_ATTR_AFFINITY``
> >
> > +* **Added function to get a random floating point number.**
> > +
> > +  Added the function ``rte_drand()`` to provide a pseudo-random
> > +  floating point number.
> > +
> >
> >  Removed Items
> >  -------------
> > diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
> > index 4535cc980cec..3dc3484ee655 100644
> > --- a/lib/eal/common/rte_random.c
> > +++ b/lib/eal/common/rte_random.c
> > @@ -6,6 +6,9 @@
> >  #include
> >  #endif
> >  #include
> > +#ifdef RTE_LIBEAL_USE_IEEE754
> > +#include
> > +#endif
> >
> >  #include
> >  #include
> > @@ -173,6 +176,44 @@ rte_rand_max(uint64_t upper_bound)
> >  	return res;
> >  }
> >
> > +double
> > +rte_drand(void)
> > +{
> > +	struct rte_rand_state *state = __rte_rand_get_state();
> > +	uint64_t rand64 = __rte_rand_lfsr258(state);
> > +#ifdef RTE_LIBEAL_USE_IEEE754
> > +	union ieee754_double u = {
> > +		.ieee = {
> > +			.negative = 0,
> > +			.exponent = IEEE754_DOUBLE_BIAS,
> > +		},
> > +	};
> > +
> > +	/* Take 64 bit random value and put it into the mantissa
> > +	 * This uses direct access to IEEE format to avoid doing
> > +	 * any direct floating point math here.
> > +	 */
> > +	u.ieee.mantissa0 = rand64 >> 32;
> > +	u.ieee.mantissa1 = rand64;
> > +
> > +	return u.d - 1.0;
> > +#else
> > +	/* Slower method requiring floating point divide
> > +	 *
>
> Do you know how much slower? I ran rand_perf_test on two of my systems.
>
>                         AMD 5900X   Pi4 (ARM Cortex-A72)
> IEEE754 version             12          1.19
> Non-IEEE754 version         11          1.16
> Naive version*              24          1.16
>
> * (double)rte_rand() / (double)UINT64_MAX
>
> Numbers are TSC cycles/op.

On an AMD Ryzen 7, both versions take 9 cycles/op with rand_perf_autotest,
so it is a toss-up.

The IEEE754 version compiles to:

	ubfx	r1, r1, #0, #20
	orr	r3, r1, #1069547520	<< mantissa0
	mov	r2, r0
	orr	r3, r3, #3145728
	vmov.f64	d0, #1.0e+0
	vmov	d16, r2, r3
	vsub.f64	d0, d16, d0	<< return u.d - 1.0

Note: the compiler is smart about the divide version. Since the
denominator is a fixed value, it can use a multiply instead:

	vmov	d16, r0, r1
	vmul.f64	d0, d16, d0