Subject: RE: [PATCH] eal: add cache guard to per-lcore PRNG state
Date: Wed, 11 Oct 2023 18:55:21 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9EF2B@smartserver.smartshare.dk>
In-Reply-To: <4539298.cEBGB3zze1@thomas>
References: <20230904092632.12675-1-mb@smartsharesystems.com>
 <86202387-4424-e4d8-64df-531a580bebd4@lysator.liu.se>
 <20230906092537.609d6462@hermes.local>
 <4539298.cEBGB3zze1@thomas>
From: Morten Brørup
To: "Thomas Monjalon", Mattias Rönnblom, "Stephen Hemminger"
List-Id: DPDK patches and discussions

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Wednesday, 11 October 2023 18.08
>
> TLS is an alternative solution proposed by Stephen.
> What do you think?

I think we went down a rabbit hole - which I admit to enjoy. :-)

My simple patch should be applied, with the description improved by Mattias:

The per-lcore random state is frequently updated by their individual
lcores, so add a cache guard to prevent false sharing in case the CPU
employs a next-N-lines (or similar) hardware prefetcher.

>
>
> 06/09/2023 18:25, Stephen Hemminger:
> > On Mon, 4 Sep 2023 13:57:19 +0200
> > Mattias Rönnblom wrote:
> >
> > > On 2023-09-04 11:26, Morten Brørup wrote:
> > > > The per-lcore random state is frequently updated by their individual
> > > > lcores, so add a cache guard to prevent CPU cache thrashing.
> > > >
> > >
> > > "to prevent false sharing in case the CPU employs a next-N-lines (or
> > > similar) hardware prefetcher"
> > >
> > > In my world, cache thrashing and cache line contention are two different
> > > things.
> > >
> > > Other than that,
> > > Acked-by: Mattias Rönnblom
> >
> > Could the per-lcore state be thread local?
> >
> > Something like this:
> >
> > From 3df5e28a7e5589d05e1eade62a0979e84697853d Mon Sep 17 00:00:00 2001
> > From: Stephen Hemminger
> > Date: Wed, 6 Sep 2023 09:22:42 -0700
> > Subject: [PATCH] random: use per lcore state
> >
> > Move the random number state into thread local storage.
> > This has several benefits:
> > - no false cache sharing from cpu prefetching
> > - fixes initialization of random state for non-DPDK threads
> > - fixes unsafe usage of random state by non-DPDK threads
> >
> > The initialization of random number state is done by the
> > lcore (lazy initialization).
> >
> > Signed-off-by: Stephen Hemminger
> > ---
> >  lib/eal/common/rte_random.c | 35 +++++++++++++++++------------------
> >  1 file changed, 17 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
> > index 53636331a27b..62f36038ac52 100644
> > --- a/lib/eal/common/rte_random.c
> > +++ b/lib/eal/common/rte_random.c
> > @@ -19,13 +19,14 @@ struct rte_rand_state {
> >  	uint64_t z3;
> >  	uint64_t z4;
> >  	uint64_t z5;
> > -} __rte_cache_aligned;
> > +	uint64_t seed;
> > +};
> >
> > -/* One instance each for every lcore id-equipped thread, and one
> > - * additional instance to be shared by all others threads (i.e., all
> > - * unregistered non-EAL threads).
> > - */
> > -static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
> > +/* Global random seed */
> > +static uint64_t rte_rand_seed;
> > +
> > +/* Per lcore random state. */
> > +static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
> >
> >  static uint32_t
> >  __rte_rand_lcg32(uint32_t *seed)
> > @@ -76,16 +77,14 @@ __rte_srand_lfsr258(uint64_t seed, struct rte_rand_state *state)
> >  	state->z3 = __rte_rand_lfsr258_gen_seed(&lcg_seed, 4096UL);
> >  	state->z4 = __rte_rand_lfsr258_gen_seed(&lcg_seed, 131072UL);
> >  	state->z5 = __rte_rand_lfsr258_gen_seed(&lcg_seed, 8388608UL);
> > +
> > +	state->seed = seed;
> >  }
> >
> >  void
> >  rte_srand(uint64_t seed)
> >  {
> > -	unsigned int lcore_id;
> > -
> > -	/* add lcore_id to seed to avoid having the same sequence */
> > -	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> > -		__rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
> > +	__atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
> > }
> >
> >  static __rte_always_inline uint64_t
> > @@ -119,15 +118,15 @@ __rte_rand_lfsr258(struct rte_rand_state *state)
> >  static __rte_always_inline
> >  struct rte_rand_state *__rte_rand_get_state(void)
> >  {
> > -	unsigned int idx;
> > -
> > -	idx = rte_lcore_id();
> > +	struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
> > +	uint64_t seed;
> >
> > -	/* last instance reserved for unregistered non-EAL threads */
> > -	if (unlikely(idx == LCORE_ID_ANY))
> > -		idx = RTE_MAX_LCORE;
> > +	/* did seed change */
> > +	seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
> > +	if (unlikely(seed != rand_state->seed))
> > +		__rte_srand_lfsr258(seed, rand_state);
> >
> > -	return &rand_states[idx];
> > +	return rand_state;
> >  }
> >
> >  uint64_t