From: Konstantin Ananyev
To: Mattias Rönnblom, Stephen Hemminger, dev@dpdk.org
Cc: Mattias Rönnblom, Morten Brørup
Subject: Re: [RFC] random: use per lcore state
Date: Sun, 10 Sep 2023 14:26:23 +0100

09/09/2023 07:45, Mattias Rönnblom writes:
> On 2023-09-09 02:13, Konstantin Ananyev wrote:
>> 06/09/2023 21:02, Mattias Rönnblom writes:
>>> On 2023-09-06 19:20, Stephen Hemminger wrote:
>>>> Move the random number state into thread local storage.
>>>
>>> Morten and I discussed TLS versus other alternatives in some other
>>> thread. The downside of TLS that Morten pointed out, from what I
>>> recall, is that lazy initialization is *required* (since the number
>>> of threads is open-ended), and the data ends up in non-huge page memory.
>>
>> Hmm.. correct me if I am wrong, but with the current implementation,
>> the rand state is also in non-huge memory:
>> static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
>>
>
> Yes. The current pattern is certainly not perfect.
>
>>
>>> It was also unclear to me what the memory footprint implications
>>> would be, should large per-lcore data structures be put in TLS. More
>>> specifically, whether they would be duplicated across all threads,
>>> even non-lcore threads.
>>>
>>> None of these issues affect rte_random.c's potential usage of TLS
>>> (except lazy [re-]initialization makes things more complicated).
>>>
>>> Preferably, there should be one pattern that is usable across all or
>>> at least most DPDK modules requiring per-lcore state.
>>>
>>>> This has several benefits.
>>>>   - no false cache sharing from cpu prefetching
>>>>   - fixes initialization of random state for non-DPDK threads
>>>
>>> This seems like a non-reason to me. That bug is easily fixed, if it
>>> isn't already.
>>>
>>>>   - fixes unsafe usage of random state by non-DPDK threads
>>>>
>>>
>>> "Makes random number generation MT safe from all threads (including
>>> unregistered non-EAL threads)."
>>>
>>> With current API semantics you may still register a non-EAL thread
>>> to get MT-safe access to this API, so I guess it's more about being
>>> more convenient and less error prone, than anything else.
>>
>> I understand that we never guaranteed MT safety for non-EAL threads here,
>
>
> Registered non-EAL threads have a lcore id and thus may safely call
> rte_rand().

I am aware of that ability, but to me registering/unregistering a thread
just to call rte_rand() seems like way too much hassle.

> Multiple unregistered non-EAL threads may not do so, in
> parallel.
>
>
>> but as a user of rte_rand() it would be much more convenient if I
>> could use it from any thread without worrying whether it is an EAL
>> thread or not.
>
> Sure, especially if it comes for free. The for-free solution has yet to
> reveal itself though.
>
>>
>> About TLS usage and re-seeding - can we use some sort of middle ground:
>> extend rte_rand_state with some gen-counter.
>> Make a 'master' copy of rte_rand_state that will be updated by
>> rte_srand(), and TLS copies of rte_rand_state, so rte_rand() can first
>> compare its gen-counter value with the master copy to decide whether
>> it needs to copy new state from the master or not.
>>
>
> Calling threads shouldn't all produce the same sequence. That would be
> silly and not very random. The generation number should be tied to the
> seed.

Actually, yes, you are right: probably we don't need a master copy of
rte_rand_state itself. It seems that just having a 'master' copy of the
'seed' value, plus some counter (to indicate that the seed has been
updated), is enough here (see the rough sketch at the end of this mail).

>
>>
>>> The new MT safety guarantees should be in the API docs as well.
>>
>> Yes, it is an extension to the current API, not a fix.
>>
>>>
>>>> The initialization of random number state is done by the
>>>> lcore (lazy initialization).
>>>>
>>>> Signed-off-by: Stephen Hemminger
>>>> ---
>>>>   lib/eal/common/rte_random.c | 38 +++++++++++++++++++------------------
>>>>   1 file changed, 20 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
>>>> index 53636331a27b..9657adf6ad3b 100644
>>>> --- a/lib/eal/common/rte_random.c
>>>> +++ b/lib/eal/common/rte_random.c
>>>> @@ -19,13 +19,14 @@ struct rte_rand_state {
>>>>       uint64_t z3;
>>>>       uint64_t z4;
>>>>       uint64_t z5;
>>>> -} __rte_cache_aligned;
>>>> +    uint64_t seed;
>>>> +};
>>>> -/* One instance each for every lcore id-equipped thread, and one
>>>> - * additional instance to be shared by all others threads (i.e., all
>>>> - * unregistered non-EAL threads).
>>>> - */
>>>> -static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
>>>> +/* Global random seed */
>>>> +static uint64_t rte_rand_seed;
>>>> +
>>>> +/* Per lcore random state. */
>>>> +static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
>>>>   static uint32_t
>>>>   __rte_rand_lcg32(uint32_t *seed)
>>>> @@ -81,11 +82,7 @@ __rte_srand_lfsr258(uint64_t seed, struct rte_rand_state *state)
>>>>   void
>>>>   rte_srand(uint64_t seed)
>>>>   {
>>>> -    unsigned int lcore_id;
>>>> -
>>>> -    /* add lcore_id to seed to avoid having the same sequence */
>>>> -    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
>>>> -        __rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
>>>> +    __atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
>>>>   }
>>>>   static __rte_always_inline uint64_t
>>>> @@ -119,15 +116,18 @@ __rte_rand_lfsr258(struct rte_rand_state *state)
>>>>   static __rte_always_inline
>>>>   struct rte_rand_state *__rte_rand_get_state(void)
>>>>   {
>>>> -    unsigned int idx;
>>>> +    struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
>>>
>>> There should really be a RTE_PER_THREAD, an alias to RTE_PER_LCORE,
>>> to cover this usage. Or just use __thread (or _Thread_local?).
>>>
>>>> +    uint64_t seed;
>>>> -    idx = rte_lcore_id();
>>>> +    seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
>>>> +    if (unlikely(seed != rand_state->seed)) {
>>>> +        rand_state->seed = seed;
>>>
>>> Re-seeding should restart the series, on all lcores. There's nothing
>>> preventing the user from re-seeding the machinery repeatedly, with
>>> the same seed. Seems like an unusual, but still valid, use case, if
>>> you run repeated tests of some sort.
>>>
>>> Use a seqlock? :) I guess you need a seed generation number as well
>>> (e.g., is this the first time you seed with X, or the second one, etc.)
>>>
>>>> -    /* last instance reserved for unregistered non-EAL threads */
>>>> -    if (unlikely(idx == LCORE_ID_ANY))
>>>> -        idx = RTE_MAX_LCORE;
>>>> +        seed += rte_thread_self().opaque_id;
>>>> +        __rte_srand_lfsr258(seed, rand_state);
>>>> +    }
>>>> -    return &rand_states[idx];
>>>> +    return rand_state;
>>>>   }
>>>>   uint64_t
>>>> @@ -227,7 +227,9 @@ RTE_INIT(rte_rand_init)
>>>>   {
>>>>       uint64_t seed;
>>>> -    seed = __rte_random_initial_seed();
>>>> +    do
>>>> +        seed = __rte_random_initial_seed();
>>>> +    while (seed == 0);
>>>
>>> Might be worth a comment why seed 0 is not allowed. Alternatively,
>>> use some other way of signaling __rte_srand_lfsr258() must be called.
>>>
>>>>       rte_srand(seed);
>>>>   }
>>
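
To make the 'master seed plus counter' idea above more concrete, here is a
rough, untested sketch. It reuses __rte_srand_lfsr258(), RTE_DEFINE_PER_LCORE
and rte_thread_self() as they appear in the patch; the rte_rand_seed_gen
variable and the seed_gen field are made-up names, purely for illustration.

/* Sketch of the 'master seed + generation counter' approach. */
static uint64_t rte_rand_seed;       /* master seed value */
static uint64_t rte_rand_seed_gen;   /* bumped on every rte_srand() */

struct rte_rand_state {
	uint64_t z1, z2, z3, z4, z5;
	uint64_t seed_gen;           /* generation this state was seeded at */
};

static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);

void
rte_srand(uint64_t seed)
{
	__atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
	/* bump the generation, so that re-seeding with the same value
	 * still restarts the series on every thread */
	__atomic_add_fetch(&rte_rand_seed_gen, 1, __ATOMIC_RELEASE);
}

static __rte_always_inline struct rte_rand_state *
__rte_rand_get_state(void)
{
	struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
	uint64_t gen;

	gen = __atomic_load_n(&rte_rand_seed_gen, __ATOMIC_ACQUIRE);
	if (unlikely(gen != rand_state->seed_gen)) {
		uint64_t seed = __atomic_load_n(&rte_rand_seed,
						__ATOMIC_RELAXED);

		rand_state->seed_gen = gen;
		/* mix in a per-thread value, so that different threads
		 * don't all produce the same sequence */
		__rte_srand_lfsr258(seed + rte_thread_self().opaque_id,
				    rand_state);
	}

	return rand_state;
}

Memory ordering and concurrent rte_srand() calls would of course need more
thought; the point is just that a generation counter lets a repeated
rte_srand() with the same seed value still restart the series everywhere.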