From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 18C1043B55;
	Tue, 20 Feb 2024 17:26:23 +0100 (CET)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id A06D7402A7;
	Tue, 20 Feb 2024 17:26:22 +0100 (CET)
Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3])
	by mails.dpdk.org (Postfix) with ESMTP id DD24940289
	for ; Tue, 20 Feb 2024 17:26:20 +0100 (CET)
Received: from mail.lysator.liu.se (localhost [127.0.0.1])
	by mail.lysator.liu.se (Postfix) with ESMTP id 8D0AB11B7C
	for ; Tue, 20 Feb 2024 17:26:20 +0100 (CET)
Received: by mail.lysator.liu.se (Postfix, from userid 1004)
	id 7FFB111B7B; Tue, 20 Feb 2024 17:26:20 +0100 (CET)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on
	hermod.lysator.liu.se
X-Spam-Level:
X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED,AWL,
	T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0
X-Spam-Score: -1.4
Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by mail.lysator.liu.se (Postfix) with ESMTPSA id 7DCBE11B7A;
	Tue, 20 Feb 2024 17:26:18 +0100 (CET)
Message-ID: <67d41a52-5e45-4906-91f1-57a5906e614c@lysator.liu.se>
Date: Tue, 20 Feb 2024 17:26:17 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
To: Bruce Richardson
Cc: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , dev@dpdk.org,
	=?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger
References: <20240219094036.485727-2-mattias.ronnblom@ericsson.com>
	<20240220084908.488252-1-mattias.ronnblom@ericsson.com>
	<20240220084908.488252-2-mattias.ronnblom@ericsson.com>
	<68c8f01c-d404-4b63-adca-13b560c95df8@lysator.liu.se>
Content-Language: en-US
From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?=
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Virus-Scanned: ClamAV using ClamSMTP
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org

On 2024-02-20 12:39, Bruce Richardson wrote:
> On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
>> On 2024-02-20 10:11, Bruce Richardson wrote:
>>> On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
>>>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>>>
>>>> An lcore variable has one value for every current and future lcore
>>>> id-equipped thread.
>>>>
>>>> The primary use case is for statically allocating
>>>> small chunks of often-used data, which is related logically, but where
>>>> there are performance benefits to reap from having updates being local
>>>> to an lcore.
>>>>
>>>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>>>> _Thread_local), but decoupling the values' life time with that of the
>>>> threads.
>
>
>
>>>> +/*
>>>> + * Avoid using offset zero, since it would result in a NULL-value
>>>> + * "handle" (offset) pointer, which in principle and per the API
>>>> + * definition shouldn't be an issue, but may confuse some tools and
>>>> + * users.
>>>> + */
>>>> +#define INITIAL_OFFSET 1
>>>> +
>>>> +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
>>>> +
>>>
>>> While I like the idea of improved handling for per-core variables, my main
>>> concern with this set is this definition here, which adds yet another
>>> dependency on the compile-time defined RTE_MAX_LCORE value.
>>>
>>
>> lcore variables replaces one RTE_MAX_LCORE-dependent pattern with another.
>>
>> You could even argue the dependency on RTE_MAX_LCORE is reduced with lcore
>> variables, if you look at where/in how many places in the code base this
>> macro is being used. Centralizing per-lcore data management may also provide
>> some opportunity in the future for extending the API to cope with some more
>> dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course, but we
>> are not ever going to change anything related to RTE_MAX_LCORE without
>> breaking the ABI, since this constant is everywhere, including compiled into
>> the application itself.
>>
>
> Yep, that is true if it's widely used.
>
>>> I believe we already have an issue with this #define where it's impossible
>>> to come up with a single value that works for all, or nearly all cases. The
>>> current default is still 128, yet DPDK needs to support systems where the
>>> number of cores is well into the hundreds, requiring workarounds of core
>>> mappings or customized builds of DPDK. Upping the value fixes those issues
>>> at the cost to memory footprint explosion for smaller systems.
>>>
>>
>> I agree this is an issue.
>>
>> RTE_MAX_LCORE also need to be sized to accommodate not only all cores used,
>> but the sum of all EAL threads and registered non-EAL threads.
>>
>> So, there is no reliable way to discover what RTE_MAX_LCORE is on a
>> particular piece of hardware, since the actual number of lcore ids needed is
>> up to the application.
>>
>> Why is the default set so low? Linux has MAX_CPUS, which serves the same
>> purpose, which is set to 4096 by default, if I recall correctly. Shouldn't
>> we at least be able to increase it to 256?
>
> The default is so low because of the mempool caches. These are an array of
> buffer pointers with 512 (IIRC) entries per core up to RTE_MAX_LCORE.
>
>>
>>> I'm therefore nervous about putting more dependencies on this value, when I
>>> feel we should be moving away from its use, to allow more runtime
>>> configurability of cores.
>>>
>>
>> What more specifically do you have in mind?
>>
>
> I don't think having a dynamically scaling RTE_MAX_LCORE is feasible, but
> what I would like to see is a runtime specified value. For example, you
> could run DPDK with EAL parameter "--max-lcores=1024" for large systems or
> "--max-lcores=32" for small ones. That would then be used at init-time to
> scale all internal datastructures appropriately.
>

Sounds reasonable to me, especially if you take a gradual approach.
By gradual I mean something like adding a function
rte_lcore_max_possible(), or something like that, returning the EAL
init-specified value. DPDK libraries/PMDs could then gradually be made
aware of, and take advantage of, the fact that lcore ids will always be
below a certain threshold, usually significantly lower than
RTE_MAX_LCORE.

The only change required for lcore variables would be that the FOREACH
macro would use the run-time max value, rather than RTE_MAX_LCORE, which
in turn would leave all the higher-numbered lcore id buffers
untouched/unmapped.

The set of possible lcore ids could also be expressed as a bitset, if
you have a machine with a huge number of cores, running many small DPDK
instances.

> /Bruce
>
>
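
To make the gradual approach above concrete, here is a minimal sketch in
plain C with no DPDK dependency. It assumes a run-time limit recorded once
at init and a FOREACH-style macro that stops at that limit. Every name in
it (lcore_set_max_possible(), lcore_max_possible(), LCORE_FOREACH_POSSIBLE,
counter_get(), counter_set(), and the MAX_LCORE/MAX_LCORE_VAR stand-ins)
is hypothetical and only mirrors the ideas in this thread; none of it is
existing DPDK API, and the patch's own allocator is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LCORE 128       /* stand-in for RTE_MAX_LCORE */
#define MAX_LCORE_VAR 1024  /* stand-in for RTE_MAX_LCORE_VAR */
#define INITIAL_OFFSET 1    /* as in the patch: offset 0 is never handed out */

/* One fixed-size buffer per lcore id; static storage is zero-initialized. */
static char lcore_var[MAX_LCORE][MAX_LCORE_VAR];
static size_t next_offset = INITIAL_OFFSET;

/* Hypothetical run-time limit, e.g. taken from "--max-lcores" at init. */
static unsigned int lcore_max = MAX_LCORE;

static void lcore_set_max_possible(unsigned int n)
{
	assert(n >= 1 && n <= MAX_LCORE);
	lcore_max = n;
}

/* Hypothetical counterpart of the suggested rte_lcore_max_possible(). */
static unsigned int lcore_max_possible(void)
{
	return lcore_max;
}

/* Iterate over lcore ids below the run-time limit only. */
#define LCORE_FOREACH_POSSIBLE(id) \
	for ((id) = 0; (id) < lcore_max_possible(); (id)++)

/* Reserve 'size' bytes at the same offset in every buffer. */
static size_t lcore_var_alloc(size_t size)
{
	size_t off = next_offset;

	assert(size <= MAX_LCORE_VAR - off);
	next_offset = off + size;
	return off; /* the offset doubles as the variable's handle */
}

/* memcpy-based accessors: no alignment or aliasing assumptions. */
static uint64_t counter_get(unsigned int id, size_t off)
{
	uint64_t v;

	assert(id < lcore_max_possible());
	memcpy(&v, &lcore_var[id][off], sizeof(v));
	return v;
}

static void counter_set(unsigned int id, size_t off, uint64_t v)
{
	assert(id < lcore_max_possible());
	memcpy(&lcore_var[id][off], &v, sizeof(v));
}

/* Usage: with a limit of 4, only buffers 0..3 are written or read. */
static uint64_t example_sum(void)
{
	size_t off = lcore_var_alloc(sizeof(uint64_t));
	unsigned int id;
	uint64_t sum = 0;

	lcore_set_max_possible(4);
	LCORE_FOREACH_POSSIBLE(id)
		counter_set(id, off, id + 1);
	LCORE_FOREACH_POSSIBLE(id)
		sum += counter_get(id, off);
	return sum; /* 1 + 2 + 3 + 4 */
}
```

With the limit set to 4, example_sum() returns 10 and rows 4 and up of
the static array are never written. On a typical Linux system, untouched
pages of zero-initialized static storage are not backed by physical
memory, which is the "untouched/unmapped" effect described above. The
bitset variant would replace the "id < limit" test in the macro with a
per-id membership test.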