From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 649CA45967; Thu, 12 Sep 2024 07:35:33 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id EED0F4025E; Thu, 12 Sep 2024 07:35:32 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 7306C40144 for ; Thu, 12 Sep 2024 07:35:31 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id AE50133F3 for ; Thu, 12 Sep 2024 07:35:30 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id A2BC133F2; Thu, 12 Sep 2024 07:35:30 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.2 Received: from [192.168.1.86] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id E9F2B33F0; Thu, 12 Sep 2024 07:35:27 +0200 (CEST) Message-ID: Date: Thu, 12 Sep 2024 07:35:27 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/6] eal: add static per-lcore memory allocation facility To: fengchengwen , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , dev@dpdk.org Cc: =?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger , Konstantin Ananyev , David Marchand References: <20240910070344.699183-2-mattias.ronnblom@ericsson.com> <20240911170430.701685-1-mattias.ronnblom@ericsson.com> <20240911170430.701685-2-mattias.ronnblom@ericsson.com> Content-Language: en-US From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-09-12 04:33, fengchengwen wrote: > On 2024/9/12 1:04, Mattias Rönnblom wrote: >> Introduce DPDK per-lcore id variables, or lcore variables for short. >> >> An lcore variable has one value for every current and future lcore >> id-equipped thread. >> >> The primary use case is for statically allocating >> small, frequently-accessed data structures, for which one instance >> should exist for each lcore. >> >> Lcore variables are similar to thread-local storage (TLS, e.g., C11 >> _Thread_local), but decoupling the values' life time with that of the >> threads. >> >> Lcore variables are also similar in terms of functionality provided by >> FreeBSD kernel's DPCPU_*() family of macros and the associated >> build-time machinery. DPCPU uses linker scripts, which effectively >> prevents the reuse of its, otherwise seemingly viable, approach. >> >> The currently-prevailing way to solve the same problem as lcore >> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized >> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of >> lcore variables over this approach is that data related to the same >> lcore now is close (spatially, in memory), rather than data used by >> the same module, which in turn avoid excessive use of padding, >> polluting caches with unused data. >> >> Signed-off-by: Mattias Rönnblom >> Acked-by: Morten Brørup >> >> -- >> >> PATCH v2: >> * Add Windows support. (Morten Brørup) >> * Fix lcore variables API index reference. (Morten Brørup) >> * Various improvements of the API documentation. (Morten Brørup) >> * Elimination of unused symbol in version.map. (Morten Brørup) > > these history could move to the cover letter. > >> >> PATCH: >> * Update MAINTAINERS and release notes. >> * Stop covering included files in extern "C" {}. >> >> RFC v6: >> * Include to get aligned_alloc(). >> * Tweak documentation (grammar). >> * Provide API-level guarantees that lcore variable values take on an >> initial value of zero. >> * Fix misplaced __rte_cache_aligned in the API doc example. >> >> RFC v5: >> * In Doxygen, consistenly use @ (and not \). >> * The RTE_LCORE_VAR_GET() and SET() convience access macros >> covered an uncommon use case, where the lcore value is of a >> primitive type, rather than a struct, and is thus eliminated >> from the API. (Morten Brørup) >> * In the wake up GET()/SET() removeal, rename RTE_LCORE_VAR_PTR() >> RTE_LCORE_VAR_VALUE(). >> * The underscores are removed from __rte_lcore_var_lcore_ptr() to >> signal that this function is a part of the public API. >> * Macro arguments are documented. >> >> RFV v4: >> * Replace large static array with libc heap-allocated memory. One >> implication of this change is there no longer exists a fixed upper >> bound for the total amount of memory used by lcore variables. >> RTE_MAX_LCORE_VAR has changed meaning, and now represent the >> maximum size of any individual lcore variable value. >> * Fix issues in example. (Morten Brørup) >> * Improve access macro type checking. (Morten Brørup) >> * Refer to the lcore variable handle as "handle" and not "name" in >> various macros. >> * Document lack of thread safety in rte_lcore_var_alloc(). >> * Provide API-level assurance the lcore variable handle is >> always non-NULL, to all applications to use NULL to mean >> "not yet allocated". >> * Note zero-sized allocations are not allowed. >> * Give API-level guarantee the lcore variable values are zeroed. >> >> RFC v3: >> * Replace use of GCC-specific alignof() with alignof(). >> * Update example to reflect FOREACH macro name change (in RFC v2). >> >> RFC v2: >> * Use alignof to derive alignment requirements. (Morten Brørup) >> * Change name of FOREACH to make it distinct from 's >> *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup) >> * Allow user-specified alignment, but limit max to cache line size. >> --- >> MAINTAINERS | 6 + >> config/rte_config.h | 1 + >> doc/api/doxy-api-index.md | 1 + >> doc/guides/rel_notes/release_24_11.rst | 14 + >> lib/eal/common/eal_common_lcore_var.c | 78 +++++ >> lib/eal/common/meson.build | 1 + >> lib/eal/include/meson.build | 1 + >> lib/eal/include/rte_lcore_var.h | 385 +++++++++++++++++++++++++ >> lib/eal/version.map | 2 + >> 9 files changed, 489 insertions(+) >> create mode 100644 lib/eal/common/eal_common_lcore_var.c >> create mode 100644 lib/eal/include/rte_lcore_var.h >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index c5a703b5c0..362d9a3f28 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -282,6 +282,12 @@ F: lib/eal/include/rte_random.h >> F: lib/eal/common/rte_random.c >> F: app/test/test_rand_perf.c >> >> +Lcore Variables >> +M: Mattias Rönnblom >> +F: lib/eal/include/rte_lcore_var.h >> +F: lib/eal/common/eal_common_lcore_var.c >> +F: app/test/test_lcore_var.c >> + >> ARM v7 >> M: Wathsala Vithanage >> F: config/arm/ >> diff --git a/config/rte_config.h b/config/rte_config.h >> index dd7bb0d35b..311692e498 100644 >> --- a/config/rte_config.h >> +++ b/config/rte_config.h >> @@ -41,6 +41,7 @@ >> /* EAL defines */ >> #define RTE_CACHE_GUARD_LINES 1 >> #define RTE_MAX_HEAPS 32 >> +#define RTE_MAX_LCORE_VAR 1048576 >> #define RTE_MAX_MEMSEG_LISTS 128 >> #define RTE_MAX_MEMSEG_PER_LIST 8192 >> #define RTE_MAX_MEM_MB_PER_LIST 32768 >> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md >> index f9f0300126..ed577f14ee 100644 >> --- a/doc/api/doxy-api-index.md >> +++ b/doc/api/doxy-api-index.md >> @@ -99,6 +99,7 @@ The public API headers are grouped by topics: >> [interrupts](@ref rte_interrupts.h), >> [launch](@ref rte_launch.h), >> [lcore](@ref rte_lcore.h), >> + [lcore variables](@ref rte_lcore_var.h), >> [per-lcore](@ref rte_per_lcore.h), >> [service cores](@ref rte_service.h), >> [keepalive](@ref rte_keepalive.h), >> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst >> index 0ff70d9057..a3884f7491 100644 >> --- a/doc/guides/rel_notes/release_24_11.rst >> +++ b/doc/guides/rel_notes/release_24_11.rst >> @@ -55,6 +55,20 @@ New Features >> Also, make sure to start the actual text at the margin. >> ======================================================= >> >> +* **Added EAL per-lcore static memory allocation facility.** >> + >> + Added EAL API for statically allocating small, >> + frequently-accessed data structures, for which one instance should >> + exist for each EAL thread and registered non-EAL thread. >> + >> + With lcore variables, data is organized spatially on a per-lcore id >> + basis, rather than per library or PMD, avoiding the need for cache >> + aligning (or RTE_CACHE_GUARDing) data structures, which in turn >> + reduces CPU cache internal fragmentation, improving performance. >> + >> + Lcore variables are similar to thread-local storage (TLS, e.g., >> + C11 _Thread_local), but decoupling the values' life time from that >> + of the threads. >> >> Removed Items >> ------------- >> diff --git a/lib/eal/common/eal_common_lcore_var.c b/lib/eal/common/eal_common_lcore_var.c >> new file mode 100644 >> index 0000000000..309822039b >> --- /dev/null >> +++ b/lib/eal/common/eal_common_lcore_var.c >> @@ -0,0 +1,78 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2024 Ericsson AB >> + */ >> + >> +#include >> +#include >> + >> +#ifdef RTE_EXEC_ENV_WINDOWS >> +#include >> +#endif >> + >> +#include >> +#include >> +#include >> + >> +#include >> + >> +#include "eal_private.h" >> + >> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE) >> + >> +static void *lcore_buffer; >> +static size_t offset = RTE_MAX_LCORE_VAR; >> + >> +static void * >> +lcore_var_alloc(size_t size, size_t align) >> +{ >> + void *handle; >> + void *value; >> + >> + offset = RTE_ALIGN_CEIL(offset, align); >> + >> + if (offset + size > RTE_MAX_LCORE_VAR) { >> +#ifdef RTE_EXEC_ENV_WINDOWS >> + lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE, >> + RTE_CACHE_LINE_SIZE); >> +#else >> + lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE, >> + LCORE_BUFFER_SIZE); >> +#endif >> + RTE_VERIFY(lcore_buffer != NULL); >> + >> + offset = 0; >> + } >> + >> + handle = RTE_PTR_ADD(lcore_buffer, offset); >> + >> + offset += size; >> + >> + RTE_LCORE_VAR_FOREACH_VALUE(value, handle) >> + memset(value, 0, size); >> + >> + EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a " >> + "%"PRIuPTR"-byte alignment", size, align); > > Currrent the data was malloc by libc function, I think it's mainly for such INIT macro which will be init before main. > But it will introduce following problem: > 1\ it can't benefit from huge-pages. this patch may reserved many 1MBs for each lcore, if we could place it in huge-pages it will reduce the TLB miss rate, especially it freq access data. This mechanism is for small allocations, which the sum of is also expected to be small (although the system won't break if they aren't). If you have large allocations, you are better off using lazy huge page allocations further down the initialization process. Otherwise, you will end up using memory for RTE_MAX_LCORE instances, rather than the actual lcore count, which could be substantially smaller. But sure, everything else being equal, you could have used huge pages for these lcore variable values. But everything isn't equal. > 2\ it can't across multi-process. many of current lcore-data also don't support multi-process, but I think it worth do that, and it will help us to some service recovery when sub-process failed and reboot. > > ... > Not sure I think that's a downside. Further cementing that anti-pattern into DPDK seems to be a bad idea to me. lcore variables doesn't *introduce* any of these issues, since the mechanisms it's replacing also have these shortcomings (if you think about them as such - I'm not sure I do).