From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D894F4596A; Thu, 12 Sep 2024 09:05:34 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 954EF40E1C; Thu, 12 Sep 2024 09:05:34 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 1F57140E0B for ; Thu, 12 Sep 2024 09:05:19 +0200 (CEST) Received: from mail.maildlp.com (unknown [172.19.88.105]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4X47gX2zF2zfc2X; Thu, 12 Sep 2024 15:03:04 +0800 (CST) Received: from dggpeml500024.china.huawei.com (unknown [7.185.36.10]) by mail.maildlp.com (Postfix) with ESMTPS id 5E2C8140453; Thu, 12 Sep 2024 15:05:15 +0800 (CST) Received: from [10.67.121.161] (10.67.121.161) by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 12 Sep 2024 15:05:15 +0800 Message-ID: <3a669e9b-9b7d-4fad-996f-2deb3c4a1c5f@huawei.com> Date: Thu, 12 Sep 2024 15:05:14 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/6] eal: add static per-lcore memory allocation facility To: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , CC: =?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger , Konstantin Ananyev , David Marchand References: <20240910070344.699183-2-mattias.ronnblom@ericsson.com> <20240911170430.701685-1-mattias.ronnblom@ericsson.com> <20240911170430.701685-2-mattias.ronnblom@ericsson.com> Content-Language: en-US From: fengchengwen In-Reply-To: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.121.161] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpeml500024.china.huawei.com (7.185.36.10) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024/9/12 13:35, Mattias Rönnblom wrote: > On 2024-09-12 04:33, fengchengwen wrote: >> On 2024/9/12 1:04, Mattias Rönnblom wrote: >>> Introduce DPDK per-lcore id variables, or lcore variables for short. >>> >>> An lcore variable has one value for every current and future lcore >>> id-equipped thread. >>> >>> The primary use case is for statically allocating >>> small, frequently-accessed data structures, for which one instance >>> should exist for each lcore. >>> >>> Lcore variables are similar to thread-local storage (TLS, e.g., C11 >>> _Thread_local), but decoupling the values' life time with that of the >>> threads. >>> >>> Lcore variables are also similar in terms of functionality provided by >>> FreeBSD kernel's DPCPU_*() family of macros and the associated >>> build-time machinery. DPCPU uses linker scripts, which effectively >>> prevents the reuse of its, otherwise seemingly viable, approach. >>> >>> The currently-prevailing way to solve the same problem as lcore >>> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized >>> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of >>> lcore variables over this approach is that data related to the same >>> lcore now is close (spatially, in memory), rather than data used by >>> the same module, which in turn avoid excessive use of padding, >>> polluting caches with unused data. >>> >>> Signed-off-by: Mattias Rönnblom >>> Acked-by: Morten Brørup >>> >>> -- >>> >>> PATCH v2: >>>   * Add Windows support. (Morten Brørup) >>>   * Fix lcore variables API index reference. (Morten Brørup) >>>   * Various improvements of the API documentation. (Morten Brørup) >>>   * Elimination of unused symbol in version.map. (Morten Brørup) >> >> these history could move to the cover letter. >> >>> >>> PATCH: >>>   * Update MAINTAINERS and release notes. >>>   * Stop covering included files in extern "C" {}. >>> >>> RFC v6: >>>   * Include to get aligned_alloc(). >>>   * Tweak documentation (grammar). >>>   * Provide API-level guarantees that lcore variable values take on an >>>     initial value of zero. >>>   * Fix misplaced __rte_cache_aligned in the API doc example. >>> >>> RFC v5: >>>   * In Doxygen, consistenly use @ (and not \). >>>   * The RTE_LCORE_VAR_GET() and SET() convience access macros >>>     covered an uncommon use case, where the lcore value is of a >>>     primitive type, rather than a struct, and is thus eliminated >>>     from the API. (Morten Brørup) >>>   * In the wake up GET()/SET() removeal, rename RTE_LCORE_VAR_PTR() >>>     RTE_LCORE_VAR_VALUE(). >>>   * The underscores are removed from __rte_lcore_var_lcore_ptr() to >>>     signal that this function is a part of the public API. >>>   * Macro arguments are documented. >>> >>> RFV v4: >>>   * Replace large static array with libc heap-allocated memory. One >>>     implication of this change is there no longer exists a fixed upper >>>     bound for the total amount of memory used by lcore variables. >>>     RTE_MAX_LCORE_VAR has changed meaning, and now represent the >>>     maximum size of any individual lcore variable value. >>>   * Fix issues in example. (Morten Brørup) >>>   * Improve access macro type checking. (Morten Brørup) >>>   * Refer to the lcore variable handle as "handle" and not "name" in >>>     various macros. >>>   * Document lack of thread safety in rte_lcore_var_alloc(). >>>   * Provide API-level assurance the lcore variable handle is >>>     always non-NULL, to all applications to use NULL to mean >>>     "not yet allocated". >>>   * Note zero-sized allocations are not allowed. >>>   * Give API-level guarantee the lcore variable values are zeroed. >>> >>> RFC v3: >>>   * Replace use of GCC-specific alignof() with alignof(). >>>   * Update example to reflect FOREACH macro name change (in RFC v2). >>> >>> RFC v2: >>>   * Use alignof to derive alignment requirements. (Morten Brørup) >>>   * Change name of FOREACH to make it distinct from 's >>>     *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup) >>>   * Allow user-specified alignment, but limit max to cache line size. >>> --- >>>   MAINTAINERS                            |   6 + >>>   config/rte_config.h                    |   1 + >>>   doc/api/doxy-api-index.md              |   1 + >>>   doc/guides/rel_notes/release_24_11.rst |  14 + >>>   lib/eal/common/eal_common_lcore_var.c  |  78 +++++ >>>   lib/eal/common/meson.build             |   1 + >>>   lib/eal/include/meson.build            |   1 + >>>   lib/eal/include/rte_lcore_var.h        | 385 +++++++++++++++++++++++++ >>>   lib/eal/version.map                    |   2 + >>>   9 files changed, 489 insertions(+) >>>   create mode 100644 lib/eal/common/eal_common_lcore_var.c >>>   create mode 100644 lib/eal/include/rte_lcore_var.h >>> >>> diff --git a/MAINTAINERS b/MAINTAINERS >>> index c5a703b5c0..362d9a3f28 100644 >>> --- a/MAINTAINERS >>> +++ b/MAINTAINERS >>> @@ -282,6 +282,12 @@ F: lib/eal/include/rte_random.h >>>   F: lib/eal/common/rte_random.c >>>   F: app/test/test_rand_perf.c >>>   +Lcore Variables >>> +M: Mattias Rönnblom >>> +F: lib/eal/include/rte_lcore_var.h >>> +F: lib/eal/common/eal_common_lcore_var.c >>> +F: app/test/test_lcore_var.c >>> + >>>   ARM v7 >>>   M: Wathsala Vithanage >>>   F: config/arm/ >>> diff --git a/config/rte_config.h b/config/rte_config.h >>> index dd7bb0d35b..311692e498 100644 >>> --- a/config/rte_config.h >>> +++ b/config/rte_config.h >>> @@ -41,6 +41,7 @@ >>>   /* EAL defines */ >>>   #define RTE_CACHE_GUARD_LINES 1 >>>   #define RTE_MAX_HEAPS 32 >>> +#define RTE_MAX_LCORE_VAR 1048576 >>>   #define RTE_MAX_MEMSEG_LISTS 128 >>>   #define RTE_MAX_MEMSEG_PER_LIST 8192 >>>   #define RTE_MAX_MEM_MB_PER_LIST 32768 >>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md >>> index f9f0300126..ed577f14ee 100644 >>> --- a/doc/api/doxy-api-index.md >>> +++ b/doc/api/doxy-api-index.md >>> @@ -99,6 +99,7 @@ The public API headers are grouped by topics: >>>     [interrupts](@ref rte_interrupts.h), >>>     [launch](@ref rte_launch.h), >>>     [lcore](@ref rte_lcore.h), >>> +  [lcore variables](@ref rte_lcore_var.h), >>>     [per-lcore](@ref rte_per_lcore.h), >>>     [service cores](@ref rte_service.h), >>>     [keepalive](@ref rte_keepalive.h), >>> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst >>> index 0ff70d9057..a3884f7491 100644 >>> --- a/doc/guides/rel_notes/release_24_11.rst >>> +++ b/doc/guides/rel_notes/release_24_11.rst >>> @@ -55,6 +55,20 @@ New Features >>>        Also, make sure to start the actual text at the margin. >>>        ======================================================= >>>   +* **Added EAL per-lcore static memory allocation facility.** >>> + >>> +    Added EAL API for statically allocating small, >>> +    frequently-accessed data structures, for which one instance should >>> +    exist for each EAL thread and registered non-EAL thread. >>> + >>> +    With lcore variables, data is organized spatially on a per-lcore id >>> +    basis, rather than per library or PMD, avoiding the need for cache >>> +    aligning (or RTE_CACHE_GUARDing) data structures, which in turn >>> +    reduces CPU cache internal fragmentation, improving performance. >>> + >>> +    Lcore variables are similar to thread-local storage (TLS, e.g., >>> +    C11 _Thread_local), but decoupling the values' life time from that >>> +    of the threads. >>>     Removed Items >>>   ------------- >>> diff --git a/lib/eal/common/eal_common_lcore_var.c b/lib/eal/common/eal_common_lcore_var.c >>> new file mode 100644 >>> index 0000000000..309822039b >>> --- /dev/null >>> +++ b/lib/eal/common/eal_common_lcore_var.c >>> @@ -0,0 +1,78 @@ >>> +/* SPDX-License-Identifier: BSD-3-Clause >>> + * Copyright(c) 2024 Ericsson AB >>> + */ >>> + >>> +#include >>> +#include >>> + >>> +#ifdef RTE_EXEC_ENV_WINDOWS >>> +#include >>> +#endif >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +#include >>> + >>> +#include "eal_private.h" >>> + >>> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE) >>> + >>> +static void *lcore_buffer; >>> +static size_t offset = RTE_MAX_LCORE_VAR; >>> + >>> +static void * >>> +lcore_var_alloc(size_t size, size_t align) >>> +{ >>> +    void *handle; >>> +    void *value; >>> + >>> +    offset = RTE_ALIGN_CEIL(offset, align); >>> + >>> +    if (offset + size > RTE_MAX_LCORE_VAR) { >>> +#ifdef RTE_EXEC_ENV_WINDOWS >>> +        lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE, >>> +                           RTE_CACHE_LINE_SIZE); >>> +#else >>> +        lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE, >>> +                         LCORE_BUFFER_SIZE); >>> +#endif >>> +        RTE_VERIFY(lcore_buffer != NULL); >>> + >>> +        offset = 0; >>> +    } >>> + >>> +    handle = RTE_PTR_ADD(lcore_buffer, offset); >>> + >>> +    offset += size; >>> + >>> +    RTE_LCORE_VAR_FOREACH_VALUE(value, handle) >>> +        memset(value, 0, size); >>> + >>> +    EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a " >>> +        "%"PRIuPTR"-byte alignment", size, align); >> >> Currrent the data was malloc by libc function, I think it's mainly for such INIT macro which will be init before main. >> But it will introduce following problem: >> 1\ it can't benefit from huge-pages. this patch may reserved many 1MBs for each lcore, if we could place it in huge-pages it will reduce the TLB miss rate, especially it freq access data. > > This mechanism is for small allocations, which the sum of is also expected to be small (although the system won't break if they aren't). > > If you have large allocations, you are better off using lazy huge page allocations further down the initialization process. Otherwise, you will end up using memory for RTE_MAX_LCORE instances, rather than the actual lcore count, which could be substantially smaller. Yes, it may cost two much memory if allocated from hugepage memory. > > But sure, everything else being equal, you could have used huge pages for these lcore variable values. But everything isn't equal. > >> 2\ it can't across multi-process. many of current lcore-data also don't support multi-process, but I think it worth do that, and it will help us to some service recovery when sub-process failed and reboot. >> >> ... >> > > Not sure I think that's a downside. Further cementing that anti-pattern into DPDK seems to be a bad idea to me. > > lcore variables doesn't *introduce* any of these issues, since the mechanisms it's replacing also have these shortcomings (if you think about them as such - I'm not sure I do). Got it. This feature is a enhanced for current lcore variables, which bring together scattered data from the point view of a single core. and current it seemmed hard to extend support hugepage memory.