From: Mattias Rönnblom
To: Thomas Monjalon, Mattias Rönnblom
Cc: dev@dpdk.org, Morten Brørup, Stephen Hemminger, Konstantin Ananyev,
 David Marchand, Jerin Jacob, Luka Jankovic, Chengwen Feng
Subject: Re: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility
Date: Fri, 11 Oct 2024 10:04:13 +0200
Message-ID: <5818e22a-1e22-4533-85b8-fb9d00c834da@lysator.liu.se>
In-Reply-To: <1829355.yIU609i1g2@thomas>

On 2024-10-10 23:24, Thomas Monjalon wrote:
> Hello,
>
> This new feature looks to bring something interesting to DPDK.
> There was a good amount of discussion and review,
> and there is a real effort of documentation.
>
> However, some choices done in this implementation
> were not explained or advertised enough in the documentation,
> in my opinion.
>

Are those of relevance to the API user?

> I think the first thing to add is an explanation of the memory layout.
> Maybe that a SVG drawing would help to show how it is stored.
> That would be helpful to someone wanting to understand the internals.

But where should that go? If it's put in the API, it will also obscure
the *actual* API documentation.

I have some drawings already, and I agree they are very helpful - both
in explaining how things work, and in making obvious why the memory
layout resulting from the use of lcore variables is superior to that of
the lcore id-indexed static array approach.
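To give an idea of what such a drawing would show, here is a rough
ASCII sketch of the layout, as implemented in this version of the
patch set (two hypothetical variables A and B allocated):

  lcore_buffer (LCORE_BUFFER_SIZE = RTE_MAX_LCORE_VAR * RTE_MAX_LCORE bytes)

  +------------------+------------------+-----+------------------------+
  | lcore id 0       | lcore id 1       | ... | lcore id MAX_LCORE - 1 |
  | [A][B][ unused ] | [A][B][ unused ] |     | [A][B][ unused       ] |
  +------------------+------------------+-----+------------------------+
   <RTE_MAX_LCORE_VAR>

  handle = lcore_buffer + <the variable's offset>
  value for a particular lcore id = handle + lcore_id * RTE_MAX_LCORE_VAR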
> We also need to explain why it is not using rte_malloc.
>
> Also please could you re-read the doc and comments in detail?
> I think some words are missing and there are typos.
> While at it, please allow for easy update of the text
> by starting each sentence on a new line.
> Breaking lines logically is better for future patches.
> One more advice: avoid very long sentences.
>

I've gone through the documentation and will post a new patch set.

There have been a lot of comments and discussion on this patch set. Did
you have anything in particular in mind?

> Do you have benchmarks results of the modules using such variables
> (power, random, service)?
> It would be interesting to compare time efficiency and memory usage
> before/after, with different number of threads.
>

I have the dummy modules of test_lcore_var_perf.c, which show the
performance benefits as the number of modules using lcore variables
increases.

That said, the gains are hard to quantify with micro benchmarks, and
for real-world performance, one really has to start using the facility
at scale before anything interesting may happen.

Keep in mind, however, that while this is new to DPDK, similar
facilities already exist in your favorite UN*X kernel. The
implementation is different, but I think it's accurate to say the goal
and the effects should be the same.

One can also run the perf autotest for RTE random, but such tests only
show that lcore variables don't make things significantly worse when
the L1 cache is essentially unused. (In fact, the lcore
variable-enabled rte_random.c somewhat counter-intuitively generates a
64-bit number 1 TSC cycle faster than the old version on my system.)

Just to be clear: it's the footprint in the core-private caches we are
attempting to reduce.

> Adding more detailed comments below.
>
>
> 10/10/2024 16:21, Mattias Rönnblom:
>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>
>> An lcore variable has one value for every current and future lcore
>> id-equipped thread.
>
> I find it difficult to read "lcore id-equipped thread".
> Can we just say "DPDK thread"?
>

Sure, if you point me to a definition of what a DPDK thread is.

I can think of at least four potential definitions:

* An EAL thread
* An EAL thread or a registered non-EAL thread
* Any thread calling into DPDK APIs
* Any thread living in a DPDK process

>> The primary use case is for statically allocating
>> small, frequently-accessed data structures, for which one instance
>> should exist for each lcore.
>>
>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>> _Thread_local), but decoupling the values' life time with that of the
>> threads.
>
> In which situation we need values of a dead thread?
>

To clean up heap-allocated memory referenced by such variables, for
example, or other resources.

> [...]
>> +An application, a DPDK library or PMD may keep opt to keep per-thread
>> +state.
>
> I don't understand this sentence.
>

Which part is unclear?

>> +
>> +Per-thread data may be maintained using either *lcore variables*
>> +(``rte_lcore_var.h``), *thread-local storage (TLS)*
>> +(``rte_per_lcore.h``), or a static array of ``RTE_MAX_LCORE``
>> +elements, index by ``rte_lcore_id()``. These methods allows for
>
> index*ed*
>

Fixed.

>> +per-lcore data to be a largely module-internal affair, and not
>> +directly visible in its API. Another possibility is to have deal
>
> *to* deal ?
>
>> +explicitly with per-thread aspects in the API (e.g., the ports of the
>> +Eventdev API).
>> +
>> +Lcore varibles are suitable for small object statically allocated at
>
> vari*a*bles
>

Fixed.

>> +the time of module or application initialization. An lcore variable
>> +take on one value for each lcore id-equipped thread (i.e., for EAL
>> +threads and registered non-EAL threads, in total ``RTE_MAX_LCORE``
>> +instances). The lifetime of lcore variables are detached from that of
>> +the owning threads, and may thus be initialized prior to the owner
>> +having been created.
>> +
>> +Variables with thread-local storage are allocated at the time of
>> +thread creation, and exists until the thread terminates, for every
>> +thread in the process. Only very small object should be allocated in
>> +TLS, since large TLS objects significantly slows down thread creation
>> +and may needlessly increase memory footprint for application that make
>> +extensive use of unregistered threads.
>
> I don't understand the relation with non-DPDK threads.
>

__thread isn't just for "DPDK threads". It will allocate memory on all
threads in the process.

>> +
>> +A common but now largely obsolete DPDK pattern is to use a static
>> +array sized according to the maximum number of lcore id-equipped
>> +threads (i.e., with ``RTE_MAX_LCORE`` elements). To avoid *false
>> +sharing*, each element must both cache-aligned, and include a
>
> must *be*

Fixed.

> include*s*
>

No, it's "include".

>> +``RTE_CACHE_GUARD``. Such extensive use of padding cause internal
>
> cause*s*
>

Fixed.

>> +fragmentation (i.e., unused space) and lower cache hit rates.
>> +
>> +For more discussions on per-lcore state, see the ``rte_lcore_var.h``
>> +API documentation.
>
> [...]
>> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
>
> With #define RTE_MAX_LCORE_VAR 1048576,
> LCORE_BUFFER_SIZE can be 100MB, right?
>

Sure.

Unless you mlock the memory, it won't result in the DPDK process having
100MB worth of mostly-unused resident memory (RSS, in Linux speak). It
would, had we used huge pages and thus effectively disabled demand
paging.

This is similar to how thread stacks generally work, where you often
get a fairly sizable stack (e.g., 2MB), but as long as you don't use
all of it, most of the pages won't be resident.

If you want to guard against such mlocked scenarios, you could consider
lowering the max variable size.

You could argue it's strange to have a large RTE_MAX_LCORE_VAR and yet
tell the API user to only use it for small, often-used blocks of
memory. If RTE_MAX_LCORE_VAR should have a different value, what should
it be?

>> +
>> +static void *lcore_buffer;
>
> It is the last buffer for all lcores.
> The name suggests it is one single buffer per lcore.
> What about "last_buffer" or "current_buffer"?
>

Would "value_buffer" be better? Or "values_buffer", although that
sounds awkward. "current_value_buffer".

I agree lcore_buffer is very generic. The buffer holds values for all
lcore ids, for one or more (usually many) lcore variables.

>> +static size_t offset = RTE_MAX_LCORE_VAR;
>
> A comment may be useful for this value: it triggers the first alloc?
>

Yes. I will add a comment.
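Something along these lines, maybe (just a sketch):

/*
 * The offset into the current lcore buffer at which to place the next
 * allocation. Initialized to RTE_MAX_LCORE_VAR so that the first call
 * to lcore_var_alloc() triggers the allocation of the initial lcore
 * buffer.
 */
static size_t offset = RTE_MAX_LCORE_VAR;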
>> +
>> +static void *
>> +lcore_var_alloc(size_t size, size_t align)
>> +{
>> +	void *handle;
>> +	unsigned int lcore_id;
>> +	void *value;
>> +
>> +	offset = RTE_ALIGN_CEIL(offset, align);
>> +
>> +	if (offset + size > RTE_MAX_LCORE_VAR) {
>> +#ifdef RTE_EXEC_ENV_WINDOWS
>> +		lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
>> +					       RTE_CACHE_LINE_SIZE);
>> +#else
>> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
>> +					     LCORE_BUFFER_SIZE);
>> +#endif
>> +		RTE_VERIFY(lcore_buffer != NULL);
>
> Please no panic in a lib.
> You can return NULL.
>

One could, but it would come at a great cost to the API user. Something
is seriously broken if these kinds of allocations fail (considering
when they occur and what size they are), just like something is
seriously broken if the kernel fails (or is unwilling) to allocate
pages used by static lcore id-indexed arrays.

>> +
>> +		offset = 0;
>> +	}
>> +
>> +	handle = RTE_PTR_ADD(lcore_buffer, offset);
>> +
>> +	offset += size;
>> +
>> +	RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, value, handle)
>> +		memset(value, 0, size);
>> +
>> +	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
>> +		"%"PRIuPTR"-byte alignment", size, align);
>> +
>> +	return handle;
>> +}
>
> [...]
>> +#ifndef _RTE_LCORE_VAR_H_
>> +#define _RTE_LCORE_VAR_H_
>
> Really we don't need the first and last underscores,
> but it's a detail.
>

I just follow the DPDK conventions here. I agree the conventions are
wrong.

>> +
>> +/**
>> + * @file
>> + *
>> + * RTE Lcore variables
>
> Please don't say "RTE", it is just a prefix.

OK. I just follow the DPDK conventions here as well, but sure, I'll
change it.

> You can replace it with "DPDK" if you really want to be specific.
>
>> + *
>> + * This API provides a mechanism to create and access per-lcore id
>> + * variables in a space- and cycle-efficient manner.
>> + *
>> + * A per-lcore id variable (or lcore variable for short) has one value
>> + * for each EAL thread and registered non-EAL thread. There is one
>> + * instance for each current and future lcore id-equipped thread, with
>> + * a total of RTE_MAX_LCORE instances. The value of an lcore variable
>> + * for a particular lcore id is independent from other values (for
>> + * other lcore ids) within the same lcore variable.
>> + *
>> + * In order to access the values of an lcore variable, a handle is
>> + * used. The type of the handle is a pointer to the value's type
>> + * (e.g., for an @c uint32_t lcore variable, the handle is a
>> + * uint32_t *. The handle type is used to inform the
>> + * access macros the type of the values. A handle may be passed
>> + * between modules and threads just like any pointer, but its value
>> + * must be treated as a an opaque identifier. An allocated handle
>> + * never has the value NULL.
>
> Most of the explanations here would be better hosted in the prog guide.
> The Doxygen API is better suited for short and direct explanations.
>

Yeah, maybe. Reworking this into the programming guide format and
having that reviewed is a sizable undertaking though.

>> + *
>> + * @b Creation
>> + *
>> + * An lcore variable is created in two steps:
>> + *  1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
>> + *  2. Allocate lcore variable storage and initialize the handle with
>> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
>> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time of
>
> *at* the time
>
>> + *     module initialization, but may be done at any time.
>
> You mean it does not depend on EAL initialization?
>

Lcore variables may be used before any other parts of the EAL have been
initialized.

>> + *
>> + * An lcore variable is not tied to the owning thread's lifetime. It's
>> + * available for use by any thread immediately after having been
>> + * allocated, and continues to be available throughout the lifetime of
>> + * the EAL.
>> + *
>> + * Lcore variables cannot and need not be freed.
>
> I'm curious about that.
> If EAL is closed, and the application continues its life,
> then we want all this memory to be cleaned as well.
> Do you know rte_eal_cleanup()?

I think the primary reason you would like to free the buffers is to
avoid false positives from tools like valgrind memcheck (if anyone
managed to get that working with DPDK).

rte_eal_cleanup() freeing the buffers and resetting the offset would
make sense. That would, however, require the buffers to be tracked
(e.g., as a linked list).

From a footprint point of view, TLS allocations and static arrays also
aren't freed by rte_eal_cleanup().

>
>> + *
>> + * @b Access
>> + *
>> + * The value of any lcore variable for any lcore id may be accessed
>> + * from any thread (including unregistered threads), but it should
>> + * only be *frequently* read from or written to by the owner.
>
> Would be interesting to explain why.
>

This is intended to be brief and false sharing is mentioned elsewhere.

>> + *
>> + * Values of the same lcore variable but owned by two different lcore
>> + * ids may be frequently read or written by the owners without risking
>> + * false sharing.
>
> Again you could explain why if you explained the storage layout.
> What is the minimum object size to avoid false sharing?
>

Your objects may be as small as you want, and you still do not risk
false sharing. All objects for a particular lcore id are grouped
together, spatially.

>> + *
>> + * An appropriate synchronization mechanism (e.g., atomic loads and
>> + * stores) should employed to assure there are no data races between
>
> should *be*
>

Fixed.

>> + * the owning thread and any non-owner threads accessing the same
>> + * lcore variable instance.
>> + *
>> + * The value of the lcore variable for a particular lcore id is
>> + * accessed using @ref RTE_LCORE_VAR_LCORE_VALUE.
>> + *
>> + * A common pattern is for an EAL thread or a registered non-EAL
>> + * thread to access its own lcore variable value. For this purpose, a
>> + * short-hand exists in the form of @ref RTE_LCORE_VAR_VALUE.
>
> shorthand without hyphen?
>

Both work, but I'll change.

>> + *
>> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
>> + * pointer with the same type as the value, it may not be directly
>> + * dereferenced and must be treated as an opaque identifier.
>> + *
>> + * Lcore variable handles and value pointers may be freely passed
>> + * between different threads.
>> + *
>> + * @b Storage
>> + *
>> + * An lcore variable's values may by of a primitive type like @c int,
>> + * but would more typically be a @c struct.
>> + *
>> + * The lcore variable handle introduces a per-variable (not
>> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
>> + * there are some memory footprint gains to be made by organizing all
>> + * per-lcore id data for a particular module as one lcore variable
>> + * (e.g., as a struct).
>> + *
>> + * An application may choose to define an lcore variable handle, which
>> + * it then it goes on to never allocate.
>
> I don't understand this sentence.
>

I have rephrased this.
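To make the discussion above a little more concrete, here is a minimal
sketch of the define/allocate/access pattern, using a hypothetical
module "foo":

struct foo_lcore_state {
	int a;
	long b;
};

static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);

RTE_LCORE_VAR_INIT(lcore_states);

static void
foo_set_a(int a)
{
	struct foo_lcore_state *state = RTE_LCORE_VAR_VALUE(lcore_states);

	state->a = a;
}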
>> + *
>> + * The size of an lcore variable's value must be less than the DPDK
>
> size of variable, not size of value
>

RTE_MAX_LCORE_VAR specifies the maximum size of a variable's value. The
maximum amount of space required to hold an lcore variable is
RTE_MAX_LCORE_VAR * RTE_MAX_LCORE.

>> + * build-time constant @c RTE_MAX_LCORE_VAR.
>> + *
>> + * The lcore variable are stored in a series of lcore buffers, which
>
> variable*s*
>

Fixed.

>> + * are allocated from the libc heap. Heap allocation failures are
>> + * treated as fatal.
>
> Why not handling as an error, so the app has a chance to cleanup before crash?
>

Because you don't want to put the burden on the user (app or
DPDK-internal) to attempt to clean up such failures, which in practice
will never occur, and in case they do, they are just among several such
early-memory-allocation failures where the application code has no say
in what should occur.

What happens if the TLS allocations are so large that the main thread
can't be created? What happens if the BSS section is so large (because
of all our RTE_MAX_LCORE-sized arrays) that its pages can't be made
resident in memory?

Lcore variables aren't a dynamic allocation facility.

>> + *
>> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
>> + * and need *not* include a @ref RTE_CACHE_GUARD field, since the use
>> + * of these constructs are designed to avoid false sharing. In the
>> + * case of an lcore variable instance, the thread most recently
>> + * accessing nearby data structures should almost-always be the lcore
>> + * variables' owner. Adding padding will increase the effective memory
>> + * working set size, potentially reducing performance.
>> + *
>> + * Lcore variable values take on an initial value of zero.
>> + *
>> + * @b Example
> [...]
>> + * @b Alternatives
>> + *
>> + * Lcore variables are designed to replace a pattern exemplified below:
>
> Would be better in the introduction (in the prog guide).
>

Yes.

>> + * @code{.c}
>> + * struct __rte_cache_aligned foo_lcore_state {
>> + *         int a;
>> + *         long b;
>> + *         RTE_CACHE_GUARD;
>> + * };
>> + *
>> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
>> + * @endcode
> [...]
>> +/**
>> + * Define an lcore variable handle.
>> + *
>> + * This macro defines a variable which is used as a handle to access
>> + * the various instances of a per-lcore id variable.
>> + *
>> + * The aim with this macro is to make clear at the point of
>
> This long sentence may be shortened.
>

Indeed. Will do.

>> + * declaration that this is an lcore handle, rather than a regular
>> + * pointer.
>> + *
>> + * Add @b static as a prefix in case the lcore variable is only to be
>> + * accessed from a particular translation unit.
>> + */
>> +#define RTE_LCORE_VAR_HANDLE(type, name) \
>> +	RTE_LCORE_VAR_HANDLE_TYPE(type) name
>> +
>> +/**
>> + * Allocate space for an lcore variable, and initialize its handle.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>
> The lcore variables are initialized to zero, not the values.
>

"The lcore variables are initialized to zero" is the same as "The lcore
variables' values are initialized to zero" in my world, since the only
thing that can be initialized in an lcore variable is its values (or
"value instances" or just "instances", not sure I'm consistent here).

> Don't you mention 0 in align?
>

I don't understand the question. Are you asking why objects are
worst-case aligned when RTE_LCORE_VAR_ALLOC_SIZE() is used? Rather than
naturally aligned?
Good question, in that case. I guess it would make more sense if they
were naturally aligned. I just thought in terms of malloc() semantics,
but maybe that's wrong.

>> + */
>> +#define RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, align) \
>> +	handle = rte_lcore_var_alloc(size, align)
>> +
>> +/**
>> + * Allocate space for an lcore variable, and initialize its handle,
>> + * with values aligned for any type of object.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_ALLOC_SIZE(handle, size) \
>> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, size, 0)
>> +
>> +/**
>> + * Allocate space for an lcore variable of the size and alignment requirements
>> + * suggested by the handle pointer type, and initialize its handle.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_ALLOC(handle) \
>> +	RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(handle, sizeof(*(handle)), \
>> +				       alignof(typeof(*(handle))))
>> +
>> +/**
>> + * Allocate an explicitly-sized, explicitly-aligned lcore variable by
>> + * means of a @ref RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, align) \
>> +	RTE_INIT(rte_lcore_var_init_ ## name) \
>> +	{ \
>> +		RTE_LCORE_VAR_ALLOC_SIZE_ALIGN(name, size, align); \
>> +	}
>> +
>> +/**
>> + * Allocate an explicitly-sized lcore variable by means of a @ref
>> + * RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT_SIZE(name, size) \
>> +	RTE_LCORE_VAR_INIT_SIZE_ALIGN(name, size, 0)
>> +
>> +/**
>> + * Allocate an lcore variable by means of a @ref RTE_INIT constructor.
>> + *
>> + * The values of the lcore variable are initialized to zero.
>> + */
>> +#define RTE_LCORE_VAR_INIT(name) \
>> +	RTE_INIT(rte_lcore_var_init_ ## name) \
>> +	{ \
>> +		RTE_LCORE_VAR_ALLOC(name); \
>> +	}
>
> I don't get the need for RTE_INIT macros.

Check rte_power_intrinsics.c

I agree it's not obvious they are worth the API clutter.

> It does not cover RTE_INIT_PRIO and anyway
> another RTE_INIT is probably already there in the module.
>
>> +
>> +/**
>> + * Get void pointer to lcore variable instance with the specified
>> + * lcore id.
>> + *
>> + * @param lcore_id
>> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
>> + *   instances should be accessed. The lcore id need not be valid
>> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
>> + *   is also not valid (and thus should not be dereferenced).
>> + * @param handle
>> + *   The lcore variable handle.
>
> handle pointer
>

No, handle. A handle pointer could be thought of as &handle.

>> + */
>> +static inline void *
>> +rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
>
> What a long name!
> What about rte_lcore_var() ?
>

It's long but consistent with the rest of the API. This is not a
function you will see called often in API user code. Most will use the
access macros.

>> +{
>> +	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
>> +}
>> +
>> +/**
>> + * Get pointer to lcore variable instance with the specified lcore id.
>
> Same description as the function above.
>

I don't understand this comment.

>> + *
>> + * @param lcore_id
>> + *   The lcore id specifying which of the @c RTE_MAX_LCORE value
>> + *   instances should be accessed. The lcore id need not be valid
>> + *   (e.g., may be @ref LCORE_ID_ANY), but in such a case, the pointer
>> + *   is also not valid (and thus should not be dereferenced).
>> + * @param handle
>> + *   The lcore variable handle.
>> + */
>> +#define RTE_LCORE_VAR_LCORE_VALUE(lcore_id, handle) \
>> +	((typeof(handle))rte_lcore_var_lcore_ptr(lcore_id, handle))
>> +
>> +/**
>> + * Get pointer to lcore variable instance of the current thread.
>> + *
>> + * May only be used by EAL threads and registered non-EAL threads.
>> + */
>> +#define RTE_LCORE_VAR_VALUE(handle) \
>
> RTE_LCORE_VAR_LOCAL?
>

Why is that better?

Maybe Morten can remind me here, but I think we had a discussion about
RTE_LCORE_VAR() versus RTE_LCORE_VAR_VALUE() at some point, and
RTE_LCORE_VAR_VALUE() was deemed more clear.

>> +	RTE_LCORE_VAR_LCORE_VALUE(rte_lcore_id(), handle)
>> +
>> +/**
>> + * Iterate over each lcore id's value for an lcore variable.
>> + *
>> + * @param lcore_id
>> + *   An unsigned int variable successively set to the
>> + *   lcore id of every valid lcore id (up to @c RTE_MAX_LCORE).
>> + * @param value
>> + *   A pointer variable successively set to point to lcore variable
>> + *   value instance of the current lcore id being processed.
>> + * @param handle
>> + *   The lcore variable handle.
>> + */
>> +#define RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, value, handle) \
>
> RTE_LCORE_VAR_FOREACH?
>

Has been discussed already, and VALUE was deemed to improve
readability. RTE_LCORE_VAR_FOREACH could mean "iterate over all lcore
variables", which is not what the macro does.

>> +	for ((lcore_id) = \
>> +		(((value) = RTE_LCORE_VAR_LCORE_VALUE(0, handle)), 0); \
>> +	     (lcore_id) < RTE_MAX_LCORE; \
>> +	     (lcore_id)++, (value) = RTE_LCORE_VAR_LCORE_VALUE(lcore_id, \
>> +								handle))
>> +
>> +/**
>> + * Allocate space in the per-lcore id buffers for an lcore variable.
>> + *
>> + * The pointer returned is only an opaque identifer of the variable. To
>> + * get an actual pointer to a particular instance of the variable use
>> + * @ref RTE_LCORE_VAR_VALUE or @ref RTE_LCORE_VAR_LCORE_VALUE.
>> + *
>> + * The lcore variable values' memory is set to zero.
>> + *
>> + * The allocation is always successful, barring a fatal exhaustion of
>> + * the per-lcore id buffer space.
>> + *
>> + * rte_lcore_var_alloc() is not multi-thread safe.
>> + *
>> + * @param size
>> + *   The size (in bytes) of the variable's per-lcore id value. Must be > 0.
>> + * @param align
>> + *   If 0, the values will be suitably aligned for any kind of type
>> + *   (i.e., alignof(max_align_t)). Otherwise, the values will be aligned
>> + *   on a multiple of *align*, which must be a power of 2 and equal or
>> + *   less than @c RTE_CACHE_LINE_SIZE.
>> + * @return
>> + *   The variable's handle, stored in a void pointer value. The value
>> + *   is always non-NULL.
>> + */
>> +__rte_experimental
>> +void *
>> +rte_lcore_var_alloc(size_t size, size_t align);
>
> [...]
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -396,6 +396,8 @@ EXPERIMENTAL {
>>
>>  # added in 24.03
>>  rte_vfio_get_device_info; # WINDOWS_NO_EXPORT
>> +
>
> # added in 24.11
>

Fixed.

Thanks for the review.

>> +	rte_lcore_var_alloc;
>
>
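PS: Related to the FOREACH naming discussion above - a typical use
would look something like the below sketch (hypothetical packet
counter), which iterates over the values of one variable, not over
variables:

static RTE_LCORE_VAR_HANDLE(uint64_t, pkt_count);

RTE_LCORE_VAR_INIT(pkt_count);

static uint64_t
total_pkt_count(void)
{
	unsigned int lcore_id;
	uint64_t *count;
	uint64_t total = 0;

	RTE_LCORE_VAR_FOREACH_VALUE(lcore_id, count, pkt_count)
		total += *count;

	return total;
}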