Date: Tue, 15 Oct 2024 15:33:29 -0700
From: Stephen Hemminger
To: Mattias Rönnblom
Cc: Morten Brørup, Konstantin Ananyev, David Marchand, Jerin Jacob, Luka Jankovic, Konstantin Ananyev, Chengwen Feng
Subject: Re: [PATCH v13 1/7] eal: add static per-lcore memory allocation facility
Message-ID: <20241015153329.04e9ba94@hermes.local>
In-Reply-To: <20241015093344.824073-2-mattias.ronnblom@ericsson.com>

On Tue, 15 Oct 2024 11:33:38 +0200
Mattias Rönnblom wrote:

> + * Lcore variables
> + *
> + * This API provides a mechanism to create and access per-lcore id
> + * variables in a space- and cycle-efficient manner.
> + *
> + * A per-lcore id variable (or lcore variable for short) holds a
> + * unique value for each EAL thread and registered non-EAL
> + * thread. There is one instance for each current and future lcore
> + * id-equipped thread, with a total of @c RTE_MAX_LCORE instances. The
> + * value of the lcore variable for one lcore id is independent of
> + * the values assigned to other lcore ids within the same variable.
> + *
> + * In order to access the values of an lcore variable, a handle is
> + * used. The type of the handle is a pointer to the value's type
> + * (e.g., for a @c uint32_t lcore variable, the handle is a
> + * uint32_t *). The handle type is used to inform the
> + * access macros of the type of the values. A handle may be passed
> + * between modules and threads just like any pointer, but its value
> + * must be treated as an opaque identifier. An allocated handle never
> + * has the value NULL.
> + *
> + * @b Creation
> + *
> + * An lcore variable is created in two steps:
> + * 1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
> + * 2. Allocate lcore variable storage and initialize the handle with
> + *    a unique identifier using @ref RTE_LCORE_VAR_ALLOC or
> + *    @ref RTE_LCORE_VAR_INIT. Allocation generally occurs at the time
> + *    of module initialization, but may be done at any time.
> + *
> + * The lifetime of an lcore variable is not tied to the thread that
> + * created it. Its per-lcore id values (up to @c RTE_MAX_LCORE) are
> + * available from the moment the lcore variable is created and
> + * continue to exist throughout the entire lifetime of the EAL,
> + * whether or not the lcore id is currently in use.
> + *
> + * Lcore variables cannot and need not be freed.
> + *
> + * @b Access
> + *
> + * The value of any lcore variable for any lcore id may be accessed
> + * from any thread (including unregistered threads), but it should
> + * only be *frequently* read from or written to by the owner.
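To make the handle concept concrete, here is a minimal, self-contained sketch of how a pointer-typed handle could address per-lcore id values. It is not DPDK's implementation: LV_MAX_LCORE, LV_MAX_VAR, lv_alloc(), LV_HANDLE(), LV_ALLOC() and LV_LCORE() are hypothetical stand-ins for RTE_MAX_LCORE, RTE_MAX_LCORE_VAR and the RTE_LCORE_VAR_* macros, the layout (lcore id N's value stored N * LV_MAX_VAR bytes past the handle) is an assumption, and the macros rely on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for RTE_MAX_LCORE and RTE_MAX_LCORE_VAR. */
#define LV_MAX_LCORE 8
#define LV_MAX_VAR 128 /* max value size; a multiple of any alignment used */

/*
 * Current lcore buffer: LV_MAX_LCORE consecutive slices of LV_MAX_VAR
 * bytes, slice N holding lcore id N's values of every variable.
 */
static char *lv_buffer;
static size_t lv_offset = LV_MAX_VAR; /* "full" until the first allocation */

/* Reserve room for one value; the handle points into lcore id 0's slice. */
static void *lv_alloc(size_t size, size_t align)
{
    assert(size > 0 && size <= LV_MAX_VAR);
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(align <= _Alignof(max_align_t)); /* calloc() guarantees no more */

    size_t offset = (lv_offset + align - 1) & ~(align - 1);

    if (offset + size > LV_MAX_VAR) {
        /* Start a new zero-filled buffer; earlier ones are never freed. */
        lv_buffer = calloc(LV_MAX_LCORE, LV_MAX_VAR);
        if (lv_buffer == NULL) {
            fputs("lcore variable buffer allocation failed\n", stderr);
            abort(); /* heap allocation failure is fatal */
        }
        offset = 0;
    }

    lv_offset = offset + size;
    return lv_buffer + offset;
}

/* Declare a handle: a pointer to the value type, used as an opaque ID. */
#define LV_HANDLE(type, name) type *name

/* Allocate storage sized and aligned for the handle's value type. */
#define LV_ALLOC(handle) \
    ((handle) = lv_alloc(sizeof(*(handle)), _Alignof(__typeof__(*(handle)))))

/* Typed pointer to lcore_id's value: slice lcore_id, same offset. */
#define LV_LCORE(lcore_id, handle) \
    ((__typeof__(handle))((char *)(handle) + (size_t)(lcore_id) * LV_MAX_VAR))
```

In this sketch, dereferencing the handle directly would silently yield lcore id 0's value rather than the caller's, which is one way to see why the handle must be treated as an opaque identifier. Real code would use RTE_LCORE_VAR_HANDLE(), RTE_LCORE_VAR_ALLOC() and RTE_LCORE_VAR_LCORE() instead.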
> + *
> + * Values of the same lcore variable, associated with different lcore
> + * ids, may be frequently read or written by their respective owners
> + * without risking false sharing.
> + *
> + * An appropriate synchronization mechanism (e.g., atomic loads and
> + * stores) should be employed to prevent data races between the owning
> + * thread and any other thread accessing the same value instance.
> + *
> + * The value of the lcore variable for a particular lcore id is
> + * accessed using @ref RTE_LCORE_VAR_LCORE.
> + *
> + * A common pattern is for an EAL thread or a registered non-EAL
> + * thread to access its own lcore variable value. For this purpose,
> + * the shorthand @ref RTE_LCORE_VAR is provided.
> + *
> + * Although the handle (as defined by @ref RTE_LCORE_VAR_HANDLE) is a
> + * pointer with the same type as the value, it may not be directly
> + * dereferenced and must be treated as an opaque identifier.
> + *
> + * Lcore variable handles and value pointers may be freely passed
> + * between different threads.
> + *
> + * @b Storage
> + *
> + * An lcore variable's values may be of a primitive type like @c int,
> + * but would more typically be a @c struct.
> + *
> + * The lcore variable handle introduces a per-variable (not
> + * per-value/per-lcore id) overhead of @c sizeof(void *) bytes, so
> + * there are some memory footprint gains to be made by organizing all
> + * per-lcore id data for a particular module as one lcore variable
> + * (e.g., as a struct).
> + *
> + * An application may define an lcore variable handle without ever
> + * allocating it.
> + *
> + * The size of an lcore variable's value must be less than the DPDK
> + * build-time constant @c RTE_MAX_LCORE_VAR.
> + *
> + * Lcore variables are stored in a series of lcore buffers, which are
> + * allocated from the libc heap. Heap allocation failures are treated
> + * as fatal.
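The advice to group a module's per-lcore id data into one struct, and to use atomic loads and stores when a non-owning thread reads the values, might look roughly like this in plain C11. The storage is only a stand-in for lcore-variable buffers: MAX_LCORE, SLICE_SIZE, struct pkt_stats and the stats_* functions are hypothetical, and a real module would go through RTE_LCORE_VAR() or RTE_LCORE_VAR_LCORE() rather than stats_lcore().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_LCORE 8     /* hypothetical stand-in for RTE_MAX_LCORE */
#define SLICE_SIZE 128  /* hypothetical stand-in for RTE_MAX_LCORE_VAR */

/* All of this module's per-lcore id state grouped into one struct. */
struct pkt_stats {
    _Atomic uint64_t rx_pkts;
    _Atomic uint64_t tx_pkts;
};

/*
 * Stand-in storage: one SLICE_SIZE slice per lcore id. With real lcore
 * variables, the rest of each slice would hold other variables' values
 * for that same lcore id. Static storage starts out zeroed.
 */
static union {
    struct pkt_stats stats;
    char slice[SLICE_SIZE];
} lcore_slices[MAX_LCORE];

static struct pkt_stats *stats_lcore(unsigned int lcore_id)
{
    assert(lcore_id < MAX_LCORE);
    return &lcore_slices[lcore_id].stats;
}

/* Called only by the owning lcore: single writer, so load + store suffices. */
static void stats_add_rx(unsigned int own_lcore_id, uint64_t n)
{
    struct pkt_stats *s = stats_lcore(own_lcore_id);
    uint64_t cur = atomic_load_explicit(&s->rx_pkts, memory_order_relaxed);

    atomic_store_explicit(&s->rx_pkts, cur + n, memory_order_relaxed);
}

/* May be called from any thread, e.g., a control thread collecting stats. */
static uint64_t stats_total_rx(void)
{
    uint64_t total = 0;

    for (unsigned int id = 0; id < MAX_LCORE; id++)
        total += atomic_load_explicit(&stats_lcore(id)->rx_pkts,
                                      memory_order_relaxed);
    return total;
}
```

Because each counter has a single writer (its owning lcore), a relaxed load followed by a relaxed store is enough to avoid data races and torn reads without a costlier atomic read-modify-write; a reader on another thread may see a slightly stale total, which is usually acceptable for statistics.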
> + *
> + * Lcore variables should generally *not* be @ref __rte_cache_aligned
> + * and need *not* include a @ref RTE_CACHE_GUARD field, since these
> + * constructs are designed to avoid false sharing. In the case of an
> + * lcore variable instance, the thread most recently accessing nearby
> + * data structures should almost always be the lcore variable's
> + * owner. Adding padding will increase the effective memory working
> + * set size, potentially reducing performance.
> + *
> + * Lcore variable values are initialized to zero by default.
> + *
> + * Lcore variables are not stored in huge page memory.
> + *
> + * @b Example
> + *
> + * Below is an example of the use of an lcore variable:
> + *
> + * @code{.c}
> + * struct foo_lcore_state {
> + *     int a;
> + *     long b;
> + * };
> + *
> + * static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);
> + *
> + * long foo_get_a_plus_b(void)
> + * {
> + *     struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);
> + *
> + *     return state->a + state->b;
> + * }
> + *
> + * RTE_INIT(rte_foo_init)
> + * {
> + *     RTE_LCORE_VAR_ALLOC(lcore_states);
> + *
> + *     unsigned int lcore_id;
> + *     struct foo_lcore_state *state;
> + *     RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
> + *         (initialize 'state')
> + *     }
> + *
> + *     (other initialization)
> + * }
> + * @endcode
> + *
> + * @b Alternatives
> + *
> + * Lcore variables are designed to replace a pattern exemplified below:
> + * @code{.c}
> + * struct __rte_cache_aligned foo_lcore_state {
> + *     int a;
> + *     long b;
> + *     RTE_CACHE_GUARD;
> + * };
> + *
> + * static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];
> + * @endcode
> + *
> + * This scheme is simple and effective, but has one drawback: the data
> + * is organized so that objects related to all lcores for a particular
> + * module are kept close in memory.
> + * At a bare minimum, this requires
> + * sizing data structures (e.g., using `__rte_cache_aligned`) to a
> + * whole number of cache lines to avoid false sharing. With CPU
> + * hardware prefetching and memory loads resulting from speculative
> + * execution (functions which seemingly are getting more eager faster
> + * than they are getting more intelligent), one or more "guard" cache
> + * lines may be required to separate one lcore's data from another's
> + * and prevent false sharing.
> + *
> + * Lcore variables offer the advantage of working with, rather than
> + * against, the CPU's assumptions. A next-line hardware prefetcher,
> + * for example, may function as intended (i.e., to the benefit, not
> + * detriment, of system performance).
> + *
> + * Another alternative to @ref rte_lcore_var.h is the @ref
> + * rte_per_lcore.h API, which makes use of thread-local storage (TLS,
> + * e.g., GCC __thread or C11 _Thread_local). The main differences
> + * between using the various forms of TLS (e.g., @ref
> + * RTE_DEFINE_PER_LCORE or _Thread_local) and using lcore
> + * variables are:
> + *
> + *  * The lifecycle of a thread-local variable instance is tied to
> + *    that of the thread. The data cannot be accessed before the
> + *    thread has been created, nor after it has exited. As a result,
> + *    thread-local variables must be initialized in a "lazy" manner
> + *    (e.g., at the point of thread creation). Lcore variables may be
> + *    accessed immediately after having been allocated (which may occur
> + *    before any thread beyond the main thread is running).
> + *  * A thread-local variable is duplicated across all threads in the
> + *    process, including unregistered non-EAL threads (i.e.,
> + *    "regular" threads).
> + *    For DPDK applications heavily relying on
> + *    multi-threading (in conjunction with DPDK's "one thread per core"
> + *    pattern), either by having many concurrent threads or
> + *    creating/destroying threads at a high rate, an excessive use of
> + *    thread-local variables may cause inefficiencies (e.g.,
> + *    increased thread creation overhead due to thread-local storage
> + *    initialization or an increased total RAM footprint). Lcore
> + *    variables *only* exist for threads with an lcore id.
> + *  * Whether data in thread-local storage can be shared between
> + *    threads (i.e., whether a pointer to a thread-local variable can
> + *    be passed to and successfully dereferenced by a non-owning
> + *    thread) depends on the specifics of the TLS implementation. With
> + *    GCC __thread and GCC _Thread_local, data sharing between threads
> + *    is supported. In the C11 standard, accessing another thread's
> + *    _Thread_local object is implementation-defined. Lcore variable
> + *    instances may be accessed reliably by any thread.
> + */

To me, this comment is too wordy for code and belongs in the documentation
instead. It could also be reduced to more precise, succinct language.
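The first TLS-versus-lcore-variable difference in the quoted comment (instance lifecycle, and hence lazy versus eager initialization) can be illustrated with a small C11 sketch. All names are hypothetical, and the plain lcore_slots array stands in for lcore-variable storage only for brevity: it does not reproduce the lcore-variable memory layout and, as the comment itself explains, such an array is prone to false sharing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LCORE 8 /* hypothetical stand-in for RTE_MAX_LCORE */

struct foo_state {
    int a;
    long b;
};

/*
 * TLS flavor: an instance comes into existence with each thread (including
 * threads without an lcore id) and vanishes when it exits, so it can only
 * be set up lazily, from within that thread, e.g., on first use.
 */
static _Thread_local struct foo_state tls_state;
static _Thread_local bool tls_ready;

static struct foo_state *foo_tls(void)
{
    if (!tls_ready) {
        tls_state = (struct foo_state){ .a = 1, .b = 2 };
        tls_ready = true;
    }
    return &tls_state;
}

/*
 * Lcore-variable flavor (stand-in storage): all MAX_LCORE instances exist
 * up front, so the main thread can initialize every lcore id's value
 * eagerly, before any worker thread has been launched.
 */
static struct foo_state lcore_slots[MAX_LCORE];

static void foo_lcore_init(void)
{
    for (unsigned int id = 0; id < MAX_LCORE; id++)
        lcore_slots[id] = (struct foo_state){ .a = 1, .b = 2 };
}
```

The eager loop mirrors what RTE_LCORE_VAR_FOREACH() is used for in the quoted RTE_INIT() example, while foo_tls() shows the first-use check that every TLS access path has to carry.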