From: Thomas Monjalon
To: Mattias Rönnblom
Cc: dev@dpdk.org, Morten Brørup, Stephen Hemminger, Konstantin Ananyev, David Marchand, Jerin Jacob, Luka Jankovic, Konstantin Ananyev, Chengwen Feng, Mattias Rönnblom
Subject: Re: [PATCH v9 1/7] eal: add static per-lcore memory allocation facility
Date: Fri, 11 Oct 2024 11:11:50 +0200
Message-ID: <10482556.0AQdONaE2F@thomas>
In-Reply-To: <5818e22a-1e22-4533-85b8-fb9d00c834da@lysator.liu.se>
References: <20241010141349.813088-2-mattias.ronnblom@ericsson.com> <1829355.yIU609i1g2@thomas> <5818e22a-1e22-4533-85b8-fb9d00c834da@lysator.liu.se>
List-Id: DPDK patches and discussions

11/10/2024 10:04, Mattias Rönnblom:
> On 2024-10-10 23:24, Thomas Monjalon wrote:
> > Hello,
> >
> > This new feature looks to bring something interesting to DPDK.
> > There was a good amount of discussion and review,
> > and there is a real effort of documentation.
> >
> > However, some choices made in this implementation
> > were not explained or advertised enough in the documentation,
> > in my opinion.
> >
>
> Are those of relevance to the API user?

I think it helps to understand when we should use this API.
Such a design explanation may come in the prog guide RST file.

> > I think the first thing to add is an explanation of the memory layout.
> > Maybe an SVG drawing would help to show how it is stored.
>
> That would be helpful to someone wanting to understand the internals.
> But where should that go? If it's put in the API, it will also obscure
> the *actual* API documentation.

Of course not in the API doc.
I'm talking about moving a lot of explanations to the prog guide,
and adding a bit more about the layout.
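To be concrete about what I mean by explaining the layout, here is a rough, self-contained miniature (all names and sizes are made up for illustration; this is not the actual implementation): the values for one lcore id are packed into that lcore's slice of one flat buffer, and a handle is just a position that is valid in every slice.

```c
#include <stddef.h>

/* Illustrative stand-ins for the RTE_MAX_LCORE* constants. */
#define MAX_LCORE     4
#define MAX_LCORE_VAR 1024 /* capacity of one per-lcore slice */

static char buffer[MAX_LCORE * MAX_LCORE_VAR];
static size_t offset; /* next free position within every slice */

/* Reserve `size` bytes in each lcore's slice; the returned handle is
 * the address of the value instance belonging to lcore id 0. */
static void *
var_alloc(size_t size, size_t align)
{
	offset = (offset + align - 1) & ~(align - 1);
	void *handle = &buffer[offset];
	offset += size;
	return handle;
}

/* The value for another lcore id sits at the same offset, whole
 * slices away from lcore id 0's value. */
static void *
var_value(unsigned int lcore_id, void *handle)
{
	return (char *)handle + (size_t)lcore_id * MAX_LCORE_VAR;
}
```

A drawing of exactly this would show why two variables owned by the same lcore id end up adjacent, while the same variable for two different owners is a full slice apart, which is the false-sharing argument.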
> I have some drawings already, and I agree they are very helpful - both
> in explaining how things work, and making it obvious why the memory layout
> resulting from the use of lcore variables is superior to that of the
> lcore id-indexed static array approach.

Cool, please add some in the prog guide.

> > We also need to explain why it is not using rte_malloc.
> >
> > Also please could you re-read the doc and comments in detail?
> > I think some words are missing and there are typos.
> > While at it, please allow for easy update of the text
> > by starting each sentence on a new line.
> > Breaking lines logically is better for future patches.
> > One more piece of advice: avoid very long sentences.
>
> I've gone through the documentation and will post a new patch set.

OK thanks.

> There's been a lot of comments and discussion on this patch set.
> Did you have anything in particular in mind?

Nothing more than what I raised in this review.

> > Do you have benchmark results for the modules using such variables
> > (power, random, service)?
> > It would be interesting to compare time efficiency and memory usage
> > before/after, with different numbers of threads.
> >
>
> I have the dummy modules of test_lcore_var_perf.c, which show the
> performance benefits as the number of modules using lcore variables
> increases.
>
> That said, the gains are hard to quantify with micro benchmarks, and for
> real-world performance, one really has to start using the facility at
> scale before anything interesting may happen.
>
> Keep in mind, however, that while this is new to DPDK, similar facilities
> already exist in your favorite UN*X kernel. The implementation is
> different, but I think it's accurate to say the goal and the effects
> should be the same.
>
> One can also run the perf autotest for RTE random, but such tests only
> show that lcore variables don't make things significantly worse when the L1
> cache is essentially unused. (In fact, the lcore variable-enabled
> rte_random.c somewhat counter-intuitively generates a 64-bit number 1
> TSC cycle faster than the old version on my system.)
>
> Just to be clear: it's the footprint in the core-private caches we are
> attempting to reduce.

OK

> > 10/10/2024 16:21, Mattias Rönnblom:
> >> Introduce DPDK per-lcore id variables, or lcore variables for short.
> >>
> >> An lcore variable has one value for every current and future lcore
> >> id-equipped thread.
> >
> > I find it difficult to read "lcore id-equipped thread".
> > Can we just say "DPDK thread"?
>
> Sure, if you point me to a definition of what a DPDK thread is.
>
> I can think of at least four potential definitions:
> * An EAL thread
> * An EAL thread or a registered non-EAL thread
> * Any thread calling into DPDK APIs
> * Any thread living in a DPDK process

OK I understand your point.
If we move the design explanations to the prog guide,
we can explain this point in the introduction of the chapter.

> > [...]
> >> +An application, a DPDK library or PMD may keep opt to keep per-thread
> >> +state.
> >
> > I don't understand this sentence.
>
> Which part is unclear?

"keep opt to keep per-thread"
What do you mean?

[...]

> >> +Variables with thread-local storage are allocated at the time of
> >> +thread creation, and exists until the thread terminates, for every
> >> +thread in the process. Only very small object should be allocated in
> >> +TLS, since large TLS objects significantly slows down thread creation
> >> +and may needlessly increase memory footprint for application that make
> >> +extensive use of unregistered threads.
> >
> > I don't understand the relation with non-DPDK threads.
>
> __thread isn't just for "DPDK threads". It will allocate memory on all
> threads in the process.

OK
Maybe good to add as a note.

> > [...]
> >> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
> >
> > With #define RTE_MAX_LCORE_VAR 1048576,
> > LCORE_BUFFER_SIZE can be 100MB, right?
> >
>
> Sure. Unless you mlock the memory, it won't result in the DPDK process
> having 100MB worth of mostly-unused resident memory (RSS, in Linux
> speak). It would, were we to use huge pages and thus effectively disable
> demand paging.
>
> This is similar to how thread stacks generally work, where you often get
> a fairly sizable stack (e.g., 2MB) but as long as you don't use all of
> it, most of the pages won't be resident.
>
> If you want to guard against such mlocked scenarios, you could consider
> lowering the max variable size. You could argue it's strange to have a
> large RTE_MAX_LCORE_VAR and yet tell the API user to only use it for
> small, often-used blocks of memory.
>
> If RTE_MAX_LCORE_VAR should have a different value, what should it be?

That's fine.

> >> +static void *lcore_buffer;
> >
> > It is the last buffer for all lcores.
> > The name suggests it is one single buffer per lcore.
> > What about "last_buffer" or "current_buffer"?
>
> Would "value_buffer" be better? Or "values_buffer", although that sounds
> awkward. "current_value_buffer".
>
> I agree lcore_buffer is very generic.
>
> The buffer holds values for all lcore ids, for one or (usually many)
> more lcore variables.

So you don't need to mention "lcore" in this variable.
The most important thing is that it is the last buffer allocated, IMHO.

> >> +static size_t offset = RTE_MAX_LCORE_VAR;
> >
> > A comment may be useful for this value: it triggers the first alloc?
>
> Yes. I will add a comment.
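For the record, the trick the comment should capture can be boiled down to this stripped-down sketch (hypothetical names, simplified from the quoted code): starting `offset` at the slice capacity makes the out-of-space branch taken on the very first allocation too, so no separate "first call" flag is needed.

```c
#include <stdlib.h>

#define SLICE_CAP 1024 /* stand-in for RTE_MAX_LCORE_VAR */

static char *cur_buf;
/* Sentinel: starting at full capacity forces the very first call into
 * the "buffer exhausted" branch, which allocates the initial buffer. */
static size_t offset = SLICE_CAP;

static void *
slice_alloc(size_t size)
{
	if (offset + size > SLICE_CAP) { /* also true on the first call */
		cur_buf = malloc(SLICE_CAP);
		if (cur_buf == NULL)
			return NULL;
		offset = 0;
	}
	void *p = cur_buf + offset;
	offset += size;
	return p;
}
```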
>
> >> +
> >> +static void *
> >> +lcore_var_alloc(size_t size, size_t align)
> >> +{
> >> +	void *handle;
> >> +	unsigned int lcore_id;
> >> +	void *value;
> >> +
> >> +	offset = RTE_ALIGN_CEIL(offset, align);
> >> +
> >> +	if (offset + size > RTE_MAX_LCORE_VAR) {
> >> +#ifdef RTE_EXEC_ENV_WINDOWS
> >> +		lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
> >> +					       RTE_CACHE_LINE_SIZE);
> >> +#else
> >> +		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
> >> +					     LCORE_BUFFER_SIZE);
> >> +#endif
> >> +		RTE_VERIFY(lcore_buffer != NULL);
> >
> > Please no panic in a lib.
> > You can return NULL.
>
> One could, but it would come at a great cost to the API user.
>
> Something is seriously broken if this kind of allocation fails
> (considering when they occur and what size they are), just like
> something is seriously broken if the kernel fails (or is unwilling) to
> allocate pages used by static lcore id-indexed arrays.

As said in another email, we may return NULL in this function
and have RTE_VERIFY in the declaration macros, for ease of use of such an API.
So the user has a choice between an API which returns an error
and a simpler one with macros.

> > [...]
> >> +#ifndef _RTE_LCORE_VAR_H_
> >> +#define _RTE_LCORE_VAR_H_
> >
> > Really we don't need the first and last underscores,
> > but it's a detail.
>
> I just follow the DPDK conventions here.
>
> I agree the conventions are wrong.

Such conventions are not consistent.
Let's do the right thing.

> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE Lcore variables
> >
> > Please don't say "RTE", it is just a prefix.
>
> OK.
>
> I just follow the DPDK conventions here as well, but sure, I'll change it.

Not really a convention.

> > You can replace it with "DPDK" if you really want to be specific.
> >
> >> + *
> >> + * This API provides a mechanism to create and access per-lcore id
> >> + * variables in a space- and cycle-efficient manner.
> >> + *
> >> + * A per-lcore id variable (or lcore variable for short) has one value
> >> + * for each EAL thread and registered non-EAL thread. There is one
> >> + * instance for each current and future lcore id-equipped thread, with
> >> + * a total of RTE_MAX_LCORE instances. The value of an lcore variable
> >> + * for a particular lcore id is independent from other values (for
> >> + * other lcore ids) within the same lcore variable.
> >> + *
> >> + * In order to access the values of an lcore variable, a handle is
> >> + * used. The type of the handle is a pointer to the value's type
> >> + * (e.g., for an @c uint32_t lcore variable, the handle is a
> >> + * uint32_t *. The handle type is used to inform the
> >> + * access macros the type of the values. A handle may be passed
> >> + * between modules and threads just like any pointer, but its value
> >> + * must be treated as a an opaque identifier. An allocated handle
> >> + * never has the value NULL.
> >
> > Most of the explanations here would be better hosted in the prog guide.
> > The Doxygen API doc is better suited for short and direct explanations.
>
> Yeah, maybe. Reworking this to the programming guide format and having
> that reviewed is a sizable undertaking though.

It is mostly a matter of moving text.
I'm on it, I can review quickly.

> >> + *
> >> + * @b Creation
> >> + *
> >> + * An lcore variable is created in two steps:
> >> + *  1. Define an lcore variable handle by using @ref RTE_LCORE_VAR_HANDLE.
> >> + *  2. Allocate lcore variable storage and initialize the handle with
> >> + *     a unique identifier by @ref RTE_LCORE_VAR_ALLOC or
> >> + *     @ref RTE_LCORE_VAR_INIT. Allocation generally occurs the time of
> >
> > *at* the time
> >
> >> + *     module initialization, but may be done at any time.
> >
> > You mean it does not depend on EAL initialization?
>
> Lcore variables may be used prior to any other parts of the EAL having
> been initialized.

Please make it explicit.
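To anchor that prog guide text, the two creation steps plus access could be demonstrated with a self-contained mock like the following. To be clear: the LCORE_VAR_* names here only mimic, and are not, the real RTE_LCORE_VAR_* macros, and the simple flat-array placement ignores the real implementation's false-sharing-avoiding layout; it only shows the handle/alloc/access shape.

```c
#include <stdlib.h>

#define MAX_LCORE 4 /* stand-in for RTE_MAX_LCORE */

/* Step 1: define a handle -- a pointer of the value's type. */
#define LCORE_VAR_HANDLE(type, name) type *name

/* Step 2: allocate one zero-initialized value instance per lcore id,
 * and initialize the handle. */
#define LCORE_VAR_ALLOC(name) \
	((name) = calloc(MAX_LCORE, sizeof(*(name))))

/* Access the value instance owned by a given lcore id. */
#define LCORE_VAR_VALUE(lcore_id, name) (&(name)[lcore_id])

/* Example per-lcore module state. */
struct my_state {
	unsigned long enter_count;
};

static LCORE_VAR_HANDLE(struct my_state, state);
```

A module would typically run the allocation step once at initialization and then use the access macro from its fast path.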
> >> + *
> >> + * An lcore variable is not tied to the owning thread's lifetime. It's
> >> + * available for use by any thread immediately after having been
> >> + * allocated, and continues to be available throughout the lifetime of
> >> + * the EAL.
> >> + *
> >> + * Lcore variables cannot and need not be freed.
> >
> > I'm curious about that.
> > If EAL is closed, and the application continues its life,
> > then we want all this memory to be cleaned as well.
> > Do you know rte_eal_cleanup()?
>
> I think the primary reason you would like to free the buffers is to
> avoid false positives from tools like valgrind memcheck (if anyone
> managed to get that working with DPDK).
>
> rte_eal_cleanup() freeing the buffers and resetting the offset would
> make sense. That however would require the buffers to be tracked (e.g.,
> as a linked list).
>
> From a footprint point of view, TLS allocations and static arrays also
> aren't freed by rte_eal_cleanup().

They are not dynamic like this one.
I still think it is required.
Think about an application starting and stopping some DPDK modules;
it would be a serious leak.

> >> + * @b Access
> >> + *
> >> + * The value of any lcore variable for any lcore id may be accessed
> >> + * from any thread (including unregistered threads), but it should
> >> + * only be *frequently* read from or written to by the owner.
> >
> > Would be interesting to explain why.
>
> This is intended to be brief, and false sharing is mentioned elsewhere.
>
> >> + *
> >> + * Values of the same lcore variable but owned by two different lcore
> >> + * ids may be frequently read or written by the owners without risking
> >> + * false sharing.
> >
> > Again you could explain why if you explained the storage layout.
> > What is the minimum object size to avoid false sharing?
>
> Your objects may be as small as you want, and you still do not risk
> false sharing. All objects for a particular lcore id are grouped
> together, spatially.

[...]

> >> + * are allocated from the libc heap. Heap allocation failures are
> >> + * treated as fatal.
> >
> > Why not handle it as an error, so the app has a chance to clean up before crashing?
>
> Because you don't want to put the burden on the user (app or
> DPDK-internal) to attempt to clean up such failures, which in practice
> will never occur, and in case they do, they are just among several such
> early-memory-allocation failures where the application code has no say
> in what should occur.
>
> What happens if the TLS allocations are so large, the main thread can't
> be created?
>
> What happens if the BSS section is so large (because of all our
> RTE_MAX_LCORE-sized arrays) that its pages can't be made resident in memory?
>
> Lcore variables aren't a dynamic allocation facility.

I understand that and I agree.
In case someone is using it as a dynamic facility with the function,
can we offer them a NULL return?

[...]

> >> +/**
> >> + * Allocate space for an lcore variable, and initialize its handle.
> >> + *
> >> + * The values of the lcore variable are initialized to zero.
> >
> > The lcore variables are initialized to zero, not the values.
> >
>
> "The lcore variables are initialized to zero" is the same as "The lcore
> variables' values are initialized to zero" in my world, since the only
> thing that can be initialized in an lcore variable is its values (or
> "value instances" or just "instances", not sure I'm consistent here).

OK

> > Don't you mention 0 in align?
>
> I don't understand the question. Are you asking why objects are
> worst-case aligned when RTE_LCORE_VAR_ALLOC_SIZE() is used? Rather than
> naturally aligned?

No, I just mention that an align value of 0 is not documented here.

> Good question, in that case. I guess it would make more sense if they
> were naturally aligned. I just thought in terms of malloc() semantics,
> but maybe that's wrong.

[...]

> >> +#define RTE_LCORE_VAR_INIT(name) \
> >> +	RTE_INIT(rte_lcore_var_init_ ## name) \
> >> +	{ \
> >> +		RTE_LCORE_VAR_ALLOC(name); \
> >> +	}
> >
> > I don't get the need for RTE_INIT macros.
>
> Check rte_power_intrinsics.c

I'll check later.

> I agree it's not obvious they are worth the API clutter.
>
> > It does not cover RTE_INIT_PRIO, and anyway
> > another RTE_INIT is probably already there in the module.
>
> >> + */
> >> +static inline void *
> >> +rte_lcore_var_lcore_ptr(unsigned int lcore_id, void *handle)
> >
> > What a long name!
> > What about rte_lcore_var() ?
> >
>
> It's long but consistent with the rest of the API.
>
> This is not a function you will see called often in API user code.
> Most will use the access macros.

I let you discuss naming with Morten.
It seems he agrees with me about making it short.
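PS: regarding the NULL-return versus RTE_VERIFY question earlier in this thread, the layering I have in mind could look like the sketch below (hypothetical names, not a real proposal of API): the low-level function reports failure through NULL, and only the convenience macro layered on top turns that into a fatal check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: the low-level allocator returns NULL on
 * failure (calloc stands in for the real buffer management)... */
static void *
var_alloc(size_t size)
{
	return calloc(1, size);
}

/* ...and only the declaration-style convenience macro makes a
 * failure fatal, in the spirit of RTE_VERIFY. */
#define VAR_ALLOC_OR_DIE(handle)					\
	do {								\
		(handle) = var_alloc(sizeof(*(handle)));		\
		if ((handle) == NULL) {					\
			fprintf(stderr, "lcore var alloc failed\n");	\
			abort();					\
		}							\
	} while (0)
```

A caller who wants to handle the error calls var_alloc() directly; a macro user keeps today's terse declaration style.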