Subject: Re: [PATCH v2 1/6] eal: add static per-lcore memory allocation facility
From: Mattias Rönnblom
To: Morten Brørup, Mattias Rönnblom, Jerin Jacob, thomas@monjalon.net
Cc: dev@dpdk.org, Chengwen Feng, Stephen Hemminger, Konstantin Ananyev, David Marchand, Anatoly Burakov
Date: Tue, 15 Oct 2024 08:29:19 +0200
Message-ID: <07e55111-2ba0-4465-b866-8af8ad5d7cd1@lysator.liu.se>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F7D0@smartserver.smartshare.dk>

On 2024-10-14 09:56, Morten Brørup wrote:
>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
>> Sent: Wednesday, 18 September 2024 12.12
>>
>> On Thu, Sep 12, 2024 at 8:52 PM Jerin Jacob wrote:
>>>
>>> On Thu, Sep 12, 2024 at 7:11 PM Morten Brørup wrote:
>>>>
>>>>> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
>>>>> Sent: Thursday, 12 September 2024 15.17
>>>>>
>>>>> On Thu, Sep 12, 2024 at 2:40 PM Morten Brørup wrote:
>>>>>>
>>>>>>> +#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
>>>>>>
>>>>>> Considering hugepages...
>>>>>>
>>>>>> Lcore variables may be allocated before DPDK's memory allocator
>>>>>> (rte_malloc()) is ready, so rte_malloc() cannot be used for lcore
>>>>>> variables.
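
To make the ordering issue concrete, here is a toy example (nothing
from the patch; all names are invented). RTE_INIT() constructors run
before main(), and thus before rte_eal_init(), so rte_malloc() has
nothing to allocate from at that point and only the libc heap (or
static storage) is usable:

#include <stdlib.h>

#include <rte_common.h>
#include <rte_malloc.h>

/* Hypothetical module-level per-lcore state, for illustration only. */
static void *example_lcore_state;

RTE_INIT(example_lib_init)
{
	/* Runs as a constructor, i.e. before rte_eal_init(), so
	 * rte_malloc() returns NULL here. */
	example_lcore_state = rte_malloc(NULL, 1024, 0);

	if (example_lcore_state == NULL)
		example_lcore_state = malloc(1024); /* libc heap fallback */
}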
>>>>>>
>>>>>> And lcore variables are not usable (shared) for DPDK multi-process,
>>>>>> so the lcore_buffer could be allocated through the O/S APIs as
>>>>>> anonymous hugepages, instead of using rte_malloc().
>>>>>>
>>>>>> The alternative, using rte_malloc(), would disallow allocating
>>>>>> lcore variables before DPDK's memory allocator has been
>>>>>> initialized, which I think is too late.
>>>>>
>>>>> I thought it is not. A lot of the subsystems are initialized after
>>>>> the memory subsystem is initialized.
>>>>> [1] is an example given in the documentation. I thought RTE_INIT
>>>>> needs to be replaced if the subsystem is initialized after the
>>>>> memory subsystem (which is the case for most of the libraries).
>>>>
>>>> The list of RTE_INIT functions is called before main(). It is not
>>>> very useful.
>>>>
>>>> Yes, it would be good to replace (or supplement) RTE_INIT_PRIO with
>>>> something similar, which calls the list of "INIT" functions at the
>>>> appropriate time during EAL initialization.
>>>>
>>>> DPDK should then use this "INIT" list for all its initialization, so
>>>> the init functions of new features (such as this, and trace) can be
>>>> inserted at the correct location in the list.
>>>>
>>>>> The trace library had a similar situation. It is managed like [2].
>>>>
>>>> Yes, if we insist on using rte_malloc() for lcore variables, the
>>>> alternative is to prohibit establishing lcore variables in functions
>>>> called through RTE_INIT.
>>>
>>> I was not insisting on using ONLY rte_malloc(), since rte_malloc()
>>> can be called before rte_eal_init() (it will return NULL). The alloc
>>> routine can first check whether rte_malloc() is available and, if
>>> not, switch over to glibc.
>>
>> @Mattias Rönnblom This comment is not addressed in v7. Could you check?
>
> Mattias, following up on Jerin's suggestion:
>
> When allocating an lcore variable and the buffer holding lcore
> variables is out of space (or was never allocated), a new buffer is
> allocated.
>
> Here's the twist I think Jerin is asking for:
> You could check if rte_malloc() is available, and use that (instead of
> the heap) when allocating a new buffer holding lcore variables.
> This check can be performed (aggressively) every time a new lcore
> variable is allocated, or (conservatively) only when a new buffer is
> allocated.
>
>
> Now, if using hugepages, the value of RTE_MAX_LCORE_VAR (the maximum
> size of one lcore variable instance) becomes more important.
>
> Let's consider systems with 2 MB hugepages:
>
> If a system supports two lcores (RTE_MAX_LCORE is 2), the current
> RTE_MAX_LCORE_VAR default of 1 MB is a perfect match; it will use 2 MB
> of RAM as one 2 MB hugepage.
>
> If it supports 128 lcores, the current RTE_MAX_LCORE_VAR default of
> 1 MB will use 128 MB of RAM.
>
> If we scale it back so it only uses one 2 MB hugepage, RTE_MAX_LCORE_VAR
> will have to be 2 MB / 128 lcores = 16 KB.
> 16 KB might be too small. E.g. a mempool cache uses 2 * 512 *
> sizeof(void *) = 8 KB, plus a few bytes of information about the
> cache. So I can easily point at one example where 16 KB gets very
> close to the edge.
>
> So, as you already asked, what is a reasonable default minimum value
> of RTE_MAX_LCORE_VAR?
>
> Maybe we should just stick with your initial suggestion (1 MB) and see
> how it goes.
>

Sure. Let's stick with 1 MB. I'm guessing that if/when someone takes a
closer look at how to do per-lcore *dynamic* allocations, this API and
its implementation will be revisited as well.
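
As for the rte_malloc()-or-heap fallback Jerin suggested, a minimal
sketch of the buffer allocation could look like the following. This is
an illustration of the idea only, not the patch's current code, and a
real version would have to tell "EAL not yet initialized" apart from a
genuine allocation failure:

#include <stdlib.h>

#include <rte_common.h>
#include <rte_malloc.h>

#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)

static void *
lcore_buffer_alloc(void)
{
	/* Prefer the EAL allocator (hugepage-backed) when it is up... */
	void *buf = rte_malloc(NULL, LCORE_BUFFER_SIZE,
			       RTE_CACHE_LINE_SIZE);

	/* ...and fall back to the libc heap before rte_eal_init(),
	 * where rte_malloc() just returns NULL. */
	if (buf == NULL)
		buf = aligned_alloc(RTE_CACHE_LINE_SIZE, LCORE_BUFFER_SIZE);

	return buf;
}

The fallback could also mmap() anonymous hugepages directly, along the
lines suggested earlier in the thread.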

>
> At the recent DPDK Summit, we discussed memory consumption in one of
> the workshops.
> One of the possible means of reducing memory consumption is making
> RTE_MAX_LCORE dynamic, so an application using only a few cores will
> scale its per-lcore tables to the actual number of lcores, instead of
> scaling to some hardcoded maximum.
>
> With this in mind, I'm less worried about the RTE_MAX_LCORE multiplier.
>
> An interesting hack would be to disable hugepage usage, set up a swap
> file on a zram device, and then MADV_PAGEOUT the DPDK process after
> startup. I wonder how much smaller the DPDK process RSS would be once
> it had paged back in all the pages that were actually required.
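
The page-out part of that experiment could be done from inside the
process with something like the sketch below (untested; MADV_PAGEOUT
requires Linux 5.4 or later, and since hugetlb-backed memory is not
swappable, the "disable hugepage usage" part is what makes it
meaningful). An external tool could achieve the same via
process_madvise() on newer kernels.

#define _GNU_SOURCE
#include <sys/mman.h>

/* Ask the kernel to reclaim the given range; with swap on a zram
 * device, pages that are never touched again stay compressed there,
 * while the rest are simply demand-paged back in. */
static int
page_out_region(void *addr, size_t len)
{
	return madvise(addr, len, MADV_PAGEOUT);
}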