From: Fengnan Chang
Date: Mon, 22 May 2023 20:09:03 +0800
Subject: Re: [External] Re: [PATCH] eal: fix eal init may failed when too much continuous memsegs under legacy mode
To: "Burakov, Anatoly"
Cc: dev@dpdk.org, Lin Li
References: <20230516122108.38617-1-changfengnan@bytedance.com>

Burakov, Anatoly wrote on Saturday, 20 May 2023 at 23:03:
>
> Hi,
>
> On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> > Under legacy mode, if the number of contiguous memsegs is greater
> > than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though
> > another memseg list is empty, because only one memseg list is
> > checked in remap_needed_hugepages.
> >
> > For example:
> > hugepage configure:
> > 20480
> > 13370
> > 7110
> >
> > startup log:
> > EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> > EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> > EAL: Requesting 13370 pages of size 2MB from socket 0
> > EAL: Requesting 7110 pages of size 2MB from socket 1
> > EAL: Attempting to map 14220M on socket 1
> > EAL: Allocated 14220M on socket 1
> > EAL: Attempting to map 26740M on socket 0
> > EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
> > configuration.
>
> Unrelated, but this is probably a wrong message: it should have called
> out the config options to change, not their values. Sounds like a log
> message needs fixing somewhere...

In the older version, the log was:
EAL: Could not find space for memseg. Please increase
CONFIG_RTE_MAX_MEMSEG_PER_TYPE and/or CONFIG_RTE_MAX_MEM_PER_TYPE in
configuration.
Maybe that is better?

> > EAL: Couldn't remap hugepage files into memseg lists
> > EAL: FATAL: Cannot init memory
> > EAL: Cannot init memory
> >
> > Signed-off-by: Fengnan Chang
> > Signed-off-by: Lin Li
> > ---
> >   lib/eal/linux/eal_memory.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> > index 60fc8cc6ca..36b9e78f5f 100644
> > --- a/lib/eal/linux/eal_memory.c
> > +++ b/lib/eal/linux/eal_memory.c
> > @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
> >               if (cur->size == 0)
> >                       break;
> >
> > +             if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> > +                     new_memseg = 1;
>
> I don't think this is quite right, because technically,
> `RTE_MAX_MEMSEG_PER_LIST` only applies to the smaller page size segment
> lists - larger page size segment lists will hit their limits earlier.
> So, while this will work for 2MB pages, it won't work for page sizes
> whose segment list length is smaller than the maximum (such as 1GB
> pages).
>
> I think this solution could be improved upon by trying to break up the
> contiguous area instead. I suspect the core of the issue is not even
> the fact that we're exceeding the limits of one memseg list, but that
> we're always attempting to map exactly N pages in `remap_hugepages`,
> which results in us leaving large contiguous zones inside memseg lists
> unused, because we couldn't satisfy the current allocation request and
> skipped to a new memseg list.

Correct, I didn't consider the 1GB pages case. I get your point. Thanks.

> For example, let's suppose we found a large contiguous area that
> would've exceeded the limits of the current memseg list. Sooner or
> later, this contiguous area will end, and we'll attempt to remap this
> virtual area into a memseg list. Whenever that happens, we call into
> the remap code, which will start with the first segment list, attempt
> to find exactly N free spots, fail to do so, and skip to the next
> segment list.
>
> Thus, sooner or later, if we get contiguous areas that are large
> enough, we will not populate our memseg lists but instead skip through
> them, and start with a new memseg list every time we need a large
> contiguous area. We prioritize having a large contiguous area over
> using up all of our memory map.
> If, instead, we could break up the allocation - that is, use
> `rte_fbarray_find_biggest_free()` instead of
> `rte_fbarray_find_next_n_free()` - and keep doing that until we run
> out of segment lists, we would achieve the same result your patch
> does, but have it work for all page sizes, because now we would be
> targeting the actual issue (under-utilization of memseg lists), not
> its symptoms (exceeding segment list limits for large allocations).
>
> This logic could either live inside `remap_hugepages`, or we could
> just return the number of pages mapped from `remap_hugepages` and have
> the calling code (`remap_needed_hugepages`) try again, this time with
> a different start segment, reflecting how many pages we actually
> mapped. IMO this would be easier to implement, as `remap_hugepages` is
> overly complex as it is!
>
> >           if (cur_page == 0)
> >                   new_memseg = 1;
> >           else if (cur->socket_id != prev->socket_id)
> > --
> Thanks,
> Anatoly
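
To make sure I follow the idea, here is a rough, untested sketch of the
retry approach (remap_chunk() is a hypothetical helper standing in for
the part of the remap code that fills one memseg list; the fbarray
calls are the existing rte_fbarray.h API):

/*
 * Sketch only: instead of demanding exactly 'need' free slots in one
 * list, take the biggest free run each list offers and let the caller
 * retry with the remainder.
 */
static int
remap_chunk(struct hugepage_file *hugepages, int seg_start, int need)
{
	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
	int msl_idx;

	for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
		struct rte_memseg_list *msl = &mcfg->memsegs[msl_idx];
		int start, avail, take;

		/* ... skip lists with the wrong page size/socket,
		 * same checks as today ... */

		start = rte_fbarray_find_biggest_free(&msl->memseg_arr, 0);
		if (start < 0)
			continue; /* this list is full, try the next one */
		avail = rte_fbarray_find_contig_free(&msl->memseg_arr, start);
		take = RTE_MIN(avail, need);

		/* ... mmap 'take' pages of 'hugepages' starting at
		 * 'seg_start' into 'msl' at slot 'start', and mark
		 * them used in the fbarray ... */
		return take; /* caller retries for the remainder */
	}
	return -1; /* genuinely out of space */
}

Then remap_needed_hugepages() would loop once a contiguous run ends,
instead of giving up on the first list that can't hold the whole run:

	while (need > 0) {
		int mapped = remap_chunk(hugepages, seg_start, need);
		if (mapped <= 0)
			return -1;
		seg_start += mapped;
		need -= mapped;
	}

If that matches what you had in mind, I'll rework the patch this way.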