From: Thomas Monjalon
To: Ilya Maximets, Sergio Gonzalez Monroy
Cc: dev@dpdk.org, David Marchand, Heetae Ahn, Yuanhan Liu, Jianfeng Tan,
 Neil Horman, Yulong Pei, stable@dpdk.org, Bruce Richardson
Date: Fri, 07 Apr 2017 17:44:56 +0200
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages
Message-ID: <1945759.SoJb5dzy87@xps13>
In-Reply-To: <2a9b03bd-a4c0-8d20-0bbd-77730140eef0@samsung.com>

2017-04-07 18:14, Ilya Maximets:
> Hi All.
>
> I wanted to ask (just to clarify the current status):
> Will this patch be included in the current release (acked by the maintainer),
> with me then upgrading it to the hybrid logic, or should I just prepare a v3
> with the hybrid logic for 17.08?

What is your preferred option, Ilya? Sergio?
> On 27.03.2017 17:43, Ilya Maximets wrote:
> > On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
> >> On 09/03/2017 12:57, Ilya Maximets wrote:
> >>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
> >>>> Hi Ilya,
> >>>>
> >>>> I have done similar tests and, as you already pointed out,
> >>>> 'numactl --interleave' does not seem to work as expected.
> >>>> I have also checked that the issue can be reproduced with a quota
> >>>> limit on the hugetlbfs mount point.
> >>>>
> >>>> I would be inclined towards *adding libnuma as a dependency* to DPDK
> >>>> to make memory allocation a bit more reliable.
> >>>>
> >>>> Currently, at a high level, hugepages per NUMA node are handled as follows:
> >>>> 1) Try to map all free hugepages. The total number of mapped hugepages
> >>>>    depends on whether there were any limits, such as cgroups or a quota
> >>>>    on the mount point.
> >>>> 2) Find out the NUMA node of each hugepage.
> >>>> 3) Check if we have enough hugepages for the requested memory on each
> >>>>    NUMA socket/node.
> >>>>
> >>>> Using libnuma we could try to allocate hugepages per NUMA node:
> >>>> 1) Try to map as many hugepages as possible from numa 0.
> >>>> 2) Check if we have enough hugepages for the requested memory on numa 0.
> >>>> 3) Try to map as many hugepages as possible from numa 1.
> >>>> 4) Check if we have enough hugepages for the requested memory on numa 1.
> >>>>
> >>>> This approach would improve the failing scenarios caused by limits, but
> >>>> it would still not fix the issues regarding non-contiguous hugepages
> >>>> (worst case: each hugepage is a memseg).
> >>>> The non-contiguous hugepage issues are not as critical now that mempools
> >>>> can span multiple memsegs/hugepages, but they are still a problem for
> >>>> any other library requiring big chunks of memory.
> >>>>
> >>>> Potentially, if we were to add an option such as 'iommu-only' when all
> >>>> devices are bound to vfio-pci, we could have a reliable way to allocate
> >>>> hugepages by just requesting the number of pages from each NUMA node.
> >>>>
> >>>> Thoughts?
> >>>
> >>> Hi Sergio,
> >>>
> >>> Thanks for your attention to this.
> >>>
> >>> For now, as we have some issues with non-contiguous hugepages,
> >>> I'm thinking about the following hybrid schema:
> >>> 1) Allocate the essential hugepages:
> >>>    1.1) Allocate only as many hugepages from NUMA node N as are needed
> >>>         to fit the memory requested for that node.
> >>>    1.2) Repeat 1.1 for all NUMA nodes.
> >>> 2) Try to map all remaining free hugepages in a round-robin fashion,
> >>>    as in this patch.
> >>> 3) Sort the pages and choose the most suitable ones.
> >>>
> >>> This solution should decrease the number of issues connected with
> >>> non-contiguous memory.
> >>
> >> Sorry for the late reply, I was hoping for more comments from the community.
> >>
> >> IMHO this should be the default behavior, which means no config option
> >> and libnuma as an EAL dependency.
> >> I think your proposal is good; could you consider implementing such an
> >> approach in the next release?
> >
> > Sure, I can implement this for the 17.08 release.
> >
> >>>
> >>>> On 06/03/2017 09:34, Ilya Maximets wrote:
> >>>>> Hi all.
> >>>>>
> >>>>> So, what about this change?
> >>>>>
> >>>>> Best regards, Ilya Maximets.
> >>>>>
> >>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
> >>>>>> Currently EAL allocates hugepages one by one, not paying
> >>>>>> attention to which NUMA node the allocation was made from.
> >>>>>>
> >>>>>> Such behaviour leads to allocation failures if the number of
> >>>>>> hugepages available to the application is limited by cgroups
> >>>>>> or hugetlbfs and memory is requested not only from the first
> >>>>>> socket.
> >>>>>>
> >>>>>> Example:
> >>>>>> # 90 x 1GB hugepages available in the system
> >>>>>>
> >>>>>> cgcreate -g hugetlb:/test
> >>>>>> # Limit to 32GB of hugepages
> >>>>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
> >>>>>> # Request 4GB from each of 2 sockets
> >>>>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
> >>>>>>
> >>>>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
> >>>>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
> >>>>>> EAL: Not enough memory available on socket 1!
> >>>>>>      Requested: 4096MB, available: 0MB
> >>>>>> PANIC in rte_eal_init():
> >>>>>> Cannot init memory
> >>>>>>
> >>>>>> This happens because all the allocated pages are
> >>>>>> on socket 0.
> >>>>>>
> >>>>>> Fix this issue by setting the MPOL_PREFERRED mempolicy for each
> >>>>>> hugepage to one of the requested nodes in a round-robin fashion.
> >>>>>> In this case all allocated pages will be fairly distributed
> >>>>>> between all requested nodes.
> >>>>>>
> >>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
> >>>>>> introduced and disabled by default because of the external
> >>>>>> dependency on libnuma.
> >>>>>>
> >>>>>> Cc:
> >>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
> >>>>>>
> >>>>>> Signed-off-by: Ilya Maximets
> >>>>>> ---
> >>>>>>  config/common_base                       |  1 +
> >>>>>>  lib/librte_eal/Makefile                  |  4 ++
> >>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
> >>>>>>  mk/rte.app.mk                            |  3 ++
> >>>>>>  4 files changed, 74 insertions(+)
> >>
> >> Acked-by: Sergio Gonzalez Monroy
> >
> > Thanks.
> >
> > Best regards, Ilya Maximets.
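
To make the mechanism described in the commit message concrete, below is a
minimal standalone sketch of the round-robin MPOL_PREFERRED idea, expressed
here through libnuma's numa_set_preferred(). It is not the actual
eal_memory.c change from the patch: the page count, the node list and the use
of anonymous MAP_HUGETLB mappings (instead of files on a hugetlbfs mount, as
EAL uses) are example assumptions made only for illustration.

/*
 * sketch_numa_hugepages.c -- illustrative only, not the DPDK patch itself.
 *
 * Set an MPOL_PREFERRED policy before faulting in each hugepage, cycling
 * over the requested NUMA nodes, so the pages end up spread across nodes.
 *
 * Build: gcc -o sketch_numa_hugepages sketch_numa_hugepages.c -lnuma
 * Needs pre-reserved hugepages of the default size (2 MB on most x86 boxes).
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <numa.h>   /* numa_available(), numa_set_preferred(), ... */

#define HUGEPAGE_SZ (2UL * 1024 * 1024)   /* assume 2 MB default hugepages */

int main(void)
{
	/* Example request: spread 4 hugepages over sockets 0 and 1. */
	const int nodes[] = { 0, 1 };
	const int nb_nodes = sizeof(nodes) / sizeof(nodes[0]);
	const int nb_pages = 4;

	if (numa_available() < 0) {
		fprintf(stderr, "libnuma reports no NUMA support\n");
		return 1;
	}

	for (int i = 0; i < nb_pages; i++) {
		/* Round-robin over the requested nodes, one page at a time. */
		int node = nodes[i % nb_nodes];

		/*
		 * MPOL_PREFERRED: the kernel tries to satisfy the next page
		 * fault from "node" but falls back to another node when that
		 * node's hugepage pool is exhausted, instead of failing hard.
		 */
		numa_set_preferred(node);

		void *va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
				-1, 0);
		if (va == MAP_FAILED) {
			perror("mmap");
			break;
		}

		/* Touch the page so it is actually allocated under the
		 * policy we just set (the policy applies at fault time). */
		memset(va, 0, 1);

		printf("page %d mapped at %p, preferred node %d\n",
		       i, va, node);
	}

	/* Restore the default local-allocation policy for the rest
	 * of the process. */
	numa_set_localalloc();
	return 0;
}

Using MPOL_PREFERRED rather than MPOL_BIND means a locally exhausted hugepage
pool does not turn into a hard failure: pages are spread fairly across the
requested nodes while keeping the old "take whatever is free" behaviour as a
fallback.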