From: Ilya Maximets
To: Sergio Gonzalez Monroy, dev@dpdk.org, David Marchand
Cc: Heetae Ahn, Yuanhan Liu, Jianfeng Tan, Neil Horman, Yulong Pei, stable@dpdk.org, Thomas Monjalon, Bruce Richardson
Date: Thu, 09 Mar 2017 15:57:24 +0300
Subject: Re: [dpdk-dev] [PATCH] mem: balanced allocation of hugepages

On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
> Hi Ilya,
>
> I have done similar tests and, as you already pointed out, 'numactl
> --interleave' does not seem to work as expected.
> I have also checked that the issue can be reproduced with a quota
> limit on the hugetlbfs mount point.
>
> I would be inclined towards *adding libnuma as a dependency* to DPDK
> to make memory allocation a bit more reliable.
>
> Currently, at a high level, hugepage allocation per NUMA node works
> like this:
> 1) Try to map all free hugepages. The total number of mapped
>    hugepages depends on whether there are any limits, such as
>    cgroups or a quota on the mount point.
> 2) Find out the NUMA node of each hugepage.
> 3) Check if we have enough hugepages for the requested memory on
>    each NUMA socket/node.
>
> Using libnuma, we could instead try to allocate hugepages per NUMA node:
> 1) Try to map as many hugepages as possible from NUMA node 0.
> 2) Check if we have enough hugepages for the requested memory on NUMA node 0.
> 3) Try to map as many hugepages as possible from NUMA node 1.
> 4) Check if we have enough hugepages for the requested memory on NUMA node 1.
>
> This approach would improve the failing scenarios caused by limits,
> but it would still not fix the issues around non-contiguous hugepages
> (in the worst case, each hugepage is a separate memseg).
> The non-contiguous hugepage issue is not as critical now that
> mempools can span multiple memsegs/hugepages, but it is still a
> problem for any other library requiring big chunks of memory.
>
> Potentially, if we were to add an option such as 'iommu-only' for
> when all devices are bound to vfio-pci, we could have a reliable way
> to allocate hugepages by just requesting the number of pages from
> each NUMA node.
>
> Thoughts?

Hi Sergio,

Thanks for your attention to this. For now, as we still have issues
with non-contiguous hugepages, I'm thinking about the following hybrid
scheme (a sketch of step 1 is below the list):
 1) Allocate the essential hugepages:
    1.1) Allocate only as many hugepages from NUMA node N as are
         needed to fit the memory requested for that node.
    1.2) Repeat 1.1 for all NUMA nodes.
 2) Try to map all remaining free hugepages in a round-robin fashion,
    like in this patch.
 3) Sort the pages and choose the most suitable ones.

This solution should decrease the number of issues connected with
non-contiguous memory.
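To make step 1 concrete, here is a rough sketch of what the
essential-pages pass could look like with libnuma. This is only a
sketch under assumptions, not EAL code: the function name and the
requested[] array are made up, and it uses anonymous MAP_HUGETLB
memory where the real EAL mmap()s files on a hugetlbfs mount. Link
with -lnuma.

/* Step 1 of the hybrid scheme: per-node essential pages (sketch). */
#include <numa.h>       /* numa_available(), numa_set_preferred() */
#include <numaif.h>     /* set_mempolicy(), MPOL_DEFAULT */
#include <sys/mman.h>
#include <stdint.h>

static int
alloc_essential_hugepages(const uint64_t requested[], int nb_nodes,
                          size_t hugepage_sz)
{
    int node;
    uint64_t got;

    if (numa_available() < 0)
        return -1;              /* no NUMA support on this system */

    for (node = 0; node < nb_nodes; node++) {
        /* Steer the following allocations towards this node. */
        numa_set_preferred(node);

        for (got = 0; got < requested[node]; got += hugepage_sz) {
            void *va = mmap(NULL, hugepage_sz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                            -1, 0);
            if (va == MAP_FAILED)
                return -1;      /* ran out of free hugepages */
            /* Touch the page so it is faulted in while the
             * preference for 'node' is still active. */
            *(volatile char *)va = 0;
        }
    }
    /* Restore the default policy before the round-robin phase. */
    set_mempolicy(MPOL_DEFAULT, NULL, 0);
    return 0;
}

Note that MPOL_PREFERRED only steers the allocation; under memory
pressure the kernel silently falls back to another node, so the later
steps would still have to verify where each page actually landed
(e.g. with move_pages() or /proc/self/pagemap) before sorting.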
Best regards, Ilya Maximets.

> On 06/03/2017 09:34, Ilya Maximets wrote:
>> Hi all.
>>
>> So, what about this change?
>>
>> Best regards, Ilya Maximets.
>>
>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>> Currently EAL allocates hugepages one by one, not paying
>>> attention to which NUMA node the allocation was made from.
>>>
>>> Such behaviour leads to allocation failures if the number of
>>> hugepages available to the application is limited by cgroups
>>> or hugetlbfs and memory is requested from more than just the
>>> first socket.
>>>
>>> Example:
>>> # 90 x 1GB hugepages available in the system
>>>
>>> cgcreate -g hugetlb:/test
>>> # Limit to 32GB of hugepages
>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>> # Request 4GB from each of 2 sockets
>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>
>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>> EAL: Not enough memory available on socket 1!
>>>      Requested: 4096MB, available: 0MB
>>> PANIC in rte_eal_init():
>>> Cannot init memory
>>>
>>> This happens because all of the allocated pages end up
>>> on socket 0.
>>>
>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>> hugepage to one of the requested nodes in a round-robin fashion.
>>> This way, all allocated pages are fairly distributed
>>> between the requested nodes.
>>>
>>> The new config option RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES is
>>> introduced and disabled by default because of the external
>>> dependency on libnuma.
>>>
>>> Cc: stable@dpdk.org
>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>> ---
>>>  config/common_base                       |  1 +
>>>  lib/librte_eal/Makefile                  |  4 ++
>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>  mk/rte.app.mk                            |  3 ++
>>>  4 files changed, 74 insertions(+)
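
For reference, the core idea of the quoted patch could look roughly
like this. This is an illustrative sketch, not the literal
eal_memory.c diff: the helper name and parameters are made up.

/* Rotate MPOL_PREFERRED across the nodes requested via --socket-mem
 * before each hugepage is mapped (sketch of the patch's approach). */
#include <numaif.h>     /* set_mempolicy(), MPOL_PREFERRED */

static void
prefer_next_requested_node(const int requested_nodes[],
                           int nb_requested, unsigned page_idx)
{
    unsigned long mask =
        1UL << requested_nodes[page_idx % nb_requested];

    /* Prefer this node for the next page fault; the kernel may
     * still fall back to other nodes instead of failing. */
    set_mempolicy(MPOL_PREFERRED, &mask, sizeof(mask) * 8);
}

Because of that fallback, the existing pass that resolves each mapped
page's actual socket stays necessary, and the whole mechanism sits
behind RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES since it pulls in the
libnuma dependency.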