From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <i.maximets@samsung.com>
Received: from mailout1.w1.samsung.com (mailout1.w1.samsung.com
 [210.118.77.11]) by dpdk.org (Postfix) with ESMTP id 8446F3DC;
 Thu,  9 Mar 2017 13:57:30 +0100 (CET)
Received: from eucas1p2.samsung.com (unknown [182.198.249.207])
 by mailout1.w1.samsung.com
 (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTP id <0OMJ00KF8TBRZT50@mailout1.w1.samsung.com>; Thu,
 09 Mar 2017 12:57:27 +0000 (GMT)
Received: from eusmges5.samsung.com (unknown [203.254.199.245])
 by	eucas1p1.samsung.com (KnoxPortal) with ESMTP
 id	20170309125727eucas1p14a6f51e7c449a746796ab813f00674dc~qOAsubkNN0863508635eucas1p17;
 Thu,  9 Mar 2017 12:57:27 +0000 (GMT)
Received: from eucas1p2.samsung.com ( [182.198.249.207])
 by	eusmges5.samsung.com (EUCPMTA) with SMTP id 1D.98.17477.73151C85; Thu,
 9	Mar 2017 12:57:27 +0000 (GMT)
Received: from eusmgms2.samsung.com (unknown [182.198.249.180])
 by	eucas1p2.samsung.com (KnoxPortal) with ESMTP
 id	20170309125726eucas1p2298e116560b4a71898517265420064b2~qOAsFP8492218722187eucas1p2D;
 Thu,  9 Mar 2017 12:57:26 +0000 (GMT)
X-AuditID: cbfec7f5-f79d06d000004445-14-58c1513726c2
Received: from eusync3.samsung.com ( [203.254.199.213])
 by	eusmgms2.samsung.com (EUCPMTA) with SMTP id 0D.26.10233.14151C85; Thu,
 9	Mar 2017 12:57:37 +0000 (GMT)
Received: from [106.109.129.180] by eusync3.samsung.com
 (Oracle	Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with	ESMTPA id <0OMJ00H53TBPJ020@eusync3.samsung.com>; Thu,
 09 Mar 2017 12:57:26	+0000 (GMT)
To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org,
 David Marchand <david.marchand@6wind.com>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>,
 Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Jianfeng Tan <jianfeng.tan@intel.com>, Neil Horman <nhorman@tuxdriver.com>,
 Yulong Pei <yulong.pei@intel.com>, stable@dpdk.org,
 Thomas Monjalon <thomas.monjalon@6wind.com>,
 Bruce Richardson <bruce.richardson@intel.com>
From: Ilya Maximets <i.maximets@samsung.com>
Message-id: <aca5b73b-75d9-da12-26f3-67ff6fe218ac@samsung.com>
Date: Thu, 09 Mar 2017 15:57:24 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.7.0
MIME-version: 1.0
In-reply-to: <f50d0fa1-9530-436c-d532-0e6123f4e06d@intel.com>
Content-type: text/plain; charset=windows-1252
Content-transfer-encoding: 7bit
X-Brightmail-Tracker: H4sIAAAAAAAAA02SeUgUYRjG+WZmZ8fFlWk9erVCWgghyqODBitNyFwyIv+QTIhcclDJi1kV
 lShL8T7WK0XJNDNtNYzV1pVUXN0UtTzKvJFMIS0M0vVOzXUU/O/38j7P+77Px0fhklqBDRUU
 GsFyofJgKSkiNB1rfacveOl8HIf6TZkRlStTlZxEMH8WGjDm2eK4kBlMWhMyacUGkhmL7yKZ
 KqUeMVvJ/4SMQV1AMsPKfgGz/H0Fv2IqG8iaQLL10gqBrLxpDpOVdHnJMutVSDa4Uo1ukb6i
 S/5scFAUyzm4+IkCc7MnyPDV49HqD9fjUKNNKjKhgD4Hwz+mMJ6toH+ylkxFIkpCVyDoGRon
 +GIRQX5iKrbvyMrPRnzjNYKsF8o9yyyClE9LhFFlTjuDpidz12FBR8NzfQ5pZJzWYjBRLzUy
 SZ+C7mo9MrKYdoGB+vVdJugTUFlcIzCyJe0D2pF0ktccgtXcyd35JvRlyFcvI36mIxTmtWM8
 20JdzTxuPAjoDiGkqJt3RNROcQzUrTif4CqMJ7xFPJvDr856Ic9H4WtuGsF74xHEqb4gvlAi
 MMRX7uV3he7Rb3vbzCBHU4DzC8SQnCjhJTJoe1dB8uwG030ZQv6FnmLwpGVTqES2RQcCFR0I
 UXQgRCnCVciCjVSEBLCK8/YKeYgiMjTA/n5YiBrtfKOerc4lLarocG5DNIWkpmIt3eojEcij
 FDEhbQgoXGohjr2p85GI/eUxsSwXdo+LDGYVbegIRUgPi5tKB29L6AB5BPuAZcNZbr+LUSY2
 ccj3vf013bZvb2HetkFt/bMmtqFYk95bdvGvb1SG/pH776xmaYudp6VZ93QC7mZ3dybwzubG
 6BsP7nHhw9Sx8qmPSvNWobbdpmQ1Eb8h6ujOwctI95dSz+UlhznvOs5szOnVqq7RZH7GT+Vx
 ZuVziaXlwpT12Q1TrFUwa5XgPVgnJRSBcqeTOKeQ/wePGVlCQgMAAA==
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrJIsWRmVeSWpSXmKPExsVy+t/xq7qOgQcjDLas07e4screYkVHO4vF
 u0/bmSymfb7NbnGl/Se7RffsL2wWt5pPslmsmHCE0eJfxx92iy+bprNZXJ9wgdXi24PvzA48
 Hhf77zB6/FqwlNVj8Z6XTB7zTgZ69G1Zxehx5ftqxgC2KDebjNTElNQihdS85PyUzLx0W6XQ
 EDddCyWFvMTcVFulCF3fkCAlhbLEnFIgz8gADTg4B7gHK+nbJbhlTJ54h63gh2LFpt1eDYw7
 pboYOTkkBEwk+qdOZISwxSQu3FvPBmILCSxhlPi/1h/CfsEoMbVPCMQWFrCS2Ha6jwnEFhGo
 kPh/extUfQuTxNalwiA2s8AuJoktV1NAbDYBHYlTq4+AzecVsJO4uOUXmM0ioCqxfPYaVhBb
 VCBCYv7TVUwQNYISPybfYwGxOQVsJaZu+gZUzwE0U0/i/kUtiPHyEpvXvGWewCgwC0nHLISq
 WUiqFjAyr2IUSS0tzk3PLTbSK07MLS7NS9dLzs/dxAiMxm3Hfm7Zwdj1LvgQowAHoxIP7w6B
 AxFCrIllxZW5hxglOJiVRHir/A5GCPGmJFZWpRblxxeV5qQWH2I0BXphIrOUaHI+MFHklcQb
 mhiaWxoaGVtYmBsZKYnzTv1wJVxIID2xJDU7NbUgtQimj4mDU6qB8ZLQ4k8PtT5H+r/b8+By
 fcFUjQU2FSd1ODkYE+XEc3uWHnv6p8eHT++2XpvWSvFjn50/3Mt/eiIle1710+wIS6XSaWtn
 nigqUslcXHfK2ljC7W31FtWVytMPBEQ2nDi+x+x4Z3SBqZ/BxN2r/qrnS0/+7PxGu+DM5PDV
 dxMMmD6pLv/0+czB00osxRmJhlrMRcWJAGCdM0XcAgAA
X-MTR: 20000000000000000@CPGS
X-CMS-MailID: 20170309125726eucas1p2298e116560b4a71898517265420064b2
X-Msg-Generator: CA
X-Sender-IP: 182.198.249.180
X-Local-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G+yCvOyEseyghOyekBtMZWFkaW5nIEVuZ2luZWVy?=
X-Global-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G1NhbXN1bmcgRWxlY3Ryb25pY3MbTGVhZGluZyBFbmdpbmVlcg==?=
X-Sender-Code: =?UTF-8?B?QzEwG0NJU0hRG0MxMEdEMDFHRDAxMDE1NA==?=
CMS-TYPE: 201P
X-HopCount: 7
X-CMS-RootMailID: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
X-RootMTR: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
References: <CGME20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a@eucas1p2.samsung.com>
 <1487250070-13973-1-git-send-email-i.maximets@samsung.com>
 <50517d4c-5174-f4b2-e77e-143f7aac2c00@samsung.com>
 <f50d0fa1-9530-436c-d532-0e6123f4e06d@intel.com>
Subject: Re: [dpdk-dev] [PATCH] mem: balanced allocation of hugepages
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 12:57:31 -0000

On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
> Hi Ilya,
> 
> I have done similar tests and as you already pointed out, 'numactl --interleave' does not seem to work as expected.
> I have also checked that the issue can be reproduced with a quota limit on the hugetlbfs mount point.
> 
> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
> 
> Currently, at a high level, hugepages are handled per NUMA node as follows:
> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether there are any limits, such as cgroups or a quota on the mount point.
> 2) Find out the NUMA node of each hugepage.
> 3) Check if we have enough hugepages for the requested memory in each NUMA socket/node.
> 
> Using libnuma, we could try to allocate hugepages per NUMA node:
> 1) Try to map as many hugepages as possible from NUMA node 0.
> 2) Check if we have enough hugepages for the requested memory in NUMA node 0.
> 3) Try to map as many hugepages as possible from NUMA node 1.
> 4) Check if we have enough hugepages for the requested memory in NUMA node 1.
> 
> This approach would improve the failing scenarios caused by limits, but it would still not fix the issues with non-contiguous hugepages (in the worst case, each hugepage is a memseg).
> The non-contiguous hugepage issues are not as critical now that mempools can span multiple memsegs/hugepages, but they are still a problem for any other library requiring big chunks of memory.
> 
> Potentially, if we were to add an option such as 'iommu-only' for when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each NUMA node.
> 
> Thoughts?

Hi Sergio,

Thanks for your attention to this.

For now, since we still have some issues with non-contiguous
hugepages, I'm thinking about the following hybrid scheme:
1) Allocate essential hugepages:
	1.1) Allocate only as many hugepages from NUMA node N
	     as are needed to fit the memory requested for that node.
	1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
   fashion, as in this patch.
3) Sort the pages and choose the most suitable ones.
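
A rough sketch of steps 1 and 2, written as a pure simulation with no mmap or
libnuma calls. The names plan_hugepages() and MAX_NODES, the parameters, and
the per-node free-page counts are all hypothetical and are not part of the
patch; the sketch only shows the order in which pages would be claimed under
a total limit such as a cgroup or quota.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8 /* hypothetical upper bound used only by this sketch */

/*
 * Plan how many hugepages to take from each NUMA node.
 *   free_pages[i] - pages currently free on node i (assumed input)
 *   need_pages[i] - pages needed to cover the memory requested on node i
 *   taken[i]      - output: pages planned for node i
 *   limit         - total pages the process may map (cgroup/quota limit)
 * Returns 0 if every node's request is covered, -1 otherwise.
 */
static int plan_hugepages(const unsigned free_pages[],
                          const unsigned need_pages[],
                          unsigned taken[], size_t nodes, unsigned limit)
{
    unsigned left[MAX_NODES];
    size_t i;
    int progress;

    assert(nodes <= MAX_NODES);

    for (i = 0; i < nodes; i++) {
        left[i] = free_pages[i];
        taken[i] = 0;
    }

    /* Step 1: essential pages first, node by node. */
    for (i = 0; i < nodes; i++) {
        if (need_pages[i] > left[i] || need_pages[i] > limit)
            return -1;
        taken[i] = need_pages[i];
        left[i] -= need_pages[i];
        limit -= need_pages[i];
    }

    /* Step 2: remaining pages, one page per node per round. */
    do {
        progress = 0;
        for (i = 0; i < nodes && limit > 0; i++) {
            if (left[i] == 0)
                continue;
            taken[i]++;
            left[i]--;
            limit--;
            progress = 1;
        }
    } while (progress && limit > 0);

    return 0;
}
```

For the case from the original patch description (32 pages allowed, 4 pages
requested on each of 2 sockets), and assuming 45 free pages on each node,
this plan ends with 16 pages per node instead of 32 pages on socket 0.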

This solution should decrease the number of issues connected with
non-contiguous memory.
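
To make the contiguity point concrete, here is a small sketch of one way the
"sort pages" step could be evaluated. It assumes that "most suitable" means
"forms the fewest physically contiguous segments", which is only my reading
for illustration. struct page, cmp_page() and count_segments() are
hypothetical names, not existing EAL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one mapped hugepage. */
struct page {
    uint64_t physaddr; /* physical address of the page */
    int node;          /* NUMA node the page belongs to */
};

/* Order pages by node first, then by physical address. */
static int cmp_page(const void *a, const void *b)
{
    const struct page *pa = a;
    const struct page *pb = b;

    if (pa->node != pb->node)
        return pa->node < pb->node ? -1 : 1;
    if (pa->physaddr != pb->physaddr)
        return pa->physaddr < pb->physaddr ? -1 : 1;
    return 0;
}

/*
 * Sort the pages in place and count the physically contiguous segments.
 * A new segment starts whenever the node changes or the next page does
 * not directly follow the previous one in physical memory.
 */
static size_t count_segments(struct page *pages, size_t n, uint64_t page_sz)
{
    size_t i, segs;

    if (n == 0)
        return 0;

    qsort(pages, n, sizeof(*pages), cmp_page);

    segs = 1;
    for (i = 1; i < n; i++) {
        if (pages[i].node != pages[i - 1].node ||
            pages[i].physaddr != pages[i - 1].physaddr + page_sz)
            segs++;
    }
    return segs;
}
```

Having more mapped pages to choose from (step 2) gives the sort a better
chance of finding long runs, so fewer segments are needed per node.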

Best regards, Ilya Maximets.

> 
> On 06/03/2017 09:34, Ilya Maximets wrote:
>> Hi all.
>>
>> So, what about this change?
>>
>> Best regards, Ilya Maximets.
>>
>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>> Currently EAL allocates hugepages one by one, without paying
>>> attention to which NUMA node each allocation comes from.
>>>
>>> Such behaviour leads to allocation failure if the number of
>>> hugepages available to the application is limited by cgroups
>>> or hugetlbfs and memory is requested not only from the first
>>> socket.
>>>
>>> Example:
>>>     # 90 x 1GB hugepages available in a system
>>>
>>>     cgcreate -g hugetlb:/test
>>>     # Limit to 32GB of hugepages
>>>     cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>     # Request 4GB from each of 2 sockets
>>>     cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>
>>>     EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>     EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>     EAL: Not enough memory available on socket 1!
>>>          Requested: 4096MB, available: 0MB
>>>     PANIC in rte_eal_init():
>>>     Cannot init memory
>>>
>>>     This happens because all allocated pages are
>>>     on socket 0.
>>>
>>> Fix this issue by setting mempolicy MPOL_PREFERRED for each
>>> hugepage to one of the requested nodes in a round-robin fashion.
>>> In this case all allocated pages will be fairly distributed
>>> between all requested nodes.
>>>
>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>> introduced. It is disabled by default because of the external
>>> dependency on libnuma.
>>>
>>> Cc: <stable@dpdk.org>
>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>> ---
>>>   config/common_base                       |  1 +
>>>   lib/librte_eal/Makefile                  |  4 ++
>>>   lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>   mk/rte.app.mk                            |  3 ++
>>>   4 files changed, 74 insertions(+)
>>>
> 
> 
> 
>