From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <i.maximets@samsung.com>
Received: from mailout1.w1.samsung.com (mailout1.w1.samsung.com
 [210.118.77.11]) by dpdk.org (Postfix) with ESMTP id 8222CFB0A;
 Mon, 27 Mar 2017 16:43:20 +0200 (CEST)
Received: from eucas1p2.samsung.com (unknown [182.198.249.207])
 by mailout1.w1.samsung.com
 (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTP id <0ONH008AFA86V810@mailout1.w1.samsung.com>; Mon,
 27 Mar 2017 15:43:18 +0100 (BST)
Received: from eusmges5.samsung.com (unknown [203.254.199.245])
 by	eucas1p1.samsung.com (KnoxPortal) with ESMTP
 id	20170327144318eucas1p103855acb633a42a892bb0ae6a4d4c627~vxEQIpgLW2998229982eucas1p1d;
 Mon, 27 Mar 2017 14:43:18 +0000 (GMT)
Received: from eucas1p1.samsung.com ( [182.198.249.206])
 by	eusmges5.samsung.com (EUCPMTA) with SMTP id 48.96.25577.50529D85; Mon,
 27	Mar 2017 15:43:17 +0100 (BST)
Received: from eusmgms1.samsung.com (unknown [182.198.249.179])
 by	eucas1p1.samsung.com (KnoxPortal) with ESMTP
 id	20170327144317eucas1p128395d538d2f284cdd1e71f83823c965~vxEPZxRzv1631416314eucas1p1T;
 Mon, 27 Mar 2017 14:43:17 +0000 (GMT)
X-AuditID: cbfec7f5-f792f6d0000063e9-86-58d925050a0c
Received: from eusync4.samsung.com ( [203.254.199.214])
 by	eusmgms1.samsung.com (EUCPMTA) with SMTP id E0.9D.17452.47529D85; Mon,
 27	Mar 2017 15:45:08 +0100 (BST)
Received: from [106.109.129.180] by eusync4.samsung.com
 (Oracle	Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with	ESMTPA id <0ONH007J1A840Q70@eusync4.samsung.com>; Mon,
 27 Mar 2017 15:43:17	+0100 (BST)
To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>, dev@dpdk.org,
 David Marchand <david.marchand@6wind.com>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>,
 Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Jianfeng Tan <jianfeng.tan@intel.com>, Neil Horman <nhorman@tuxdriver.com>,
 Yulong Pei <yulong.pei@intel.com>, stable@dpdk.org,
 Thomas Monjalon <thomas.monjalon@6wind.com>,
 Bruce Richardson <bruce.richardson@intel.com>
From: Ilya Maximets <i.maximets@samsung.com>
Message-id: <d90f300a-dc8c-8788-d3ef-7970a6d36508@samsung.com>
Date: Mon, 27 Mar 2017 17:43:15 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.7.0
MIME-version: 1.0
In-reply-to: <077682cf-8534-7890-9453-7c9e822bd3e6@intel.com>
Content-type: text/plain; charset=windows-1252
Content-transfer-encoding: 7bit
X-MTR: 20000000000000000@CPGS
X-CMS-MailID: 20170327144317eucas1p128395d538d2f284cdd1e71f83823c965
X-Msg-Generator: CA
X-Sender-IP: 182.198.249.179
X-Local-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G+yCvOyEseyghOyekBtMZWFkaW5nIEVuZ2luZWVy?=
X-Global-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G1NhbXN1bmcgRWxlY3Ryb25pY3MbTGVhZGluZyBFbmdpbmVlcg==?=
X-Sender-Code: =?UTF-8?B?QzEwG0NJU0hRG0MxMEdEMDFHRDAxMDE1NA==?=
CMS-TYPE: 201P
X-HopCount: 7
X-CMS-RootMailID: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
X-RootMTR: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
References: <CGME20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a@eucas1p2.samsung.com>
 <1487250070-13973-1-git-send-email-i.maximets@samsung.com>
 <50517d4c-5174-f4b2-e77e-143f7aac2c00@samsung.com>
 <f50d0fa1-9530-436c-d532-0e6123f4e06d@intel.com>
 <aca5b73b-75d9-da12-26f3-67ff6fe218ac@samsung.com>
 <077682cf-8534-7890-9453-7c9e822bd3e6@intel.com>
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Mar 2017 14:43:21 -0000

On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
> On 09/03/2017 12:57, Ilya Maximets wrote:
>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>> Hi Ilya,
>>>
>>> I have done similar tests and as you already pointed out, 'numactl --interleave' does not seem to work as expected.
>>> I have also checked that the issue can be reproduced with quota limit on hugetlbfs mount point.
>>>
>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
>>>
>>> Currently at a high level regarding hugepages per numa node:
>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether any limits apply, such as cgroups or a quota on the mount point.
>>> 2) Find out numa node of each hugepage.
>>> 3) Check if we have enough hugepages for requested memory in each numa socket/node.
>>>
>>> Using libnuma we could try to allocate hugepages per numa:
>>> 1) Try to map as many hugepages as possible from numa 0.
>>> 2) Check if we have enough hugepages for requested memory in numa 0.
>>> 3) Try to map as many hugepages as possible from numa 1.
>>> 4) Check if we have enough hugepages for requested memory in numa 1.
>>>
>>> This approach would improve failing scenarios caused by limits, but it would still not fix issues regarding non-contiguous hugepages (worst case, each hugepage is a memseg).
>>> The non-contiguous hugepages issues are not as critical now that mempools can span over multiple memsegs/hugepages, but it is still a problem for any other library requiring big chunks of memory.
>>>
>>> Potentially if we were to add an option such as 'iommu-only' when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each numa.
>>>
>>> Thoughts?
>> Hi Sergio,
>>
>> Thanks for your attention to this.
>>
>> For now, as we have some issues with non-contiguous
>> hugepages, I'm thinking about the following hybrid scheme:
>> 1) Allocate essential hugepages:
>>     1.1) Allocate only as many hugepages from numa N as
>>          needed to fit the requested memory for this numa.
>>     1.2) Repeat 1.1 for all numa nodes.
>> 2) Try to map all remaining free hugepages in a round-robin
>>     fashion like in this patch.
>> 3) Sort pages and choose the most suitable.
>>
>> This solution should decrease the number of issues connected with
>> non-contiguous memory.
> 
> Sorry for the late reply, I was hoping for more comments from the community.
> 
> IMHO this should be the default behavior, which means no config option and libnuma as an EAL dependency.
> I think your proposal is good. Could you consider implementing this approach for the next release?

Sure, I can implement this for the 17.08 release.
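
As a rough illustration of the hybrid scheme quoted above, here is a toy
model in Python. It is only a sketch and not EAL code: the function name
hybrid_allocate, the per-node free page counts and the single global
mapping limit (standing in for a cgroup or hugetlbfs quota) are all
assumptions made for the example. It also ignores the kernel falling back
to another node when the preferred one is empty, and it does not model
step 3 (sorting pages and choosing the most suitable ones).

```python
def hybrid_allocate(free, requested, limit):
    """Toy model of the hybrid scheme (hypothetical helper, not EAL code).

    free      -- free hugepages per NUMA node
    requested -- hugepages needed per NUMA node
    limit     -- maximum number of pages that may be mapped in total
    Returns the number of pages obtained from each node.
    """
    free = list(free)
    got = [0] * len(free)
    mapped = 0

    def take(node):
        # Map one page from `node` if the limit and the node allow it.
        nonlocal mapped
        if mapped >= limit or free[node] == 0:
            return False
        free[node] -= 1
        got[node] += 1
        mapped += 1
        return True

    # Step 1: essential pages, only as many as each node needs.
    for node, need in enumerate(requested):
        for _ in range(need):
            if not take(node):
                break

    # Step 2: map whatever is left, one page per node in turn.
    progress = True
    while progress:
        progress = False
        for node in range(len(free)):
            if take(node):
                progress = True

    return got


# 45 free 1GB pages per node (assumed split), 4 pages wanted per node,
# at most 32 pages mappable.
print(hybrid_allocate([45, 45], [4, 4], 32))  # [16, 16]
```

The caller would still have to check that got[n] >= requested[n] for
every node, since a tight limit can leave a later node short.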

>>
>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>> Hi all.
>>>>
>>>> So, what about this change?
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>> Currently EAL allocates hugepages one by one, not paying
>>>>> attention to which NUMA node each allocation comes from.
>>>>>
>>>>> Such behaviour leads to allocation failure if the number of
>>>>> hugepages available to the application is limited by cgroups
>>>>> or hugetlbfs and memory is requested not only from the first
>>>>> socket.
>>>>>
>>>>> Example:
>>>>>      # 90 x 1GB hugepages available in a system
>>>>>
>>>>>      cgcreate -g hugetlb:/test
>>>>>      # Limit to 32GB of hugepages
>>>>>      cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>      # Request 4GB from each of 2 sockets
>>>>>      cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>
>>>>>      EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>      EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>      EAL: Not enough memory available on socket 1!
>>>>>           Requested: 4096MB, available: 0MB
>>>>>      PANIC in rte_eal_init():
>>>>>      Cannot init memory
>>>>>
>>>>>      This happens because all allocated pages are
>>>>>      on socket 0.
>>>>>
>>>>> Fix this issue by setting mempolicy MPOL_PREFERRED for each
>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>> In this case all allocated pages will be fairly distributed
>>>>> between all requested nodes.
>>>>>
>>>>> New config option RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES is
>>>>> introduced and disabled by default because of the external
>>>>> dependency on libnuma.
>>>>>
>>>>> Cc: <stable@dpdk.org>
>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> ---
>>>>>    config/common_base                       |  1 +
>>>>>    lib/librte_eal/Makefile                  |  4 ++
>>>>>    lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>    mk/rte.app.mk                            |  3 ++
>>>>>    4 files changed, 74 insertions(+)
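
To make the failure in the quoted example concrete, the following toy
model compares mapping pages with no per-page policy against the
round-robin preferred-node policy described in the commit message. It is
a sketch under assumptions, not the patch itself: map_pages is a
hypothetical name, the 90 pages are assumed to be split 45/45 between two
nodes, and the "no policy" case is modelled as always taking pages from
node 0 first, which mirrors the log above rather than any guaranteed
kernel behaviour.

```python
GIB = 1 << 30


def map_pages(free, limit, round_robin):
    """Map up to `limit` pages and return the count taken from each node.

    free        -- free hugepages per NUMA node
    limit       -- maximum number of pages that may be mapped
    round_robin -- rotate the preferred node per page if True,
                   otherwise always prefer node 0 (assumed default)
    """
    free = list(free)
    nodes = len(free)
    got = [0] * nodes
    for i in range(limit):
        preferred = i % nodes if round_robin else 0
        # A preferred-node policy falls back to other nodes when the
        # preferred one has no free pages left.
        order = [preferred] + [n for n in range(nodes) if n != preferred]
        node = next((n for n in order if free[n] > 0), None)
        if node is None:
            break  # no free pages anywhere
        free[node] -= 1
        got[node] += 1
    return got


# The cgroup limit from the example, expressed in 1GB pages.
limit_pages = 34359738368 // GIB

print(limit_pages)                               # 32
print(map_pages([45, 45], limit_pages, False))   # [32, 0]
print(map_pages([45, 45], limit_pages, True))    # [16, 16]
```

With --socket-mem=4096,4096 each socket needs 4 pages, so the first
result leaves socket 1 with nothing while the second satisfies both.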
> 
> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>

Thanks.

Best regards, Ilya Maximets.