From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <i.maximets@samsung.com>
Received: from mailout3.w1.samsung.com (mailout3.w1.samsung.com
 [210.118.77.13]) by dpdk.org (Postfix) with ESMTP id 7C30529C7;
 Mon, 10 Apr 2017 10:06:01 +0200 (CEST)
Received: from eucas1p1.samsung.com (unknown [182.198.249.206])
 by mailout3.w1.samsung.com
 (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTP id <0OO600ABEP5Z7X60@mailout3.w1.samsung.com>; Mon,
 10 Apr 2017 09:05:59 +0100 (BST)
Received: from eusmges4.samsung.com (unknown [203.254.199.244])
 by	eucas1p2.samsung.com (KnoxPortal) with ESMTP
 id	20170410080559eucas1p20beef154ae10f80212b0658482e02659~z_rWdyshW0671106711eucas1p22;
 Mon, 10 Apr 2017 08:05:59 +0000 (GMT)
Received: from eucas1p1.samsung.com ( [182.198.249.206])
 by	eusmges4.samsung.com (EUCPMTA) with SMTP id 79.29.04729.7EC3BE85; Mon,
 10	Apr 2017 09:05:59 +0100 (BST)
Received: from eusmgms2.samsung.com (unknown [182.198.249.180])
 by	eucas1p2.samsung.com (KnoxPortal) with ESMTP
 id	20170410080558eucas1p2d1759b745eb4334145fcbeec4de9a218~z_rV3hZM-0826408264eucas1p2Z;
 Mon, 10 Apr 2017 08:05:58 +0000 (GMT)
X-AuditID: cbfec7f4-f79806d000001279-dd-58eb3ce78ee0
Received: from eusync2.samsung.com ( [203.254.199.212])
 by	eusmgms2.samsung.com (EUCPMTA) with SMTP id 4F.83.20206.CFC3BE85; Mon,
 10	Apr 2017 09:06:20 +0100 (BST)
Received: from [106.109.129.180] by eusync2.samsung.com
 (Oracle	Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with	ESMTPA id <0OO600CGJP5XIL90@eusync2.samsung.com>; Mon,
 10 Apr 2017 09:05:58	+0100 (BST)
To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>,
 Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: dev@dpdk.org, David Marchand <david.marchand@6wind.com>,
 Heetae Ahn <heetae82.ahn@samsung.com>,
 Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Jianfeng Tan <jianfeng.tan@intel.com>, Neil Horman <nhorman@tuxdriver.com>,
 Yulong Pei <yulong.pei@intel.com>, stable@dpdk.org,
 Bruce Richardson <bruce.richardson@intel.com>
From: Ilya Maximets <i.maximets@samsung.com>
Message-id: <b4d7e98b-773e-9927-ce5c-b3807b9a4b94@samsung.com>
Date: Mon, 10 Apr 2017 11:05:56 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.7.0
MIME-version: 1.0
In-reply-to: <b9291962-ceb3-06e3-445d-5089fb0868c0@intel.com>
Content-type: text/plain; charset=windows-1252
Content-transfer-encoding: 7bit
X-MTR: 20000000000000000@CPGS
X-CMS-MailID: 20170410080558eucas1p2d1759b745eb4334145fcbeec4de9a218
X-Msg-Generator: CA
X-Sender-IP: 182.198.249.180
X-Local-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G+yCvOyEseyghOyekBtMZWFkaW5nIEVuZ2luZWVy?=
X-Global-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?=
 =?UTF-8?B?G1NhbXN1bmcgRWxlY3Ryb25pY3MbTGVhZGluZyBFbmdpbmVlcg==?=
X-Sender-Code: =?UTF-8?B?QzEwG0NJU0hRG0MxMEdEMDFHRDAxMDE1NA==?=
CMS-TYPE: 201P
X-HopCount: 7
X-CMS-RootMailID: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
X-RootMTR: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a
References: <CGME20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a@eucas1p2.samsung.com>
 <d90f300a-dc8c-8788-d3ef-7970a6d36508@samsung.com>
 <2a9b03bd-a4c0-8d20-0bbd-77730140eef0@samsung.com> <1945759.SoJb5dzy87@xps13>
 <b7ac2887-1cc4-aae7-8337-98a0f6c548a8@samsung.com>
 <b9291962-ceb3-06e3-445d-5089fb0868c0@intel.com>
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 10 Apr 2017 08:06:01 -0000



On 10.04.2017 10:51, Sergio Gonzalez Monroy wrote:
> On 10/04/2017 08:11, Ilya Maximets wrote:
>> On 07.04.2017 18:44, Thomas Monjalon wrote:
>>> 2017-04-07 18:14, Ilya Maximets:
>>>> Hi All.
>>>>
>>>> I wanted to ask (just to clarify current status):
>>>> Will this patch be included in the current release (it is acked by the
>>>> maintainer), after which I would upgrade it to the hybrid logic, or should
>>>> I just prepare a v3 with the hybrid logic for 17.08?
>>> What is your preferred option Ilya?
>> I have no strong opinion on this. One thought is that it would be
>> nice if someone else could test this functionality with the current
>> release before it is enabled by default in 17.08.
>>
>> I'm going on vacation tomorrow, so I'll post a rebased version today
>> (there are a few fuzzes against current master), and you and Sergio
>> can decide what to do.
>>
>> Best regards, Ilya Maximets.
>>
>>> Sergio?
> 
> I would be inclined towards a v3 targeting v17.08. IMHO it would be cleaner this way.

OK.
I've sent a rebased version just in case.

> 
> Sergio
> 
>>>
>>>> On 27.03.2017 17:43, Ilya Maximets wrote:
>>>>> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>>>>>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>>>>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>>>>>> Hi Ilya,
>>>>>>>>
>>>>>>>> I have done similar tests and as you already pointed out, 'numactl --interleave' does not seem to work as expected.
>>>>>>>> I have also checked that the issue can be reproduced with a quota limit on the hugetlbfs mount point.
>>>>>>>>
>>>>>>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
>>>>>>>>
>>>>>>>> Currently at a high level regarding hugepages per numa node:
>>>>>>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether any limits apply, such as cgroups or a quota on the mount point.
>>>>>>>> 2) Find out numa node of each hugepage.
>>>>>>>> 3) Check if we have enough hugepages for requested memory in each numa socket/node.
>>>>>>>>
>>>>>>>> Using libnuma we could try to allocate hugepages per numa:
>>>>>>>> 1) Try to map as many hugepages as possible from numa 0.
>>>>>>>> 2) Check if we have enough hugepages for requested memory in numa 0.
>>>>>>>> 3) Try to map as many hugepages as possible from numa 1.
>>>>>>>> 4) Check if we have enough hugepages for requested memory in numa 1.
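To illustrate the per-node check in the list above, here is a minimal C sketch. It is not the EAL code: the function name `first_short_node` and the array layout are assumptions made for this example, and it only models the "do we have enough hugepages on this node" step, not the mapping itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK): return the index of the first
 * NUMA node whose mapped hugepages cannot cover the memory requested on
 * it, or -1 if every node is satisfied.
 *
 * mapped_pages[i] - hugepages that ended up on node i after mapping
 * requested_mb[i] - memory requested on node i, in MB (as in --socket-mem)
 * page_mb         - hugepage size in MB
 */
static int first_short_node(const unsigned mapped_pages[],
                            const uint64_t requested_mb[],
                            int n_nodes, uint64_t page_mb)
{
    assert(page_mb > 0);

    for (int i = 0; i < n_nodes; i++) {
        uint64_t available_mb = (uint64_t)mapped_pages[i] * page_mb;

        if (available_mb < requested_mb[i])
            return i; /* node i is short of memory */
    }
    return -1;
}
```

With the numbers from the failing run in the original patch description (32 pages of 1024 MB all on socket 0, 4096 MB requested on each of 2 sockets) this reports socket 1 as short, which matches the "Not enough memory available on socket 1" message.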
>>>>>>>>
>>>>>>>> This approach would improve failing scenarios caused by limits, but it would still not fix issues regarding non-contiguous hugepages (in the worst case each hugepage is a memseg).
>>>>>>>> The non-contiguous hugepages issues are not as critical now that mempools can span over multiple memsegs/hugepages, but it is still a problem for any other library requiring big chunks of memory.
>>>>>>>>
>>>>>>>> Potentially if we were to add an option such as 'iommu-only' when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each numa.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>> Hi Sergio,
>>>>>>>
>>>>>>> Thanks for your attention to this.
>>>>>>>
>>>>>>> For now, as we have some issues with non-contiguous
>>>>>>> hugepages, I'm thinking about the following hybrid scheme:
>>>>>>> 1) Allocate essential hugepages:
>>>>>>>      1.1) Allocate only as many hugepages from numa N as
>>>>>>>           needed to fit the memory requested for this numa.
>>>>>>>      1.2) Repeat 1.1 for all numa nodes.
>>>>>>> 2) Try to map all remaining free hugepages in a round-robin
>>>>>>>      fashion like in this patch.
>>>>>>> 3) Sort pages and choose the most suitable.
>>>>>>>
>>>>>>> This solution should decrease the number of issues connected with
>>>>>>> non-contiguous memory.
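As a rough illustration of step 1 of the hybrid scheme above, a minimal C sketch follows. It is only arithmetic, not the planned v3: the function name `pages_left_after_essential` and the idea of passing the cgroup/hugetlbfs limit in as a page count are assumptions for this example. It computes how many "essential" pages each node needs and how many pages are left over for the round-robin phase (step 2).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK): compute the number of essential
 * hugepages per NUMA node and return how many pages of the overall limit
 * remain for the round-robin phase. Returns -1 if the limit cannot even
 * cover the essential pages.
 *
 * requested_mb[i] - memory requested on node i, in MB
 * page_mb         - hugepage size in MB
 * page_limit      - total hugepages the application may map
 * essential[i]    - output: pages needed on node i, rounded up
 */
static long pages_left_after_essential(const uint64_t requested_mb[],
                                       int n_nodes, uint64_t page_mb,
                                       uint64_t page_limit,
                                       uint64_t essential[])
{
    assert(page_mb > 0);

    uint64_t total = 0;

    for (int i = 0; i < n_nodes; i++) {
        /* Round up so a partial page still counts as one page. */
        essential[i] = (requested_mb[i] + page_mb - 1) / page_mb;
        total += essential[i];
    }

    if (total > page_limit)
        return -1;

    return (long)(page_limit - total);
}
```

For the example from the patch description (1024 MB pages, a 32-page limit, 4096 MB requested per socket) this gives 4 essential pages per node and 24 pages left for the round-robin phase.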
>>>>>> Sorry for the late reply, I was hoping for more comments from the community.
>>>>>>
>>>>>> IMHO this should be the default behavior, which means no config option and libnuma as an EAL dependency.
>>>>>> I think your proposal is good. Could you consider implementing this approach for the next release?
>>>>> Sure, I can implement this for the 17.08 release.
>>>>>
>>>>>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> So, what about this change?
>>>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>
>>>>>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>>>>>> Currently EAL allocates hugepages one by one, paying no
>>>>>>>>>> attention to which NUMA node each allocation comes from.
>>>>>>>>>>
>>>>>>>>>> Such behaviour leads to allocation failure if the number of
>>>>>>>>>> hugepages available to the application is limited by cgroups
>>>>>>>>>> or hugetlbfs and memory is requested not only from the first
>>>>>>>>>> socket.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>>       # 90 x 1GB hugepages available in a system
>>>>>>>>>>
>>>>>>>>>>       cgcreate -g hugetlb:/test
>>>>>>>>>>       # Limit to 32GB of hugepages
>>>>>>>>>>       cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>>>>>>       # Request 4GB from each of 2 sockets
>>>>>>>>>>       cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>>>>>
>>>>>>>>>>       EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>>>>>>       EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>>>>>>       EAL: Not enough memory available on socket 1!
>>>>>>>>>>            Requested: 4096MB, available: 0MB
>>>>>>>>>>       PANIC in rte_eal_init():
>>>>>>>>>>       Cannot init memory
>>>>>>>>>>
>>>>>>>>>>       This happens because all allocated pages are
>>>>>>>>>>       on socket 0.
>>>>>>>>>>
>>>>>>>>>> Fix this issue by setting mempolicy MPOL_PREFERRED for each
>>>>>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>>>>>> In this case all allocated pages will be fairly distributed
>>>>>>>>>> between all requested nodes.
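The round-robin selection described above can be sketched in plain C as follows. This is a simplified model, not the code from the patch: the helper names and the `MAX_NODES` limit are assumptions for this example, and the actual policy call is only indicated in a comment so that the sketch builds without libnuma.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8 /* hypothetical limit for this sketch */

/*
 * Hypothetical helper: return the next NUMA node that has memory
 * requested on it, scanning round-robin starting after `prev`
 * (pass -1 for the first call). Returns -1 if no node has a request.
 */
static int next_requested_node(const unsigned requested_mb[],
                               int n_nodes, int prev)
{
    assert(n_nodes > 0 && n_nodes <= MAX_NODES);

    for (int step = 1; step <= n_nodes; step++) {
        int node = (prev + step) % n_nodes;

        if (requested_mb[node] > 0)
            return node;
    }
    return -1;
}

/*
 * Hypothetical helper: fill preferred[i] with the node that hugepage i
 * should prefer. Returns 0 on success, -1 if no node was requested.
 */
static int assign_preferred_nodes(const unsigned requested_mb[],
                                  int n_nodes, int preferred[],
                                  size_t n_pages)
{
    int node = -1;

    for (size_t i = 0; i < n_pages; i++) {
        node = next_requested_node(requested_mb, n_nodes, node);
        if (node < 0)
            return -1;
        preferred[i] = node;
        /*
         * Real code would set the memory policy at this point, before
         * the page is touched, e.g. with set_mempolicy(MPOL_PREFERRED,
         * ...) from <numaif.h>.
         */
    }
    return 0;
}
```

With --socket-mem=4096,4096 the pages alternate between node 0 and node 1. Nodes with no requested memory are skipped, so a request on nodes 1 and 3 only alternates between those two.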
>>>>>>>>>>
>>>>>>>>>> New config option RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES is
>>>>>>>>>> introduced and disabled by default because of the external
>>>>>>>>>> dependency on libnuma.
>>>>>>>>>>
>>>>>>>>>> Cc: <stable@dpdk.org>
>>>>>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>>>>> ---
>>>>>>>>>>     config/common_base                       |  1 +
>>>>>>>>>>     lib/librte_eal/Makefile                  |  4 ++
>>>>>>>>>>     lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>>>>>     mk/rte.app.mk                            |  3 ++
>>>>>>>>>>     4 files changed, 74 insertions(+)
>>>>>> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
>>>>> Thanks.
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>
>>>
>>>
>>>
> 
> 
> 
>