From: Ilya Maximets
To: Thomas Monjalon, Sergio Gonzalez Monroy
Cc: dev@dpdk.org, David Marchand, Heetae Ahn, Yuanhan Liu, Jianfeng Tan, Neil Horman, Yulong Pei, stable@dpdk.org, Bruce Richardson
Date: Mon, 10 Apr 2017 10:11:39 +0300
In-reply-to: <1945759.SoJb5dzy87@xps13>
ro61iOjt49hweq7f/7wrUfO+ONbtVmIpzkg01GIuKk4EAJDq1FXhAgAA X-MTR: 20000000000000000@CPGS X-CMS-MailID: 20170410071141eucas1p203b96632f235a68f8004ce27e94aa87e X-Msg-Generator: CA X-Sender-IP: 182.198.249.179 X-Local-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?= =?UTF-8?B?G+yCvOyEseyghOyekBtMZWFkaW5nIEVuZ2luZWVy?= X-Global-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?= =?UTF-8?B?G1NhbXN1bmcgRWxlY3Ryb25pY3MbTGVhZGluZyBFbmdpbmVlcg==?= X-Sender-Code: =?UTF-8?B?QzEwG0NJU0hRG0MxMEdEMDFHRDAxMDE1NA==?= CMS-TYPE: 201P X-HopCount: 7 X-CMS-RootMailID: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a X-RootMTR: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a References: <2a9b03bd-a4c0-8d20-0bbd-77730140eef0@samsung.com> <1945759.SoJb5dzy87@xps13> Subject: Re: [dpdk-dev] [PATCH] mem: balanced allocation of hugepages X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Apr 2017 07:11:45 -0000 On 07.04.2017 18:44, Thomas Monjalon wrote: > 2017-04-07 18:14, Ilya Maximets: >> Hi All. >> >> I wanted to ask (just to clarify current status): >> Will this patch be included in current release (acked by maintainer) >> and then I will upgrade it to hybrid logic or I will just prepare v3 >> with hybrid logic for 17.08 ? > > What is your preferred option Ilya? I have no strong opinion on this. One thought is that it could be nice if someone else could test this functionality with current release before enabling it by default in 17.08. Tomorrow I'm going on vacation. So I'll post rebased version today (there are few fuzzes with current master) and you with Sergio may decide what to do. Best regards, Ilya Maximets. > Sergio? 
>
>
>> On 27.03.2017 17:43, Ilya Maximets wrote:
>>> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>>>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>>>> Hi Ilya,
>>>>>>
>>>>>> I have done similar tests and, as you already pointed out, 'numactl --interleave' does not seem to work as expected.
>>>>>> I have also checked that the issue can be reproduced with a quota limit on the hugetlbfs mount point.
>>>>>>
>>>>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
>>>>>>
>>>>>> Currently, at a high level, regarding hugepages per NUMA node:
>>>>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether there were any limits, such as cgroups or a quota on the mount point.
>>>>>> 2) Find out the NUMA node of each hugepage.
>>>>>> 3) Check if we have enough hugepages for the requested memory on each NUMA socket/node.
>>>>>>
>>>>>> Using libnuma, we could try to allocate hugepages per NUMA node:
>>>>>> 1) Try to map as many hugepages as possible from NUMA node 0.
>>>>>> 2) Check if we have enough hugepages for the requested memory on node 0.
>>>>>> 3) Try to map as many hugepages as possible from NUMA node 1.
>>>>>> 4) Check if we have enough hugepages for the requested memory on node 1.
>>>>>>
>>>>>> This approach would improve failing scenarios caused by limits, but it would still not fix issues regarding non-contiguous hugepages (worst case, each hugepage is a memseg).
>>>>>> The non-contiguous hugepage issues are not as critical now that mempools can span multiple memsegs/hugepages, but they are still a problem for any other library requiring big chunks of memory.
>>>>>>
>>>>>> Potentially, if we were to add an option such as 'iommu-only' when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each NUMA node.
>>>>>>
>>>>>> Thoughts?
>>>>> Hi Sergio,
>>>>>
>>>>> Thanks for your attention to this.
>>>>>
>>>>> For now, as we have some issues with non-contiguous
>>>>> hugepages, I'm thinking about the following hybrid scheme:
>>>>> 1) Allocate essential hugepages:
>>>>>    1.1) Allocate only as many hugepages from NUMA node N as
>>>>>         needed to fit the memory requested for this node.
>>>>>    1.2) Repeat 1.1 for all NUMA nodes.
>>>>> 2) Try to map all remaining free hugepages in a round-robin
>>>>>    fashion, like in this patch.
>>>>> 3) Sort the pages and choose the most suitable ones.
>>>>>
>>>>> This solution should decrease the number of issues connected with
>>>>> non-contiguous memory.
>>>>
>>>> Sorry for the late reply, I was hoping for more comments from the community.
>>>>
>>>> IMHO this should be the default behavior, which means no config option and libnuma as an EAL dependency.
>>>> I think your proposal is good; could you consider implementing such an approach in the next release?
>>>
>>> Sure, I can implement this for the 17.08 release.
>>>
>>>>>
>>>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>>>> Hi all.
>>>>>>>
>>>>>>> So, what about this change?
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>>
>>>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>>>> Currently EAL allocates hugepages one by one, not paying
>>>>>>>> attention to which NUMA node the allocation was done from.
>>>>>>>>
>>>>>>>> Such behaviour leads to allocation failure if the number of
>>>>>>>> hugepages available to the application is limited by cgroups
>>>>>>>> or hugetlbfs and memory is requested not only from the first
>>>>>>>> socket.
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>     # 90 x 1GB hugepages available in a system
>>>>>>>>
>>>>>>>>     cgcreate -g hugetlb:/test
>>>>>>>>     # Limit to 32GB of hugepages
>>>>>>>>     cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>>>>     # Request 4GB from each of 2 sockets
>>>>>>>>     cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>>>
>>>>>>>>     EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>>>>     EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>>>>     EAL: Not enough memory available on socket 1!
>>>>>>>>         Requested: 4096MB, available: 0MB
>>>>>>>>     PANIC in rte_eal_init():
>>>>>>>>     Cannot init memory
>>>>>>>>
>>>>>>>> This happens because all allocated pages are
>>>>>>>> on socket 0.
>>>>>>>>
>>>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>>>> In this case all allocated pages will be fairly distributed
>>>>>>>> between all requested nodes.
>>>>>>>>
>>>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>>>> introduced and disabled by default because of the external
>>>>>>>> dependency on libnuma.
>>>>>>>>
>>>>>>>> Cc:
>>>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>>>
>>>>>>>> Signed-off-by: Ilya Maximets
>>>>>>>> ---
>>>>>>>>  config/common_base                       |  1 +
>>>>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>>>>  4 files changed, 74 insertions(+)
>>>>
>>>> Acked-by: Sergio Gonzalez Monroy
>>>
>>> Thanks.
>>>
>>> Best regards, Ilya Maximets.
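The round-robin policy described in the quoted patch can be illustrated with a toy page-counting model in C. This is a hedged sketch, not the DPDK implementation: it never calls set_mempolicy() or mmap(), it only counts pages. It assumes the 90 free 1 GB pages from the example are split 45/45 between the two sockets (the split is not stated in the thread), models the old behaviour as "take pages from node 0 until it runs out", and models the MPOL_PREFERRED fallback as "lowest-numbered node with free pages", which simplifies the kernel's real fallback order. The names pages_in_limit(), first_node_with_free(), alloc_sequential() and alloc_round_robin() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy page-counting model, NOT DPDK code. Node n has free_pages[n] free
 * hugepages; at most `budget` pages may be mapped (e.g. a hugetlb cgroup
 * limit); got[n] receives how many pages ended up on node n. */

/* Number of whole pages that fit into a byte limit such as limit_in_bytes. */
static unsigned pages_in_limit(uint64_t limit_bytes, uint64_t page_size)
{
    return (unsigned)(limit_bytes / page_size);
}

/* Lowest-numbered node with a free page, or -1 if none. A simplified
 * stand-in for the kernel's fallback order (an assumption of this model). */
static int first_node_with_free(const unsigned free_pages[], unsigned n_nodes)
{
    for (unsigned n = 0; n < n_nodes; n++)
        if (free_pages[n] > 0)
            return (int)n;
    return -1;
}

/* Assumed old behaviour: no NUMA policy, so pages are taken from the first
 * node (the local node in this model) until it runs out. */
static void alloc_sequential(unsigned free_pages[], unsigned n_nodes,
                             unsigned budget, unsigned got[])
{
    for (unsigned n = 0; n < n_nodes; n++)
        got[n] = 0;
    for (unsigned i = 0; i < budget; i++) {
        int node = first_node_with_free(free_pages, n_nodes);
        if (node < 0)
            break; /* no free pages left anywhere */
        free_pages[node]--;
        got[node]++;
    }
}

/* Patch behaviour as described in the thread: page i prefers requested node
 * req_nodes[i % n_req] (MPOL_PREFERRED); if that node is exhausted, the page
 * comes from another node instead of the mapping failing. */
static void alloc_round_robin(unsigned free_pages[], unsigned n_nodes,
                              const unsigned req_nodes[], unsigned n_req,
                              unsigned budget, unsigned got[])
{
    assert(n_req > 0); /* avoid modulo by zero */
    for (unsigned n = 0; n < n_nodes; n++)
        got[n] = 0;
    for (unsigned i = 0; i < budget; i++) {
        int node = (int)req_nodes[i % n_req];
        if (free_pages[node] == 0)
            node = first_node_with_free(free_pages, n_nodes);
        if (node < 0)
            break;
        free_pages[node]--;
        got[node]++;
    }
}
```

With the example's limit, pages_in_limit(34359738368, 1 << 30) gives a budget of 32 pages. For requested nodes {0, 1}, alloc_sequential() puts all 32 pages on node 0, mirroring the "available: 0MB" failure on socket 1 in the log, while alloc_round_robin() gives 16 pages to each node, enough for --socket-mem=4096,4096.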
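Steps 1 and 2 of the hybrid scheme proposed in the thread can be modelled the same way. Again this is a hypothetical page-counting sketch in C, not DPDK code: alloc_hybrid(), need[] and budget are illustrative names; step 2 round-robins only over nodes with a non-zero request (mirroring "one of the requested nodes" in the patch) and simply stops when those nodes run out of free pages, rather than falling back as MPOL_PREFERRED would; and step 3 (sorting pages and choosing the most suitable ones) is left out.

```c
#include <assert.h>

/* Toy page-counting model of steps 1 and 2 of the proposed hybrid scheme,
 * NOT DPDK code. free_pages[n]: free hugepages on node n; need[n]: pages
 * requested for node n (e.g. its --socket-mem value / page size); budget:
 * pages the process may map in total; got[n]: pages mapped on node n.
 * Returns 0 if every node's request was met, -1 otherwise. */
static int alloc_hybrid(unsigned free_pages[], const unsigned need[],
                        unsigned n_nodes, unsigned budget, unsigned got[])
{
    unsigned used = 0;
    int rc = 0;

    assert(n_nodes > 0);

    /* Step 1: for each node, map only as many pages as its request needs. */
    for (unsigned n = 0; n < n_nodes; n++) {
        got[n] = 0;
        while (got[n] < need[n] && free_pages[n] > 0 && used < budget) {
            free_pages[n]--;
            got[n]++;
            used++;
        }
        if (got[n] < need[n])
            rc = -1; /* this node's request cannot be satisfied */
    }

    /* Step 2: spend the remaining budget round-robin over the requested
     * nodes (need[n] > 0), one page per node per pass. */
    while (used < budget) {
        int progress = 0;
        for (unsigned n = 0; n < n_nodes && used < budget; n++) {
            if (need[n] > 0 && free_pages[n] > 0) {
                free_pages[n]--;
                got[n]++;
                used++;
                progress = 1;
            }
        }
        if (!progress)
            break; /* requested nodes have no free pages left */
    }
    return rc;
}
```

For example, with 45 free pages per node, requests of 2 and 6 pages and a budget of 8 pages, step 1 alone yields exactly 2 and 6 pages. Splitting the same 8 pages purely round-robin would give 4 pages per node and leave node 1 two pages short, which is the kind of asymmetric request that step 1 is meant to cover.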