From: Ilya Maximets
To: Sergio Gonzalez Monroy, Thomas Monjalon
Cc: dev@dpdk.org, David Marchand, Heetae Ahn, Yuanhan Liu, Jianfeng Tan, Neil Horman, Yulong Pei, stable@dpdk.org, Bruce Richardson
Date: Mon, 10 Apr 2017 11:05:56 +0300
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages

On 10.04.2017 10:51, Sergio Gonzalez Monroy wrote:
> On 10/04/2017 08:11, Ilya Maximets wrote:
>> On 07.04.2017 18:44, Thomas Monjalon wrote:
>>> 2017-04-07 18:14, Ilya Maximets:
>>>> Hi All.
>>>>
>>>> I wanted to ask (just to clarify the current status):
>>>> Will this patch be included in the current release (acked by the maintainer),
>>>> so that I upgrade it to the hybrid logic afterwards, or should I just
>>>> prepare v3 with the hybrid logic for 17.08?
>>> What is your preferred option Ilya?
>> I have no strong opinion on this. One thought is that it could be
>> nice if someone else could test this functionality with the current
>> release before enabling it by default in 17.08.
>>
>> Tomorrow I'm going on vacation, so I'll post a rebased version today
>> (there are a few fuzzes against current master) and you and Sergio may
>> decide what to do.
>>
>> Best regards, Ilya Maximets.
>>
>>> Sergio?
>
> I would be inclined towards v3 targeting v17.08. IMHO it would be cleaner this way.

OK. I've sent the rebased version just in case.

>
> Sergio
>
>>>
>>>> On 27.03.2017 17:43, Ilya Maximets wrote:
>>>>> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>>>>>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>>>>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>>>>>> Hi Ilya,
>>>>>>>>
>>>>>>>> I have done similar tests and, as you already pointed out, 'numactl --interleave'
>>>>>>>> does not seem to work as expected.
>>>>>>>> I have also checked that the issue can be reproduced with a quota limit on
>>>>>>>> the hugetlbfs mount point.
>>>>>>>>
>>>>>>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make
>>>>>>>> memory allocation a bit more reliable.
>>>>>>>>
>>>>>>>> Currently, at a high level, hugepages are handled per NUMA node as follows:
>>>>>>>> 1) Try to map all free hugepages. The total number of mapped hugepages
>>>>>>>>    depends on whether there were any limits, such as cgroups or a quota
>>>>>>>>    on the mount point.
>>>>>>>> 2) Find out the NUMA node of each hugepage.
>>>>>>>> 3) Check if we have enough hugepages for the requested memory on each
>>>>>>>>    NUMA socket/node.
>>>>>>>>
>>>>>>>> Using libnuma we could try to allocate hugepages per NUMA node:
>>>>>>>> 1) Try to map as many hugepages as possible from numa 0.
>>>>>>>> 2) Check if we have enough hugepages for the requested memory on numa 0.
>>>>>>>> 3) Try to map as many hugepages as possible from numa 1.
>>>>>>>> 4) Check if we have enough hugepages for the requested memory on numa 1.
>>>>>>>>
>>>>>>>> This approach would improve the failing scenarios caused by limits, but it
>>>>>>>> would still not fix the issues regarding non-contiguous hugepages (in the
>>>>>>>> worst case each hugepage is a memseg).
>>>>>>>> The non-contiguous hugepage issues are not as critical now that mempools
>>>>>>>> can span multiple memsegs/hugepages, but it is still a problem for any
>>>>>>>> other library requiring big chunks of memory.
>>>>>>>>
>>>>>>>> Potentially, if we were to add an option such as 'iommu-only' for when all
>>>>>>>> devices are bound to vfio-pci, we could have a reliable way to allocate
>>>>>>>> hugepages by just requesting the number of pages from each NUMA node.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>> Hi Sergio,
>>>>>>>
>>>>>>> Thanks for your attention to this.
>>>>>>>
>>>>>>> For now, as we have some issues with non-contiguous hugepages,
>>>>>>> I'm thinking about the following hybrid schema:
>>>>>>> 1) Allocate essential hugepages:
>>>>>>>    1.1) Allocate only as many hugepages from numa N as are needed to
>>>>>>>         fit the requested memory for this numa node.
>>>>>>>    1.2) Repeat 1.1 for all numa nodes.
>>>>>>> 2) Try to map all remaining free hugepages in a round-robin fashion,
>>>>>>>    like in this patch.
>>>>>>> 3) Sort the pages and choose the most suitable ones.
>>>>>>>
>>>>>>> This solution should decrease the number of issues connected with
>>>>>>> non-contiguous memory.
>>>>>> Sorry for the late reply, I was hoping for more comments from the community.
>>>>>>
>>>>>> IMHO this should be the default behavior, which means no config option and
>>>>>> libnuma as an EAL dependency.
>>>>>> I think your proposal is good; could you consider implementing such an
>>>>>> approach in the next release?
>>>>> Sure, I can implement this for the 17.08 release.
>>>>>
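For reference, the hybrid schema discussed above could look roughly like the
sketch below. This is only an illustration of the idea, not the actual EAL
code: the helpers map_hugepages_on_node(), pages_needed[] and free_pages_left
are made up for this example, and it assumes libnuma is available (link with
-lnuma).

#include <numa.h>   /* libnuma: numa_set_preferred(), numa_set_localalloc() */

/* Hypothetical helpers/state, named here only for illustration. */
extern unsigned int map_hugepages_on_node(int node, unsigned int max_pages);
extern unsigned int pages_needed[];   /* pages requested per node (--socket-mem) */
extern unsigned int free_pages_left;  /* free hugepages remaining in the pool */

static void
allocate_hugepages_hybrid(int num_nodes)
{
    unsigned int mapped, progress;
    int node;

    /* 1) Essential pages: map only what each node actually needs.
     *    (Error handling for a node that cannot satisfy its request
     *    is omitted in this sketch.) */
    for (node = 0; node < num_nodes; node++) {
        numa_set_preferred(node);
        mapped = map_hugepages_on_node(node, pages_needed[node]);
        free_pages_left -= mapped;
    }

    /* 2) Map the remaining free pages round-robin across the nodes,
     *    as the current patch does, so that spare pages stay balanced. */
    do {
        progress = 0;
        for (node = 0; node < num_nodes && free_pages_left > 0; node++) {
            numa_set_preferred(node);
            mapped = map_hugepages_on_node(node, 1);
            free_pages_left -= mapped;
            progress += mapped;
        }
    } while (progress > 0 && free_pages_left > 0);

    /* 3) Sorting the pages and choosing the most suitable ones is omitted. */

    numa_set_localalloc();  /* restore the default memory policy */
}

Note that MPOL_PREFERRED (what numa_set_preferred() sets) is a soft hint: if
the preferred node has no free hugepages, the kernel silently falls back to
another node, so the code still has to check where each page actually ended
up before counting it against a node's quota.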
>>>>>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> So, what about this change?
>>>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>
>>>>>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>>>>>> Currently EAL allocates hugepages one by one, not paying attention
>>>>>>>>>> to which NUMA node the allocation was made from.
>>>>>>>>>>
>>>>>>>>>> Such behaviour leads to allocation failure if the number of hugepages
>>>>>>>>>> available to the application is limited by cgroups or hugetlbfs and
>>>>>>>>>> memory is requested not only from the first socket.
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>>   # 90 x 1GB hugepages available in the system
>>>>>>>>>>
>>>>>>>>>>   cgcreate -g hugetlb:/test
>>>>>>>>>>   # Limit to 32GB of hugepages
>>>>>>>>>>   cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>>>>>>   # Request 4GB from each of 2 sockets
>>>>>>>>>>   cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>>>>>
>>>>>>>>>>   EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>>>>>>   EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>>>>>>   EAL: Not enough memory available on socket 1!
>>>>>>>>>>        Requested: 4096MB, available: 0MB
>>>>>>>>>>   PANIC in rte_eal_init():
>>>>>>>>>>   Cannot init memory
>>>>>>>>>>
>>>>>>>>>> This happens because all allocated pages are on socket 0.
>>>>>>>>>>
>>>>>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>>>>>> In this case all allocated pages will be fairly distributed
>>>>>>>>>> between all requested nodes.
>>>>>>>>>>
>>>>>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>>>>>> introduced and disabled by default because of the external
>>>>>>>>>> dependency on libnuma.
>>>>>>>>>>
>>>>>>>>>> Cc:
>>>>>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets
>>>>>>>>>> ---
>>>>>>>>>>  config/common_base                       |  1 +
>>>>>>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>>>>>>  4 files changed, 74 insertions(+)
>>>>>> Acked-by: Sergio Gonzalez Monroy
>>>>> Thanks.
>>>>>
>>>>> Best regards, Ilya Maximets.
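For completeness, the mechanism described in the quoted commit message (set
MPOL_PREFERRED to the next requested socket before each hugepage is mapped,
then verify where the page actually landed) can be sketched as below. This is
an illustration only, not the code from eal_memory.c: the hugetlbfs path, the
rtemap_%d naming and the sockets[] array are made up for the example, and it
assumes libnuma (link with -lnuma).

#include <fcntl.h>
#include <numa.h>     /* numa_set_preferred(), numa_set_localalloc() */
#include <numaif.h>   /* get_mempolicy(), MPOL_F_NODE, MPOL_F_ADDR */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1024ULL * 1024 * 1024)   /* 1 GB pages, as in the example */

/* Map 'n' hugepages from a hugetlbfs mount, preferring the requested sockets
 * in a round-robin fashion.  The real EAL walks its own hugepage file list. */
static void
map_round_robin(const char *path, const int *sockets, int num_sockets, int n)
{
    char file[256];
    int i, node;

    for (i = 0; i < n; i++) {
        /* Prefer the next requested node before the page is faulted in. */
        numa_set_preferred(sockets[i % num_sockets]);

        snprintf(file, sizeof(file), "%s/rtemap_%d", path, i);
        int fd = open(file, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            break;

        /* MAP_POPULATE faults the hugepage in right away, so the
         * MPOL_PREFERRED policy set above decides its NUMA node. */
        void *va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE, fd, 0);
        close(fd);
        if (va == MAP_FAILED)
            break;

        /* Check which node actually backs the page; it may differ from the
         * preferred one if that node ran out of free hugepages. */
        if (get_mempolicy(&node, NULL, 0, va,
                          MPOL_F_NODE | MPOL_F_ADDR) == 0)
            printf("page %d is on socket %d\n", i, node);
    }
    numa_set_localalloc();   /* restore the default policy */
}

With --socket-mem=4096,4096 as in the example, sockets[] would be {0, 1}, so
even under the 32GB cgroup limit roughly half of the mapped pages should end
up on each socket instead of all of them landing on socket 0.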