From: Ilya Maximets
To: Sergio Gonzalez Monroy, dev@dpdk.org, David Marchand
Cc: Heetae Ahn, Yuanhan Liu, Jianfeng Tan, Neil Horman, Yulong Pei, stable@dpdk.org, Thomas Monjalon, Bruce Richardson
Date: Mon, 27 Mar 2017 17:43:15 +0300
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages

On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
> On 09/03/2017 12:57, Ilya Maximets wrote:
>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>> Hi Ilya,
>>>
>>> I have done similar tests and, as you already pointed out, 'numactl --interleave' does not seem to work as expected.
>>> I have also checked that the issue can be reproduced with a quota limit on the hugetlbfs mount point.
>>>
>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
>>>
>>> Currently, at a high level, hugepages per NUMA node are handled as follows:
>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether there were any limits, such as cgroups or a quota on the mount point.
>>> 2) Find out the NUMA node of each hugepage.
>>> 3) Check if we have enough hugepages for the requested memory on each NUMA socket/node.
>>>
>>> Using libnuma we could instead try to allocate hugepages per NUMA node:
>>> 1) Try to map as many hugepages as possible from NUMA 0.
>>> 2) Check if we have enough hugepages for the requested memory on NUMA 0.
>>> 3) Try to map as many hugepages as possible from NUMA 1.
>>> 4) Check if we have enough hugepages for the requested memory on NUMA 1.
>>>
>>> This approach would improve the failing scenarios caused by limits, but it would still not fix issues regarding non-contiguous hugepages (worst case: each hugepage is a memseg).
>>> The non-contiguous hugepage issues are not as critical now that mempools can span multiple memsegs/hugepages, but they are still a problem for any other library requiring big chunks of memory.
>>>
>>> Potentially, if we were to add an option such as 'iommu-only' for when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each NUMA node.
>>>
>>> Thoughts?
>>
>> Hi Sergio,
>>
>> Thanks for your attention to this.
>>
>> For now, as we have some issues with non-contiguous hugepages, I'm thinking about the following hybrid scheme:
>> 1) Allocate essential hugepages:
>>    1.1) Allocate only as many hugepages from NUMA node N as are needed to fit the memory requested for that node.
>>    1.2) Repeat 1.1 for all NUMA nodes.
>> 2) Try to map all remaining free hugepages in a round-robin fashion, as in this patch.
>> 3) Sort the pages and choose the most suitable ones.
>>
>> This solution should decrease the number of issues connected with non-contiguous memory.
>
> Sorry for the late reply, I was hoping for more comments from the community.
>
> IMHO this should be the default behaviour, which means no config option and libnuma as an EAL dependency.
> I think your proposal is good; could you consider implementing such an approach for the next release?

Sure, I can implement this for the 17.08 release.

>>
>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>> Hi all.
>>>>
>>>> So, what about this change?
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>> Currently EAL allocates hugepages one by one, not paying attention to which NUMA node the allocation was made from.
>>>>>
>>>>> Such behaviour leads to allocation failures if the number of hugepages available to the application is limited by cgroups or hugetlbfs and memory is requested from more than just the first socket.
>>>>>
>>>>> Example:
>>>>> # 90 x 1GB hugepages available in the system
>>>>>
>>>>> cgcreate -g hugetlb:/test
>>>>> # Limit to 32GB of hugepages
>>>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>> # Request 4GB from each of 2 sockets
>>>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>
>>>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>> EAL: Not enough memory available on socket 1!
>>>>>      Requested: 4096MB, available: 0MB
>>>>> PANIC in rte_eal_init():
>>>>> Cannot init memory
>>>>>
>>>>> This happens because all allocated pages are on socket 0.
>>>>>
>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each hugepage to one of the requested nodes in a round-robin fashion. In this case all allocated pages will be fairly distributed between all requested nodes.
>>>>>
>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is introduced and disabled by default because of the external dependency on libnuma.
>>>>>
>>>>> Cc:
>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>
>>>>> Signed-off-by: Ilya Maximets
>>>>> ---
>>>>>  config/common_base                       |  1 +
>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>  4 files changed, 74 insertions(+)
>
> Acked-by: Sergio Gonzalez Monroy

Thanks.

Best regards, Ilya Maximets.
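
For reference, below is a minimal standalone sketch of the MPOL_PREFERRED technique described in the commit message above: install a preferred-node memory policy, map a hugetlbfs file, and touch it so the page is physically backed on the chosen socket, cycling through the requested nodes in round-robin fashion. It is not the EAL code from the patch; the mount point (/dev/hugepages with 1 GB pages), the node list {0, 1}, the file names and the helper map_hugepage_on_node() are illustrative assumptions. Build with -lnuma.

/*
 * Sketch only: NUMA-aware hugepage mapping with MPOL_PREFERRED.
 * Mount point, page size, node list and helper name are assumptions,
 * not taken from the actual patch.
 */
#include <fcntl.h>
#include <numa.h>       /* numa_available() */
#include <numaif.h>     /* set_mempolicy(), MPOL_PREFERRED */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1UL << 30) /* assuming a 1 GB hugetlbfs mount */

/* Map one hugetlbfs-backed page, preferring physical memory from 'node'. */
static void *map_hugepage_on_node(const char *path, int node)
{
	unsigned long nodemask = 1UL << node;
	void *va;
	int fd;

	/* Install the preferred-node policy before the page is faulted in. */
	if (set_mempolicy(MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8) != 0)
		return NULL;

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;

	va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (va == MAP_FAILED)
		return NULL;

	/* Touch the page so it is physically allocated on the preferred node. */
	memset(va, 0, HUGEPAGE_SZ);

	/* Restore the default policy for subsequent allocations. */
	set_mempolicy(MPOL_DEFAULT, NULL, 0);
	return va;
}

int main(void)
{
	const int nodes[] = { 0, 1 }; /* sockets requested via --socket-mem */
	char path[64];
	int i;

	if (numa_available() < 0)
		return 1;

	/* Round-robin the hugepage files across the requested nodes. */
	for (i = 0; i < 4; i++) {
		snprintf(path, sizeof(path), "/dev/hugepages/sketch_rr_%d", i);
		if (map_hugepage_on_node(path, nodes[i % 2]) == NULL)
			fprintf(stderr, "cannot map hugepage %d\n", i);
	}
	return 0;
}

The important detail is that set_mempolicy() is called before the page is touched (here by memset()), because the NUMA placement of a hugetlbfs page is decided at fault time, not at mmap() time.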