From: Ilya Maximets
To: Sergio Gonzalez Monroy, dev@dpdk.org, David Marchand, Thomas Monjalon
Cc: Heetae Ahn, Yuanhan Liu, Jianfeng Tan, Neil Horman, Yulong Pei, stable@dpdk.org, Bruce Richardson
Date: Fri, 07 Apr 2017 18:14:43 +0300
Message-id: <2a9b03bd-a4c0-8d20-0bbd-77730140eef0@samsung.com>
LAKqEv8+3QaLiwpESMx/uooJokZQ4sfkeywgNqeAvcSFf/uAajiAhupJ3L+oBRJmFpCX2Lzm LfMERoFZSDpmIVTNQlK1gJF5FaNIamlxbnpusaFecWJucWleul5yfu4mRmBkbjv2c/MOxksb gw8xCnAwKvHwBvQ+jxBiTSwrrsw9xCjBwawkwvtzKlCINyWxsiq1KD++qDQntfgQoynQCxOZ pUST84FJI68k3tDE0NzS0MjYwsLcyEhJnLfkw5VwIYH0xJLU7NTUgtQimD4mDk6pBkbJx3Mf NW83X5D+6PmZuncn8lckRNzxqjASMBXfI/Ru1zJTvcmXGHz8srQ6Nid/Tmud8fy4o276fZ4Q xXUMh8+v7a9qOp2uO1258tfu2e3zfrVe+HiXd0b3xYzPHvyPllYaF3Q4TVszK/TJO8nT1cfP z+bRsGFZ2Zds7HAh/TPX2vX+R+KuuFkosRRnJBpqMRcVJwIAjn7HB+ICAAA= X-MTR: 20000000000000000@CPGS X-CMS-MailID: 20170407151444eucas1p1ab4d6ca11a7c8b0034be5ee1dc92bd4a X-Msg-Generator: CA X-Sender-IP: 182.198.249.179 X-Local-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?= =?UTF-8?B?G+yCvOyEseyghOyekBtMZWFkaW5nIEVuZ2luZWVy?= X-Global-Sender: =?UTF-8?B?SWx5YSBNYXhpbWV0cxtTUlItVmlydHVhbGl6YXRpb24gTGFi?= =?UTF-8?B?G1NhbXN1bmcgRWxlY3Ryb25pY3MbTGVhZGluZyBFbmdpbmVlcg==?= X-Sender-Code: =?UTF-8?B?QzEwG0NJU0hRG0MxMEdEMDFHRDAxMDE1NA==?= CMS-TYPE: 201P X-HopCount: 7 X-CMS-RootMailID: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a X-RootMTR: 20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a References: <1487250070-13973-1-git-send-email-i.maximets@samsung.com> <50517d4c-5174-f4b2-e77e-143f7aac2c00@samsung.com> <077682cf-8534-7890-9453-7c9e822bd3e6@intel.com> Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Apr 2017 15:14:48 -0000 Hi All. I wanted to ask (just to clarify current status): Will this patch be included in current release (acked by maintainer) and then I will upgrade it to hybrid logic or I will just prepare v3 with hybrid logic for 17.08 ? Best regards, Ilya Maximets. On 27.03.2017 17:43, Ilya Maximets wrote: > On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote: >> On 09/03/2017 12:57, Ilya Maximets wrote: >>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote: >>>> Hi Ilya, >>>> >>>> I have done similar tests and as you already pointed out, 'numactl --interleave' does not seem to work as expected. >>>> I have also checked that the issue can be reproduced with quota limit on hugetlbfs mount point. >>>> >>>> I would be inclined towards *adding libnuma as dependency* to DPDK to make memory allocation a bit more reliable. >>>> >>>> Currently at a high level regarding hugepages per numa node: >>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends if there were any limits, such as cgroups or quota in mount point. >>>> 2) Find out numa node of each hugepage. >>>> 3) Check if we have enough hugepages for requested memory in each numa socket/node. >>>> >>>> Using libnuma we could try to allocate hugepages per numa: >>>> 1) Try to map as many hugepages from numa 0. >>>> 2) Check if we have enough hugepages for requested memory in numa 0. >>>> 3) Try to map as many hugepages from numa 1. >>>> 4) Check if we have enough hugepages for requested memory in numa 1. >>>> >>>> This approach would improve failing scenarios caused by limits but It would still not fix issues regarding non-contiguous hugepages (worst case each hugepage is a memseg). >>>> The non-contiguous hugepages issues are not as critical now that mempools can span over multiple memsegs/hugepages, but it is still a problem for any other library requiring big chunks of memory. 
>>>>
>>>> Potentially, if we were to add an option such as 'iommu-only' when
>>>> all devices are bound to vfio-pci, we could have a reliable way to
>>>> allocate hugepages by just requesting the number of pages from each
>>>> numa node.
>>>>
>>>> Thoughts?
>>> Hi Sergio,
>>>
>>> Thanks for your attention to this.
>>>
>>> For now, as we have some issues with non-contiguous hugepages, I'm
>>> thinking about the following hybrid scheme:
>>> 1) Allocate essential hugepages:
>>>    1.1) Allocate only as many hugepages from numa N as needed to fit
>>>         the requested memory for this node.
>>>    1.2) Repeat 1.1 for all numa nodes.
>>> 2) Try to map all remaining free hugepages in a round-robin fashion,
>>>    like in this patch.
>>> 3) Sort the pages and choose the most suitable ones.
>>>
>>> This solution should decrease the number of issues connected with
>>> non-contiguous memory.
>>
>> Sorry for the late reply, I was hoping for more comments from the
>> community.
>>
>> IMHO this should be the default behavior, which means no config option
>> and libnuma as an EAL dependency.
>> I think your proposal is good; could you consider implementing such an
>> approach for the next release?
>
> Sure, I can implement this for the 17.08 release.
>
>>>
>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>> Hi all.
>>>>>
>>>>> So, what about this change?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>> Currently EAL allocates hugepages one by one, not paying attention
>>>>>> to the NUMA node the allocation came from.
>>>>>>
>>>>>> Such behaviour leads to an allocation failure if the number of
>>>>>> hugepages available to the application is limited by cgroups or
>>>>>> hugetlbfs and memory is requested not only from the first socket.
>>>>>>
>>>>>> Example:
>>>>>> # 90 x 1GB hugepages available in a system
>>>>>>
>>>>>> cgcreate -g hugetlb:/test
>>>>>> # Limit to 32GB of hugepages
>>>>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>> # Request 4GB from each of 2 sockets
>>>>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>
>>>>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>> EAL: Not enough memory available on socket 1!
>>>>>> Requested: 4096MB, available: 0MB
>>>>>> PANIC in rte_eal_init():
>>>>>> Cannot init memory
>>>>>>
>>>>>> This happens because all allocated pages are on socket 0.
>>>>>>
>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>> In this case all allocated pages will be fairly distributed
>>>>>> between all requested nodes.
>>>>>>
>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>> introduced and disabled by default because of the external
>>>>>> dependency on libnuma.
>>>>>>
>>>>>> Cc:
>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets
>>>>>> ---
>>>>>>  config/common_base                       |  1 +
>>>>>>  lib/librte_eal/Makefile                  |  4 ++
>>>>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>  mk/rte.app.mk                            |  3 ++
>>>>>>  4 files changed, 74 insertions(+)
>>
>> Acked-by: Sergio Gonzalez Monroy
>
> Thanks.
>
> Best regards, Ilya Maximets.
>
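
P.S. For anyone reproducing the example from the commit message above, here
is a small stand-alone sketch of the "enough hugepages for the requested
memory in each NUMA socket/node" check. It only reads the kernel's per-node
free_hugepages counters for 1 GB pages, so it does not account for cgroup
quotas such as the hugetlb limit used in the example; the 4 GB-per-socket
request mirrors --socket-mem=4096,4096 and this is not DPDK code.

/*
 * Sketch only: per-socket availability check for 1 GB hugepages,
 * based on the kernel's per-node sysfs counters.
 */
#include <stdio.h>

#define NB_SOCKETS  2
#define HUGEPAGE_MB 1024                      /* 1 GB hugepages */

static unsigned long node_free_hugepages(int node)
{
    char path[128];
    unsigned long n = 0;
    FILE *f;

    /* Per-node free counter exported by the kernel for 1 GB pages. */
    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/hugepages/"
             "hugepages-1048576kB/free_hugepages", node);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fscanf(f, "%lu", &n) != 1)
        n = 0;
    fclose(f);
    return n;
}

int main(void)
{
    /* --socket-mem=4096,4096 from the example: 4 GB per socket. */
    unsigned long requested_mb[NB_SOCKETS] = { 4096, 4096 };
    int s;

    for (s = 0; s < NB_SOCKETS; s++) {
        unsigned long avail_mb = node_free_hugepages(s) * HUGEPAGE_MB;

        printf("socket %d: requested %luMB, available %luMB%s\n",
               s, requested_mb[s], avail_mb,
               avail_mb < requested_mb[s] ? "  <-- not enough" : "");
    }
    return 0;
}

Running it while a DPDK application holds its hugepages also shows which
node the mapped pages were actually taken from, since the per-node free
counters drop on the node that served the allocation.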