From: Ilya Maximets <i.maximets@samsung.com>
To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>,
dev@dpdk.org, David Marchand <david.marchand@6wind.com>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>,
Yuanhan Liu <yuanhan.liu@linux.intel.com>,
Jianfeng Tan <jianfeng.tan@intel.com>,
Neil Horman <nhorman@tuxdriver.com>,
Yulong Pei <yulong.pei@intel.com>,
stable@dpdk.org, Thomas Monjalon <thomas.monjalon@6wind.com>,
Bruce Richardson <bruce.richardson@intel.com>
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages
Date: Thu, 09 Mar 2017 15:57:24 +0300 [thread overview]
Message-ID: <aca5b73b-75d9-da12-26f3-67ff6fe218ac@samsung.com> (raw)
In-Reply-To: <f50d0fa1-9530-436c-d532-0e6123f4e06d@intel.com>
On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
> Hi Ilya,
>
> I have done similar tests and as you already pointed out, 'numactl --interleave' does not seem to work as expected.
> I have also checked that the issue can be reproduced with quota limit on hugetlbfs mount point.
>
> I would be inclined towards *adding libnuma as a dependency* of DPDK to make memory allocation a bit more reliable.
>
> Currently at a high level regarding hugepages per numa node:
> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether there are any limits, such as cgroups or a quota on the mount point.
> 2) Find out numa node of each hugepage.
> 3) Check if we have enough hugepages for requested memory in each numa socket/node.
>
> Using libnuma we could try to allocate hugepages per NUMA node:
> 1) Try to map as many hugepages as possible from NUMA node 0.
> 2) Check if we have enough hugepages for the requested memory on node 0.
> 3) Try to map as many hugepages as possible from NUMA node 1.
> 4) Check if we have enough hugepages for the requested memory on node 1.
>
> This approach would improve failing scenarios caused by limits, but it would still not fix issues with non-contiguous hugepages (worst case, each hugepage is a separate memseg).
> The non-contiguous hugepages issues are not as critical now that mempools can span over multiple memsegs/hugepages, but it is still a problem for any other library requiring big chunks of memory.
>
> Potentially if we were to add an option such as 'iommu-only' when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each numa.
>
> Thoughts?
Hi Sergio,
Thanks for your attention to this.
For now, as we still have issues with non-contiguous
hugepages, I'm thinking about the following hybrid scheme:
1) Allocate essential hugepages:
1.1) Allocate only as many hugepages from NUMA node N
     as needed to fit the memory requested for that node.
1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion, as in this patch.
3) Sort pages and choose the most suitable.
This solution should decrease the number of issues caused by
non-contiguous memory.
Best regards, Ilya Maximets.
>
> On 06/03/2017 09:34, Ilya Maximets wrote:
>> Hi all.
>>
>> So, what about this change?
>>
>> Best regards, Ilya Maximets.
>>
>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>> Currently, EAL allocates hugepages one by one, paying no
>>> attention to which NUMA node each allocation comes from.
>>>
>>> Such behaviour leads to allocation failures when the number
>>> of hugepages available to the application is limited by
>>> cgroups or hugetlbfs and memory is requested from more than
>>> just the first socket.
>>>
>>> Example:
>>> # 90 x 1GB hugepages available in the system
>>>
>>> cgcreate -g hugetlb:/test
>>> # Limit to 32GB of hugepages
>>> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>> # Request 4GB from each of 2 sockets
>>> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>
>>> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>> EAL: 32 not 90 hugepages of size 1024 MB allocated
>>> EAL: Not enough memory available on socket 1!
>>> Requested: 4096MB, available: 0MB
>>> PANIC in rte_eal_init():
>>> Cannot init memory
>>>
>>> This happens because all allocated pages are
>>> on socket 0.
>>>
>>> Fix this issue by setting mempolicy MPOL_PREFERRED for each
>>> hugepage to one of requested nodes in a round-robin fashion.
>>> In this case all allocated pages will be fairly distributed
>>> between all requested nodes.
>>>
>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES,
>>> is introduced, disabled by default because of the external
>>> dependency on libnuma.
>>>
>>> Cc: <stable@dpdk.org>
>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>> ---
>>> config/common_base | 1 +
>>> lib/librte_eal/Makefile | 4 ++
>>> lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>> mk/rte.app.mk | 3 ++
>>> 4 files changed, 74 insertions(+)
>>>
>
>
>
>