patches for DPDK stable branches
From: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
To: Ilya Maximets <i.maximets@samsung.com>,
	Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: dev@dpdk.org, David Marchand <david.marchand@6wind.com>,
	Heetae Ahn <heetae82.ahn@samsung.com>,
	Yuanhan Liu <yuanhan.liu@linux.intel.com>,
	Jianfeng Tan <jianfeng.tan@intel.com>,
	Neil Horman <nhorman@tuxdriver.com>,
	Yulong Pei <yulong.pei@intel.com>,
	stable@dpdk.org, Bruce Richardson <bruce.richardson@intel.com>
Subject: Re: [dpdk-stable] [PATCH] mem: balanced allocation of hugepages
Date: Mon, 10 Apr 2017 08:51:52 +0100	[thread overview]
Message-ID: <b9291962-ceb3-06e3-445d-5089fb0868c0@intel.com> (raw)
In-Reply-To: <b7ac2887-1cc4-aae7-8337-98a0f6c548a8@samsung.com>

On 10/04/2017 08:11, Ilya Maximets wrote:
> On 07.04.2017 18:44, Thomas Monjalon wrote:
>> 2017-04-07 18:14, Ilya Maximets:
>>> Hi All.
>>>
>>> I wanted to ask (just to clarify the current status):
>>> Will this patch be included in the current release (acked by the
>>> maintainer), with me upgrading it to the hybrid logic afterwards, or
>>> should I just prepare a v3 with the hybrid logic for 17.08?
>> What is your preferred option Ilya?
> I have no strong opinion on this. One thought is that it would be
> nice if someone else could test this functionality with the current
> release before enabling it by default in 17.08.
>
> Tomorrow I'm going on vacation, so I'll post a rebased version today
> (there is some patch fuzz against the current master) and you and
> Sergio can decide what to do.
>
> Best regards, Ilya Maximets.
>
>> Sergio?

I would be inclined towards a v3 targeting v17.08. IMHO it would be
cleaner that way.

Sergio

>>
>>> On 27.03.2017 17:43, Ilya Maximets wrote:
>>>> On 27.03.2017 16:01, Sergio Gonzalez Monroy wrote:
>>>>> On 09/03/2017 12:57, Ilya Maximets wrote:
>>>>>> On 08.03.2017 16:46, Sergio Gonzalez Monroy wrote:
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> I have done similar tests and, as you already pointed out, 'numactl --interleave' does not seem to work as expected.
>>>>>>> I have also checked that the issue can be reproduced with a quota limit on the hugetlbfs mount point.
>>>>>>>
>>>>>>> I would be inclined towards *adding libnuma as a dependency* to DPDK to make memory allocation a bit more reliable.
>>>>>>>
>>>>>>> Currently, at a high level, hugepage allocation per NUMA node works as follows:
>>>>>>> 1) Try to map all free hugepages. The total number of mapped hugepages depends on whether there are any limits, such as cgroups or a quota on the mount point.
>>>>>>> 2) Find out the NUMA node of each hugepage.
>>>>>>> 3) Check if we have enough hugepages for the requested memory on each NUMA socket/node.
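
(As an aside on step 2 above: one way to ask the kernel which node backs an
already mapped page is get_mempolicy() with MPOL_F_NODE | MPOL_F_ADDR. The
snippet below is just a hedged sketch of that query, not necessarily how EAL
does it; it uses an ordinary anonymous page for the demo and needs -lnuma.)

#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Return the NUMA node that backs the page at 'addr', or -1 on error. */
static int page_node(void *addr)
{
    int node = -1;

    if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) != 0)
        return -1;
    return node;
}

int main(void)
{
    /* Ordinary anonymous page just to demo the call; for hugepages the
     * address would come from the hugetlbfs mapping instead. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    memset(p, 0, 1);                    /* fault the page in */
    printf("page is on node %d\n", page_node(p));
    return 0;
}
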
>>>>>>>
>>>>>>> Using libnuma we could try to allocate hugepages per NUMA node:
>>>>>>> 1) Try to map as many hugepages as possible from NUMA node 0.
>>>>>>> 2) Check if we have enough hugepages for the requested memory on NUMA node 0.
>>>>>>> 3) Try to map as many hugepages as possible from NUMA node 1.
>>>>>>> 4) Check if we have enough hugepages for the requested memory on NUMA node 1.
>>>>>>>
>>>>>>> This approach would improve the failure scenarios caused by limits, but it would still not fix issues regarding non-contiguous hugepages (worst case, each hugepage is a memseg).
>>>>>>> The non-contiguous hugepage issue is not as critical now that mempools can span multiple memsegs/hugepages, but it is still a problem for any other library requiring big chunks of contiguous memory.
>>>>>>>
>>>>>>> Potentially, if we were to add an option such as 'iommu-only' for when all devices are bound to vfio-pci, we could have a reliable way to allocate hugepages by just requesting the number of pages from each NUMA node.
>>>>>>>
>>>>>>> Thoughts?
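
Just to make that per-node idea concrete, here is a minimal sketch (my own
illustration, not EAL code): it assumes a 2-node system, hugetlbfs mounted at
/dev/hugepages with 2 MB pages, and linking with -lnuma; the file naming is
made up and error handling is kept to a minimum.

#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (2UL * 1024 * 1024)
#define NB_NODES    2

/* Back one hugepage with a file on hugetlbfs and touch it so it is
 * faulted in under the currently preferred node. */
static void *map_one_hugepage(int node, unsigned seq)
{
    char path[64];
    int fd;
    void *va;

    snprintf(path, sizeof(path), "/dev/hugepages/sketch_%d_%u", node, seq);
    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (va == MAP_FAILED)
        return NULL;

    memset(va, 0, 1);   /* fault the page in */
    return va;
}

int main(void)
{
    unsigned requested[NB_NODES] = { 4, 4 };    /* pages wanted per node */
    unsigned i;
    int node;

    if (numa_available() < 0)
        return 1;

    for (node = 0; node < NB_NODES; node++) {
        numa_set_preferred(node);       /* bias allocations to this node */
        for (i = 0; i < requested[node]; i++) {
            if (map_one_hugepage(node, i) == NULL) {
                fprintf(stderr, "node %d: only %u page(s) mapped\n", node, i);
                break;
            }
        }
    }
    numa_set_localalloc();              /* restore the default policy */
    return 0;
}

Note that numa_set_preferred() only biases the fault, so the kernel may still
fall back to another node when the preferred one runs out of hugepages; the
pages would still have to be inspected afterwards to see where they really
landed, and a SIGBUS on the touch (which EAL apparently handles, judging by
the 'EAL: SIGBUS' line in the log quoted further down) is not handled here.
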
>>>>>> Hi Sergio,
>>>>>>
>>>>>> Thanks for your attention to this.
>>>>>>
>>>>>> For now, as we have some issues with non-contiguous
>>>>>> hugepages, I'm thinking about the following hybrid scheme:
>>>>>> 1) Allocate the essential hugepages:
>>>>>>      1.1) Allocate only as many hugepages from NUMA node N as
>>>>>>           needed to fit the memory requested for this node.
>>>>>>      1.2) Repeat 1.1 for all NUMA nodes.
>>>>>> 2) Try to map all remaining free hugepages in a round-robin
>>>>>>      fashion, like in this patch.
>>>>>> 3) Sort the pages and choose the most suitable ones.
>>>>>>
>>>>>> This solution should decrease the number of issues connected with
>>>>>> non-contiguous memory.
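
To illustrate only the ordering of that hybrid scheme (no real mapping going
on), here is a toy simulation; the two-node setup, the quota of 8 pages and
the per-node numbers are all made up, and phase 3 (sorting and picking the
best pages) is left out.

#include <stdio.h>

#define NB_NODES 2

int main(void)
{
    unsigned quota = 8;                         /* cgroup-style cap on mapped pages */
    unsigned free_pages[NB_NODES] = { 90, 90 }; /* free hugepages per node */
    unsigned requested[NB_NODES]  = { 4, 4 };   /* --socket-mem request, in pages */
    unsigned mapped[NB_NODES]     = { 0, 0 };
    int n, progress;

    /* Phase 1: essential pages only, per node, so a limited quota is
     * spent where it is actually needed. */
    for (n = 0; n < NB_NODES; n++)
        while (mapped[n] < requested[n] && free_pages[n] > 0 && quota > 0) {
            free_pages[n]--;
            mapped[n]++;
            quota--;
        }

    /* Phase 2: map whatever the quota still allows, round-robin across
     * nodes, to get more candidate pages for contiguous segments. */
    do {
        progress = 0;
        for (n = 0; n < NB_NODES; n++)
            if (free_pages[n] > 0 && quota > 0) {
                free_pages[n]--;
                mapped[n]++;
                quota--;
                progress = 1;
            }
    } while (progress);

    for (n = 0; n < NB_NODES; n++)
        printf("node %d: mapped %u page(s), requested %u\n",
               n, mapped[n], requested[n]);
    return 0;
}

With the current one-by-one mapping, the whole quota could end up on node 0,
which is exactly the failure shown in the log below; phase 1 prevents that.
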
>>>>> Sorry for the late reply, I was hoping for more comments from the community.
>>>>>
>>>>> IMHO this should be the default behavior, which means no config option and libnuma as an EAL dependency.
>>>>> I think your proposal is good; could you consider implementing such an approach in the next release?
>>>> Sure, I can implement this for 17.08 release.
>>>>
>>>>>>> On 06/03/2017 09:34, Ilya Maximets wrote:
>>>>>>>> Hi all.
>>>>>>>>
>>>>>>>> So, what about this change?
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>
>>>>>>>> On 16.02.2017 16:01, Ilya Maximets wrote:
>>>>>>>>> Currently EAL allocates hugepages one by one, not paying
>>>>>>>>> attention to which NUMA node the allocation came from.
>>>>>>>>>
>>>>>>>>> Such behaviour leads to allocation failures if the number of
>>>>>>>>> hugepages available to the application is limited by cgroups
>>>>>>>>> or a hugetlbfs quota and memory is requested from more than
>>>>>>>>> just the first socket.
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>>       # 90 x 1GB hugepages available in the system
>>>>>>>>>
>>>>>>>>>       cgcreate -g hugetlb:/test
>>>>>>>>>       # Limit to 32GB of hugepages
>>>>>>>>>       cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
>>>>>>>>>       # Request 4GB from each of 2 sockets
>>>>>>>>>       cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>>>>>>>>>
>>>>>>>>>       EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
>>>>>>>>>       EAL: 32 not 90 hugepages of size 1024 MB allocated
>>>>>>>>>       EAL: Not enough memory available on socket 1!
>>>>>>>>>            Requested: 4096MB, available: 0MB
>>>>>>>>>       PANIC in rte_eal_init():
>>>>>>>>>       Cannot init memory
>>>>>>>>>
>>>>>>>>>       This happens because all the allocated pages are
>>>>>>>>>       on socket 0.
>>>>>>>>>
>>>>>>>>> Fix this issue by setting the mempolicy MPOL_PREFERRED for each
>>>>>>>>> hugepage to one of the requested nodes in a round-robin fashion.
>>>>>>>>> In this case all allocated pages will be fairly distributed
>>>>>>>>> between all requested nodes.
>>>>>>>>>
>>>>>>>>> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
>>>>>>>>> introduced and disabled by default because of the external
>>>>>>>>> dependency on libnuma.
>>>>>>>>>
>>>>>>>>> Cc: <stable@dpdk.org>
>>>>>>>>> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>>>> ---
>>>>>>>>>     config/common_base                       |  1 +
>>>>>>>>>     lib/librte_eal/Makefile                  |  4 ++
>>>>>>>>>     lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
>>>>>>>>>     mk/rte.app.mk                            |  3 ++
>>>>>>>>>     4 files changed, 74 insertions(+)
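
The diff itself is not quoted here, so purely as an illustration of the
MPOL_PREFERRED round-robin idea from the commit message (this is not the
actual eal_memory.c change): it assumes two requested nodes, hugetlbfs at
/dev/hugepages with 2 MB pages, and -lnuma for set_mempolicy().

#include <fcntl.h>
#include <numaif.h>     /* set_mempolicy(), MPOL_PREFERRED, MPOL_DEFAULT */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (2UL * 1024 * 1024)
#define NB_NODES    2

int main(void)
{
    unsigned nb_pages = 8;
    unsigned i;

    for (i = 0; i < nb_pages; i++) {
        unsigned node = i % NB_NODES;           /* round-robin node choice */
        unsigned long mask = 1UL << node;
        char path[64];
        int fd;
        void *va;

        /* Prefer this node for the upcoming fault; the kernel falls back
         * to other nodes only if the preferred one has no free page. */
        if (set_mempolicy(MPOL_PREFERRED, &mask, sizeof(mask) * 8) != 0) {
            perror("set_mempolicy");
            return 1;
        }

        snprintf(path, sizeof(path), "/dev/hugepages/rr_%u", i);
        fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return 1;

        va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_POPULATE, fd, 0);
        close(fd);
        if (va == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
    }

    set_mempolicy(MPOL_DEFAULT, NULL, 0);       /* restore default policy */
    return 0;
}

Because the policy is only a preference, allocation still succeeds when one
node runs short, which is why the distribution is "fair" rather than strict.
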
>>>>> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
>>>> Thanks.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>
>>
>>
>>


Thread overview: 15+ messages
     [not found] <CGME20170216130139eucas1p2512567d6f5db9eaac5ee840b56bf920a@eucas1p2.samsung.com>
2017-02-16 13:01 ` Ilya Maximets
2017-02-16 13:26   ` Tan, Jianfeng
2017-02-16 13:55     ` Ilya Maximets
2017-02-16 13:57       ` Ilya Maximets
2017-02-16 13:31   ` [dpdk-stable] [dpdk-dev] " Bruce Richardson
2017-03-06  9:34   ` [dpdk-stable] " Ilya Maximets
2017-03-08 13:46     ` Sergio Gonzalez Monroy
2017-03-09 12:57       ` Ilya Maximets
2017-03-27 13:01         ` Sergio Gonzalez Monroy
2017-03-27 14:43           ` Ilya Maximets
2017-04-07 15:14             ` Ilya Maximets
2017-04-07 15:44               ` Thomas Monjalon
2017-04-10  7:11                 ` Ilya Maximets
2017-04-10  7:51                   ` Sergio Gonzalez Monroy [this message]
2017-04-10  8:05                     ` Ilya Maximets
