DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Asaf Sinai <AsafSi@Radware.com>, "dev@dpdk.org" <dev@dpdk.org>,
	Ilya Maximets <i.maximets@samsung.com>,
	Thomas Monjalon <thomas@monjalon.net>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
Date: Mon, 26 Nov 2018 12:50:41 +0000	[thread overview]
Message-ID: <12283bd1-ea0d-38d1-f64d-508596e48cd9@intel.com> (raw)
In-Reply-To: <518f9333-8d80-0fa2-d391-b4c8df181508@intel.com>

On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>> Hi Anatoly,
>>
>> We did not check it with "testpmd", only with our application.
>>  From the beginning, we did not enable this configuration (look at 
>> attached files), and everything works fine.
>> Of course we rebuild DPDK, when we change configuration.
>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
> 
> Just tested with DPDK 17.11, and yes, it does work the way you are 
> describing. This is not intended behavior. I will look into it.
> 

+CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.

Looking at the code, I think this config option needs to be reworked, and
we should clarify what we mean by it. It appears that I misunderstood
what this option was actually intended to do, and I also think its
naming could be improved, because it is confusing and misleading.

In 17.11, this option does *not* prevent EAL from using NUMA - it merely 
disables using libnuma to perform memory allocation. This looks like 
intended (if counter-intuitive) behavior - disabling this option will 
simply revert DPDK to working as it did before this option was 
introduced (i.e. best-effort allocation). This is why your code still 
works - because EAL still does allocate memory on socket 1, and *knows* 
that it's socket 1 memory. It still supports NUMA.

The commit message for these changes states that the actual purpose of
this option is to enable "balanced" hugepage allocation. Previously,
under a cgroups limit on the number of hugepages, DPDK would exhaust all
hugepages on the master core's socket before attempting to allocate from
other sockets. By the time we hit the cgroups limit, we might not have
reached socket 1 at all, and thus missed out on pages we could have
allocated there. Using libnuma solves this issue, because now we can
allocate pages on the sockets we want, instead of hoping we won't run
out of hugepages before we get the memory we need.
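
To make the difference concrete, here is a toy model of the two
strategies. It is only a sketch: it does no real hugepage mapping, and
the names (alloc_best_effort, alloc_per_socket, struct alloc_result) are
made up for illustration and are not EAL functions. It assumes two
sockets and treats the cgroups limit as a simple cap on the total number
of pages.

```c
#include <stddef.h>

#define NUM_SOCKETS 2

/* Toy model: pages[s] is the number of hugepages obtained on socket s. */
struct alloc_result {
	unsigned pages[NUM_SOCKETS];
};

/*
 * Best-effort model: take pages socket by socket, starting with the
 * master core's socket (socket 0 here), until either every free page
 * is taken or the cgroups limit is reached.
 */
struct alloc_result
alloc_best_effort(const unsigned free_pages[NUM_SOCKETS],
		unsigned cgroup_limit)
{
	struct alloc_result r = { {0} };
	unsigned used = 0;

	for (int s = 0; s < NUM_SOCKETS; s++) {
		while (r.pages[s] < free_pages[s] && used < cgroup_limit) {
			r.pages[s]++;
			used++;
		}
	}
	return r;
}

/*
 * Targeted model: ask for exactly wanted[s] pages on each socket, which
 * is what being able to pick the socket makes possible. The same
 * cgroups limit still caps the total.
 */
struct alloc_result
alloc_per_socket(const unsigned free_pages[NUM_SOCKETS],
		const unsigned wanted[NUM_SOCKETS],
		unsigned cgroup_limit)
{
	struct alloc_result r = { {0} };
	unsigned used = 0;

	for (int s = 0; s < NUM_SOCKETS; s++) {
		while (r.pages[s] < wanted[s] &&
				r.pages[s] < free_pages[s] &&
				used < cgroup_limit) {
			r.pages[s]++;
			used++;
		}
	}
	return r;
}
```

With 4 free pages on each socket and a cgroups limit of 4, the
best-effort model ends up with all 4 pages on socket 0 and none on
socket 1, while the targeted model with a request of 2 pages per socket
gets 2 on each.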

From 18.05 onwards, this option works differently (and arguably
incorrectly). More specifically, with the option disabled, EAL disallows
allocations on sockets other than 0, and it also does not check which
socket the memory *actually* came from. So, not only is allocating
memory from socket 1 disabled, but allocating from socket 0 may even get
you memory from socket 1!

+CC Thomas

The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES name is a misnomer, because it
suggests that turning the option off disables NUMA support, which is not
the case.

I would also argue that it is not relevant to the 18.05+ memory
subsystem, and should only work in legacy mode, because it is
*impossible* to make it work right in the new memory subsystem, and
here's why:

Without libnuma, we have no way of "asking" the kernel to allocate a
hugepage on a specific socket - instead, any allocation will most likely
happen on the socket of the core that requested it. For example, if the
user program's lcore is on socket 1, an allocation intended for socket 0
will actually get a page on socket 1.

If we don't check the page's NUMA node affinity (which is what currently
happens), we get performance degradation, because we may unintentionally
allocate memory on the wrong NUMA node. If we do check it, then
allocating memory on socket 1 from an lcore on socket 0 will almost
never succeed, because the kernel will always give us pages on socket 0.
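
For reference, the kind of check discussed above can be sketched on
Linux with the get_mempolicy(2) system call, which reports the node
backing a given address when called with MPOL_F_NODE | MPOL_F_ADDR. This
is an illustration only, not the EAL code: page_numa_node is a
hypothetical helper name, the raw syscall is used so that the sketch
does not need libnuma, and the flag values are the ones from the kernel
header <linux/mempolicy.h>.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values as defined in the Linux uapi header <linux/mempolicy.h>. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/*
 * Hypothetical helper: return the NUMA node that backs the page
 * containing addr, or -1 if the kernel cannot or will not tell us
 * (no NUMA support, syscall filtered in a container, bad address).
 */
int
page_numa_node(void *addr)
{
	int node = -1;

	if (syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
			(unsigned long)(MPOL_F_NODE | MPOL_F_ADDR)) != 0)
		return -1;
	return node;
}
```

A caller would map and touch a page, call page_numa_node() on it, and
compare the result with the socket it wanted. Without a way to steer the
allocation, all it can do on a mismatch is unmap the page and fail,
which is exactly the "will almost never succeed" case.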

Put simply, there is no sane way to make this option work for the new
memory subsystem - IMO it should be dropped, and libnuma should be made
a hard dependency on Linux.

-- 
Thanks,
Anatoly
