DPDK patches and discussions
From: "Jonas Pfefferle1" <JPF@zurich.ibm.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
Subject: Re: [dpdk-dev] Huge mapping secondary process linux
Date: Fri, 27 Oct 2017 21:22:41 +0200
Message-ID: <OFADF97DED.D3F54CF8-ON002581C6.00697F36-C12581C6.006A71ED@notes.na.collabserv.com>
In-Reply-To: <ef4c2e4b-f94d-9f32-db1f-5ab31f43661d@intel.com>

"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 27/10/2017 18:00:27:

> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> Date: 27/10/2017 18:00
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> 
> On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > "dev" <dev-bounces@dpdk.org> wrote on 10/27/2017 04:58:01 PM:
> > 
> >  > From: "Jonas Pfefferle1" <JPF@zurich.ibm.com>
> >  > To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> >  > Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> >  > Date: 10/27/2017 04:58 PM
> >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  > Sent by: "dev" <dev-bounces@dpdk.org>
> >  >
> >  >
> >  > "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 10/27/2017 04:44:52 PM:
> >  >
> >  > > From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> >  > > To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> >  > > Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> >  > > Date: 10/27/2017 04:45 PM
> >  > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  > >
> >  > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> >  > > > "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 10/27/2017 04:06:44 PM:
> >  > > >
> >  > > >  > From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> >  > > >  > To: Jonas Pfefferle1 <JPF@zurich.ibm.com>, dev@dpdk.org
> >  > > >  > Cc: chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com
> >  > > >  > Date: 10/27/2017 04:06 PM
> >  > > >  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> >  > > >  >
> >  > > >  > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> >  > > >  > >
> >  > > >  > > Hi @all,
> >  > > >  > >
> >  > > >  > > I'm trying to make sense of the hugepage memory mappings in
> >  > > >  > > librte_eal/linuxapp/eal/eal_memory.c:
> >  > > >  > > * In rte_eal_hugepage_attach (line 1347) when we try to do a private
> >  > > >  > > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we
> >  > > >  > > need the addresses to be identical with the primary process?
> >  > > >  > > * On POWER we have this weird business going on where we use
> >  > > >  > > MAP_HUGETLB because according to this commit:
> >  > > >  > >
> >  > > >  > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> >  > > >  > > Author: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> >  > > >  > > Date:   Thu Apr 6 15:36:09 2017 +0530
> >  > > >  > >
> >  > > >  > >     eal/ppc: fix mmap for memory initialization
> >  > > >  > >
> >  > > >  > >     On IBM POWER platform, when mapping /dev/zero file to hugepage
> >  > > >  > >     memory space, mmap will not respect the requested address hint.
> >  > > >  > >     This will cause the memory initialization for the second process
> >  > > >  > >     fails. This patch adds the required mmap flags to make it work.
> >  > > >  > >     Beside this, users need to set the nr_overcommit_hugepages to
> >  > > >  > >     expand the VA range. When doing the initialization, users need to
> >  > > >  > >     set both nr_hugepages and nr_overcommit_hugepages to the same
> >  > > >  > >     value, like 64, 128, etc.
> >  > > >  > >
> >  > > >  > > mmap address hints are not respected. Looking at the mmap code in
> >  > > >  > > the kernel this is not true entirely however under some
> >  > > >  > > circumstances the hint can be ignored (
> >  > > >  > > http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L103
> >  > > >  > > ). However I believe we can remove the extra case for PPC if we use
> >  > > >  > > MAP_FIXED when doing the secondary process mappings because we need
> >  > > >  > > them to be identical anyway. We could also use MAP_FIXED when doing
> >  > > >  > > the primary process mappings resp. get_virtual_area if we want to
> >  > > >  > > have any guarantees when specifying a base address. Any thoughts?
> >  > > >  > >
> >  > > >  > > Thanks,
> >  > > >  > > Jonas
> >  > > >  > >
> >  > > >  > hi Jonas,
> >  > > >  >
> >  > > >  > MAP_FIXED is not used because it's dangerous, it unmaps anything
> >  > > >  > that is already mapped into that space. We would rather know that
> >  > > >  > we can't map something than unwittingly unmap something that was
> >  > > >  > mapped before.
> >  > > >
> >  > > > Ok, I see. Maybe we can add a check to the primary process's memory
> >  > > > mappings whether the hint has been respected or not? At least warn if
> >  > > > it hasn't.
> >  > >
> >  > > Hi Jonas,
> >  > >
> >  > > I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> >  > > a bit more what you mean by "hint has been respected" :)
> >  >
> >  > Hi Anatoly,
> >  >
> >  > What I meant was the mmap address hint:
> >  >
> >  > "If addr is not NULL, then the kernel takes it as a hint
> >  >  about where to place the mapping; on Linux, the mapping will be
> >  >  created at a nearby page boundary."
> >  >
> >  > This is actually not true on POWER. It can happen that the address hint
> >  > is ignored and you get any address back that fits your mapping.
> >  >
> >  > Thanks,
> >  > Jonas
> > 
> > Actually looking through the kernel code this is also not guaranteed on x86.
> > (http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165)
> > 
> > So in any case the address hint can be ignored by the kernel and you get
> > any address that fits your mapping.
> > My suggestion is to check when we do the initial mapping in
> > get_virtual_area if the hint was respected or not, i.e. if the returned
> > address == PAGE_ALIGN(address_hint).
> > 
> 
> I'm not sure i see the issue here. So, just to make sure i understand 
> things correctly:
> 
> Whenever we don't request a specific base address through base_address 
> EAL parameter, none of this matters - we always ask for memory in 
> arbitrary memory locations, correct?
> 
> It's also not an issue with secondary processes because we do check 
> returned mmap address to see whether it's the same as we requested, correct?
> 
> It's only whenever we *do* specify a base_address, we provide an address 
> hint to mmap to, but we don't check if the address we got from mmap is 
> one in the vicinity of our requested base address, correct? We don't 
> check, and the kernel can ignore address hint, so we're not guaranteed 
> to respect the base_address flag.
> 
> I'm not sure this is a serious issue, because as far as i'm concerned, 
> this flag is advisory - we only promise to *attempt* to map things at 
> that particular address, not that it will succeed. If the kernel simply 
> cannot find an address to satisfy our address hint, or ignores it for 
> other reasons - well, tough, nothing we can do about that. I'm not sure 
> putting a check like this, where we can't even predict an "expected" 
> address is a good idea.
> 
> Am i getting this right?

The problem is that when we specify a base address, we want it to be used.
If it is not respected, we end up in the same situation as if we had never
specified it. This very likely means we cannot run a secondary process,
because it will not be able to map the addresses used by the primary
process, and that is the reason the base address parameter was introduced
in the first place.
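
To make the suggested check concrete, here is a minimal sketch in C. It is
not the EAL code: reserve_with_hint is a hypothetical helper, and the
anonymous PROT_NONE mapping is an assumption that stands in for the real
reservation done in get_virtual_area. It only shows comparing the address
returned by mmap against the page-aligned hint, without using MAP_FIXED.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing EAL function): reserve `size` bytes
 * of address space near `hint` without MAP_FIXED, and report through
 * `hint_respected` whether the kernel placed the mapping at the
 * page-aligned hint. Returns NULL if mmap fails.
 */
static void *
reserve_with_hint(void *hint, size_t size, int *hint_respected)
{
	uintptr_t page_mask = (uintptr_t)sysconf(_SC_PAGESIZE) - 1;
	/* Equivalent of PAGE_ALIGN(): round the hint up to a page boundary. */
	uintptr_t expected = ((uintptr_t)hint + page_mask) & ~page_mask;
	void *addr;

	assert(hint_respected != NULL);

	/* No MAP_FIXED: an existing mapping is never silently replaced. */
	addr = mmap(hint, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* With no hint there is nothing to violate. */
	*hint_respected = (hint == NULL) || ((uintptr_t)addr == expected);
	return addr;
}
```

A caller that was given a base address could at least log a warning when
hint_respected comes back as 0, since a secondary process is then unlikely
to be able to reproduce the mappings.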

> 
> -- 
> Thanks,
> Anatoly
> 


Thread overview: 14+ messages
2017-10-27 12:43 Jonas Pfefferle1
2017-10-27 14:06 ` Burakov, Anatoly
2017-10-27 14:28   ` Jonas Pfefferle1
2017-10-27 14:44     ` Burakov, Anatoly
2017-10-27 14:58       ` Jonas Pfefferle1
2017-10-27 15:16         ` Jonas Pfefferle1
2017-10-27 16:00           ` Burakov, Anatoly
2017-10-27 19:22             ` Jonas Pfefferle1 [this message]
2017-11-07  8:25               ` Chao Zhu
2017-11-07 10:15                 ` Jonas Pfefferle1
2017-11-09  3:08                   ` Chao Zhu
2017-11-09  9:54                     ` Jonas Pfefferle1
2017-10-27 15:48       ` Tan, Jianfeng
2017-10-27 16:06         ` Burakov, Anatoly
