DPDK patches and discussions
* [dpdk-dev] Huge mapping secondary process linux
@ 2017-10-27 12:43 Jonas Pfefferle1
  2017-10-27 14:06 ` Burakov, Anatoly
  0 siblings, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-27 12:43 UTC (permalink / raw)
  To: dev; +Cc: chaozhu, bruce.richardson



Hi @all,

I'm trying to make sense of the hugepage memory mappings in
librte_eal/linuxapp/eal/eal_memory.c:
* In rte_eal_hugepage_attach (line 1347), when we do a private mapping
on /dev/zero (line 1393), why do we not use MAP_FIXED if we need the
addresses to be identical to those in the primary process?
* On POWER we have this weird business going on where we use MAP_HUGETLB
because, according to this commit:

commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
Author: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Date:   Thu Apr 6 15:36:09 2017 +0530

    eal/ppc: fix mmap for memory initialization

    On IBM POWER platform, when mapping /dev/zero file to hugepage memory
    space, mmap will not respect the requested address hint. This will cause
    the memory initialization for the second process fails. This patch adds
    the required mmap flags to make it work. Beside this, users need to set
    the nr_overcommit_hugepages to expand the VA range. When
    doing the initialization, users need to set both nr_hugepages and
    nr_overcommit_hugepages to the same value, like 64, 128, etc.

mmap address hints are not respected. Looking at the mmap code in the
kernel, this is not entirely true; however, under some circumstances the
hint can be ignored (
http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L103
). I believe we can remove the extra case for PPC if we use MAP_FIXED when
doing the secondary process mappings, because we need them to be identical
anyway. We could also use MAP_FIXED when doing the primary process
mappings (that is, in get_virtual_area) if we want any guarantees when
specifying a base address. Any thoughts?

Thanks,
Jonas


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 12:43 [dpdk-dev] Huge mapping secondary process linux Jonas Pfefferle1
@ 2017-10-27 14:06 ` Burakov, Anatoly
  2017-10-27 14:28   ` Jonas Pfefferle1
  0 siblings, 1 reply; 14+ messages in thread
From: Burakov, Anatoly @ 2017-10-27 14:06 UTC (permalink / raw)
  To: Jonas Pfefferle1, dev; +Cc: chaozhu, bruce.richardson

On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> 
> 
> Hi @all,
> 
> I'm trying to make sense of the hugepage memory mappings in
> librte_eal/linuxapp/eal/eal_memory.c:
> * In rte_eal_hugepage_attach (line 1347) when we try to do a private
> mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we need the
> addresses to be identical with the primary process?
> * On POWER we have this weird business going on where we use MAP_HUGETLB
> because according to this commit:
> 
> commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> Author: Chao Zhu <chaozhu@linux.vnet.ibm.com>
> Date:   Thu Apr 6 15:36:09 2017 +0530
> 
>      eal/ppc: fix mmap for memory initialization
> 
>      On IBM POWER platform, when mapping /dev/zero file to hugepage memory
>      space, mmap will not respect the requested address hint. This will cause
>      the memory initialization for the second process fails. This patch adds
>      the required mmap flags to make it work. Beside this, users need to set
>      the nr_overcommit_hugepages to expand the VA range. When
>      doing the initialization, users need to set both nr_hugepages and
>      nr_overcommit_hugepages to the same value, like 64, 128, etc.
> 
> mmap address hints are not respected. Looking at the mmap code in the
> kernel this is not true entirely however under some circumstances the hint
> can be ignored (
> http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L103
> ). However I believe we can remove the extra case for PPC if we use
> MAP_FIXED when doing the secondary process mappings because we need them to
> be identical anyway. We could also use MAP_FIXED when doing the primary
> process mappings resp. get_virtual_area if we want to have any guarantees
> when specifying a base address. Any thoughts?
> 
> Thanks,
> Jonas
> 
hi Jonas,

MAP_FIXED is not used because it's dangerous: it unmaps anything that is
already mapped into that space. We would rather know that we can't map
something than unwittingly unmap something that was mapped before.
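
To illustrate the hazard described above, here is a minimal sketch in C. It is not DPDK code: the helper name map_fixed_clobbers_existing is hypothetical, and it uses a single anonymous page rather than hugepages or /dev/zero. It only shows that mmap with MAP_FIXED replaces an existing mapping at the same address without reporting any error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of DPDK).
 * Returns 1 if a MAP_FIXED mapping silently replaced an existing mapping,
 * 0 if the old contents survived, -1 if mmap failed.
 */
static int map_fixed_clobbers_existing(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	/* An existing mapping that some other part of the process relies on. */
	char *old = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return -1;
	memset(old, 0xAB, len);

	/* MAP_FIXED at the same address: the kernel discards 'old' first. */
	char *fresh = mmap(old, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (fresh == MAP_FAILED) {
		munmap(old, len);
		return -1;
	}

	/* Same address, but the data is gone: new anonymous pages read as 0. */
	int clobbered = (fresh == old) && (fresh[0] == 0);

	munmap(fresh, len);
	return clobbered;
}
```

The second mmap call succeeds even though the range was in use, which is why a failed or relocated mapping is easier to diagnose than a MAP_FIXED one.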

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 14:06 ` Burakov, Anatoly
@ 2017-10-27 14:28   ` Jonas Pfefferle1
  2017-10-27 14:44     ` Burakov, Anatoly
  0 siblings, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-27 14:28 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: bruce.richardson, chaozhu, dev

"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 10/27/2017 04:06:44 PM:

> hi Jonas,
>
> MAP_FIXED is not used because it's dangerous, it unmaps anything that is
> already mapped into that space. We would rather know that we can't map
> something than unwittingly unmap something that was mapped before.

Ok, I see. Maybe we can add a check to the primary process's memory
mappings to see whether the hint has been respected? We could at least
warn if it hasn't.


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 14:28   ` Jonas Pfefferle1
@ 2017-10-27 14:44     ` Burakov, Anatoly
  2017-10-27 14:58       ` Jonas Pfefferle1
  2017-10-27 15:48       ` Tan, Jianfeng
  0 siblings, 2 replies; 14+ messages in thread
From: Burakov, Anatoly @ 2017-10-27 14:44 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: bruce.richardson, chaozhu, dev

On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> Ok, I see. Maybe we can add a check to the primary process's memory 
> mappings whether the hint has been respected or not? At least warn if it 
> hasn't.

Hi Jonas,

I'm unfamiliar with the POWER platform, so I'm afraid you'd have to explain 
a bit more what you mean by "hint has been respected" :)


-- 
Thanks,
Anatoly


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 14:44     ` Burakov, Anatoly
@ 2017-10-27 14:58       ` Jonas Pfefferle1
  2017-10-27 15:16         ` Jonas Pfefferle1
  2017-10-27 15:48       ` Tan, Jianfeng
  1 sibling, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-27 14:58 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: bruce.richardson, chaozhu, dev


"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 10/27/2017 04:44:52 PM:

> Hi Jonas,
>
> I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> a bit more what you mean by "hint has been respected" :)

Hi Anatoly,

What I meant was the mmap address hint:

"If addr is not NULL, then the kernel takes it as a hint
 about where to place the mapping; on Linux, the mapping will be
 created at a nearby page boundary."

This does not actually hold on POWER: the address hint can be ignored, in
which case you get back any address that fits your mapping.
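
As a minimal sketch of what "the hint is ignored" looks like from user space, the C snippet below forces the simplest case: the hinted address is already occupied, so mmap (without MAP_FIXED) has to place the mapping elsewhere. The helper name hint_ignored_when_occupied is hypothetical, and this does not reproduce the POWER-specific behaviour discussed here, where the hint may be dropped even for a free range.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of DPDK).
 * Returns 1 if mmap() placed the mapping somewhere other than the hint,
 * 0 if the hint was honoured, -1 if mmap failed.
 */
static int hint_ignored_when_occupied(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	/* Occupy one page so that we have an address we know is taken. */
	void *taken = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (taken == MAP_FAILED)
		return -1;

	/* Pass the same address as a hint only (no MAP_FIXED). */
	void *got = mmap(taken, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (got == MAP_FAILED) {
		munmap(taken, len);
		return -1;
	}

	/* The call succeeded, yet the returned address differs from the hint. */
	int ignored = (got != taken);

	munmap(got, len);
	munmap(taken, len);
	return ignored;
}
```

Note that mmap reports success in this case; the only way to notice is to compare the returned pointer with the requested one.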

Thanks,
Jonas


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 14:58       ` Jonas Pfefferle1
@ 2017-10-27 15:16         ` Jonas Pfefferle1
  2017-10-27 16:00           ` Burakov, Anatoly
  0 siblings, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-27 15:16 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: Burakov, Anatoly, bruce.richardson, chaozhu, dev


"dev" <dev-bounces@dpdk.org> wrote on 10/27/2017 04:58:01 PM:

> Hi Anatoly,
>
> What I meant was the mmap address hint:
>
> "If addr is not NULL, then the kernel takes it as a hint
>  about where to place the mapping; on Linux, the mapping will be
>  created at a nearby page boundary."
>
> This is actually not true on POWER. It can happen that the address hint is
> ignored and you get any address back that fits your mapping.
>
> Thanks,
> Jonas

Actually, looking through the kernel code, this is also not guaranteed on
x86
(http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165).

So in any case the kernel can ignore the address hint, and you get back any
address that fits your mapping.
My suggestion is to check, when we do the initial mapping in
get_virtual_area, whether the hint was respected, i.e. whether the returned
address == PAGE_ALIGN(address_hint).
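
The suggested check could look roughly like the C sketch below. It is not the actual get_virtual_area code: reserve_with_hint and page_align are hypothetical names, the mapping is anonymous rather than backed by /dev/zero, and page_align assumes the kernel's round-up PAGE_ALIGN semantics with a power-of-two page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr up to the next multiple of page_sz (a power of two). */
static uintptr_t page_align(uintptr_t addr, uintptr_t page_sz)
{
	return (addr + page_sz - 1) & ~(page_sz - 1);
}

/*
 * Hypothetical helper: reserve 'len' bytes near 'hint' and report through
 * '*respected' whether the kernel placed the mapping at the aligned hint.
 * Returns the mapped address, or MAP_FAILED if mmap failed.
 */
static void *reserve_with_hint(void *hint, size_t len, int *respected)
{
	uintptr_t page_sz = (uintptr_t)sysconf(_SC_PAGESIZE);
	void *addr = mmap(hint, len, PROT_READ,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED)
		return MAP_FAILED;

	*respected = ((uintptr_t)addr == page_align((uintptr_t)hint, page_sz));

	/* Only warn when the caller actually asked for a specific address. */
	if (hint != NULL && !*respected)
		fprintf(stderr, "warning: requested %p, kernel returned %p\n",
			hint, addr);
	return addr;
}
```

Whether a mismatch should be a warning or a hard error is a policy question; this sketch only warns and still returns the mapping.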

Thanks,
Jonas


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 14:44     ` Burakov, Anatoly
  2017-10-27 14:58       ` Jonas Pfefferle1
@ 2017-10-27 15:48       ` Tan, Jianfeng
  2017-10-27 16:06         ` Burakov, Anatoly
  1 sibling, 1 reply; 14+ messages in thread
From: Tan, Jianfeng @ 2017-10-27 15:48 UTC (permalink / raw)
  To: Burakov, Anatoly, Jonas Pfefferle1; +Cc: bruce.richardson, chaozhu, dev



On 10/27/2017 10:44 PM, Burakov, Anatoly wrote:
> On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
>> "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 10/27/2017 
>> 04:06:44 PM:
>>
>>  > From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
>>  > To: Jonas Pfefferle1 <JPF@zurich.ibm.com>, dev@dpdk.org
>>  > Cc: chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com
>>  > Date: 10/27/2017 04:06 PM
>>  > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>  ...
>>  > >
>>  > hi Jonas,
>>  >
>>  > MAP_FIXED is not used because it's dangerous, it unmaps anything that is
>>  > already mapped into that space. We would rather know that we can't map
>>  > something than unwittingly unmap something that was mapped before.
>>
>> Ok, I see. Maybe we can add a check to the primary process's memory 
>> mappings whether the hint has been respected or not? At least warn if 
>> it hasn't.
>
> Hi Jonas,
>
> I'm unfamiliar with POWER platform, so i'm afraid you'd have to 
> explain a bit more what you mean by "hint has been respected" :)

Actually, I also hit this case on x86 once: the kernel did not respect 
the "addr" parameter even though that memory region was not occupied. I am 
not sure whether it can be reproduced now; anyway, I'm sending it here FYI: 
we ran the primary on the host and the secondary in a container.

I agree that we at least need to check whether the final addr is the same 
as the addr parameter, and warn if it's not.

Thanks,
Jianfeng


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 15:16         ` Jonas Pfefferle1
@ 2017-10-27 16:00           ` Burakov, Anatoly
  2017-10-27 19:22             ` Jonas Pfefferle1
  0 siblings, 1 reply; 14+ messages in thread
From: Burakov, Anatoly @ 2017-10-27 16:00 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: bruce.richardson, chaozhu, dev

On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> Actually looking through the kernel code this is also not guaranteed on x86.
> (http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165)
> 
> So in any case the address hint can be ignored by the kernel and you get 
> any address that fits your mapping.
> My suggestion is to check when we do the initial mapping in 
> get_virtual_area if the hint was respected or not, i.e. if the returned 
> address == PAGE_ALIGN(address_hint).
> 

I'm not sure I see the issue here. So, just to make sure I understand 
things correctly:

Whenever we don't request a specific base address through the base_address 
EAL parameter, none of this matters - we always ask for memory in 
arbitrary memory locations, correct?

It's also not an issue with secondary processes, because we do check the 
returned mmap address to see whether it's the same as we requested, correct?

It's only when we *do* specify a base_address that we provide an address 
hint to mmap, but we don't check whether the address we got from mmap is 
in the vicinity of our requested base address, correct? We don't 
check, and the kernel can ignore the address hint, so we're not guaranteed 
to respect the base_address flag.
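
The secondary-process check mentioned above can be sketched in C as follows. This is a simplified stand-in, not the code in rte_eal_hugepage_attach: attach_at is a hypothetical name, and it maps anonymous memory instead of /dev/zero or hugepage files. The point is that, without MAP_FIXED, a mismatch is detected and undone instead of overwriting whatever lives at the requested address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the secondary-process logic: map 'len' bytes at
 * 'want' without MAP_FIXED, and fail cleanly if the kernel chose elsewhere.
 * Returns 0 on success, -1 if the exact address could not be obtained.
 */
static int attach_at(void *want, size_t len)
{
	void *got = mmap(want, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (got == MAP_FAILED)
		return -1;

	if (got != want) {
		/* Nothing was overwritten; drop our mapping and report failure. */
		munmap(got, len);
		return -1;
	}
	return 0;
}
```

A caller that gets -1 knows the layout cannot match the primary process and can abort early, which is the behaviour argued for earlier in the thread.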

I'm not sure this is a serious issue because, as far as I'm concerned, 
this flag is advisory - we only promise to *attempt* to map things at 
that particular address, not that it will succeed. If the kernel simply 
cannot find an address to satisfy our address hint, or ignores it for 
other reasons - well, tough, nothing we can do about that. I'm not sure 
that putting in a check like this, where we can't even predict an 
"expected" address, is a good idea.

Am I getting this right?

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 15:48       ` Tan, Jianfeng
@ 2017-10-27 16:06         ` Burakov, Anatoly
  0 siblings, 0 replies; 14+ messages in thread
From: Burakov, Anatoly @ 2017-10-27 16:06 UTC (permalink / raw)
  To: Tan, Jianfeng, Jonas Pfefferle1; +Cc: bruce.richardson, chaozhu, dev

On 27-Oct-17 4:48 PM, Tan, Jianfeng wrote:
> Actually, I also met this case on x86 once that kernel does not respect 
> the "addr" parameter even that memory region is not occupied. I am not 
> sure if it can be reproduced now, anyway, send here FYI: we run primary 
> on the host, run secondary in a container.
> 
> I'll agree at least we need to check if the final addr is the same of 
> the parameter addr, and warn if it's not.
> 
> Thanks,
> Jianfeng
> 

We could put in a warning saying that the address we got is *lower* than 
the address we expected to get, but I'm not sure that throwing a warning 
because our assumption about the kernel's behavior was incorrect is worth it.
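
For reference, the kind of check under discussion could look roughly like the sketch below. Everything in it is an assumption made for illustration: reserve_with_hint() and page_align() are hypothetical names, the mapping is anonymous, and the comparison uses the regular page size from sysconf() (a hugepage mapping would be aligned to the hugepage size instead).

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr up to the next multiple of pgsz (pgsz must be a power of two). */
static uintptr_t
page_align(uintptr_t addr, uintptr_t pgsz)
{
    return (addr + pgsz - 1) & ~(pgsz - 1);
}

/*
 * Hypothetical helper: reserve `len` bytes near `hint` without MAP_FIXED
 * and report through `respected` whether the hint was honoured.
 * A NULL hint means "anywhere", so there is nothing to check.
 */
static void *
reserve_with_hint(void *hint, size_t len, int *respected)
{
    uintptr_t pgsz = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *got;

    assert(respected != NULL);

    got = mmap(hint, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) {
        *respected = 0;
        return NULL;
    }
    *respected = hint == NULL ||
        (uintptr_t)got == page_align((uintptr_t)hint, pgsz);
    if (!*respected)
        fprintf(stderr, "warning: asked for %p, kernel returned %p\n",
                hint, got);
    return got;
}
```

The mapping is kept either way; the caller only learns that the requested base address was not used and can decide whether that matters.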

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 16:00           ` Burakov, Anatoly
@ 2017-10-27 19:22             ` Jonas Pfefferle1
  2017-11-07  8:25               ` Chao Zhu
  0 siblings, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-10-27 19:22 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: bruce.richardson, chaozhu, dev

"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 27/10/2017 
18:00:27:

> On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > Actually looking through the kernel code this is also not guaranteed
> > on x86.
> > (http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165)
> > 
> > So in any case the address hint can be ignored by the kernel and you
> > get any address that fits your mapping.
> > My suggestion is to check when we do the initial mapping in
> > get_virtual_area if the hint was respected or not, i.e. if the
> > returned address == PAGE_ALIGN(address_hint).
> > 
> 
> I'm not sure i see the issue here. So, just to make sure i understand 
> things correctly:
> 
> Whenever we don't request a specific base address through base_address 
> EAL parameter, none of this matters - we always ask for memory in 
> arbitrary memory locations, correct?
> 
> It's also not an issue with secondary processes because we do check 
> returned mmap address to see whether it's the same as we requested,
> correct?
> 
> It's only whenever we *do* specify a base_address, we provide an address 
> hint to mmap to, but we don't check if the address we got from mmap is 
> one in the vicinity of our requested base address, correct? We don't 
> check, and the kernel can ignore address hint, so we're not guaranteed 
> to respect the base_address flag.
> 
> I'm not sure this is a serious issue, because as far as i'm concerned, 
> this flag is advisory - we only promise to *attempt* to map things at 
> that particular address, not that it will succeed. If the kernel simply 
> cannot find an address to satisfy our address hint, or ignores it for 
> other reasons - well, tough, nothing we can do about that. I'm not sure 
> putting a check like this, where we can't even predict an "expected" 
> address is a good idea.
> 
> Am i getting this right?

The problem is that when we specify a base address, we want it to be used.
If it is not respected, we basically end up as if we had never specified
it. This very likely leads to not being able to run a secondary process,
because we will not be able to map the addresses from our primary process,
and that is why we introduced the base address parameter in the first place.

> 
> -- 
> Thanks,
> Anatoly
> 


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-10-27 19:22             ` Jonas Pfefferle1
@ 2017-11-07  8:25               ` Chao Zhu
  2017-11-07 10:15                 ` Jonas Pfefferle1
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Zhu @ 2017-11-07  8:25 UTC (permalink / raw)
  To: 'Jonas Pfefferle1', 'Burakov, Anatoly'
  Cc: bruce.richardson, dev

 

 

From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
Sent: October 28, 2017 3:23
To: Burakov, Anatoly <anatoly.burakov@intel.com>
Cc: bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; dev@dpdk.org
Subject: Re: [dpdk-dev] Huge mapping secondary process linux

"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 27/10/2017 18:00:27:

> I'm not sure i see the issue here. So, just to make sure i understand 
> things correctly:
> 
> Whenever we don't request a specific base address through base_address 
> EAL parameter, none of this matters - we always ask for memory in 
> arbitrary memory locations, correct?
> 
> It's also not an issue with secondary processes because we do check 
> returned mmap address to see whether it's the same as we requested, correct?
> 
> It's only whenever we *do* specify a base_address, we provide an address 
> hint to mmap to, but we don't check if the address we got from mmap is 
> one in the vicinity of our requested base address, correct? We don't 
> check, and the kernel can ignore address hint, so we're not guaranteed 
> to respect the base_address flag.
> 
> I'm not sure this is a serious issue, because as far as i'm concerned, 
> this flag is advisory - we only promise to *attempt* to map things at 
> that particular address, not that it will succeed. If the kernel simply 
> cannot find an address to satisfy our address hint, or ignores it for 
> other reasons - well, tough, nothing we can do about that. I'm not sure 
> putting a check like this, where we can't even predict an "expected" 
> address is a good idea.
> 
> Am i getting this right?

The problem is when we specify a base address we want it to be used. If it is
not respected we basically end up with the case like we would have never specified it.
This very likely leads to not being able to run a secondary process because
we will not be able to map the addresses from our primary process and that is why we
introduced the base address parameter in the first place.

> 
> -- 
> Thanks,
> Anatoly
> 

The reason why I put the patch there is that when mapping hugepages on
POWER, the kernel will never respect the address hints when doing mmap
unless we expand the address space or unmap all the hugepages. This is a
big difference when compared with x86, and it affects the mapping of the
secondary process. I agree that the hint is advisory. I just want to see
if there are better solutions.


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-11-07  8:25               ` Chao Zhu
@ 2017-11-07 10:15                 ` Jonas Pfefferle1
  2017-11-09  3:08                   ` Chao Zhu
  0 siblings, 1 reply; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-07 10:15 UTC (permalink / raw)
  To: Chao Zhu; +Cc: 'Burakov, Anatoly', bruce.richardson, dev

"Chao Zhu" <chaozhu@linux.vnet.ibm.com> wrote on 11/07/2017 09:25:26 AM:

> The reason why I put the patch there is that when mapping hugepage
> on POWER, the kernel will never respect the address hints when doing
> mmap unless we expand the address space or unmap all the hugepages.
> This is a big difference when compared with x86. And it affects the
> mapping of  the secondary process. I agree that the hints is
> advisory. Just want to see if there are better solutions.


This is not true. I looked through the kernel code and the address
hint is treated almost the same on both platforms:

PPC:
https://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L143
Line 169/170

x86:
https://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165
Line 189/190

The only thing that might differ is the virtual address layout
(e.g. due to different page sizes etc.), and that might lead to the same
value for base-virtaddr not working on both x86 and POWER.
However, I tested with different address hints and you can easily
find addresses where the address hint is indeed respected.
That is also why I sent in a patch to remove the HUGETLB flags on
the mmap.
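
A test of this kind can be sketched as below. This is not the exact program I ran: probe_hints() is a hypothetical helper, the hints are assumed to be page aligned already, and it maps regular anonymous pages, so results for hugepage mappings may differ.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical probe: for each candidate hint, map `len` anonymous bytes
 * without MAP_FIXED, note whether the kernel returned exactly the hint,
 * and unmap again. Returns how many hints were respected.
 */
static size_t
probe_hints(const uintptr_t *hints, size_t n, size_t len)
{
    size_t i, respected = 0;

    assert(len > 0);

    for (i = 0; i < n; i++) {
        void *want = (void *)hints[i];
        void *got = mmap(want, len, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (got == MAP_FAILED) {
            printf("%p: mmap failed\n", want);
            continue;
        }
        printf("%p: %s\n", want, got == want ? "respected" : "ignored");
        respected += (got == want);
        munmap(got, len);
    }
    return respected;
}
```

Feeding it a handful of candidate base addresses on each platform shows which ones the kernel actually honours there; which values work depends on the address space layout of the process, so none are listed here.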

Thanks,
Jonas


* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-11-07 10:15                 ` Jonas Pfefferle1
@ 2017-11-09  3:08                   ` Chao Zhu
  2017-11-09  9:54                     ` Jonas Pfefferle1
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Zhu @ 2017-11-09  3:08 UTC (permalink / raw)
  To: 'Jonas Pfefferle1'
  Cc: 'Burakov, Anatoly', bruce.richardson, dev

 

 

From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com] 
Sent: November 7, 2017 18:16
To: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Cc: 'Burakov, Anatoly' <anatoly.burakov@intel.com>; bruce.richardson@intel.com; dev@dpdk.org
Subject: RE: [dpdk-dev] Huge mapping secondary process linux

"Chao Zhu" <chaozhu@linux.vnet.ibm.com> wrote on 11/07/2017 09:25:26 AM:

> The reason why I put the patch there is that when mapping hugepage 
> on POWER, the kernel will never respect the address hints when doing
> mmap unless we expand the address space or unmap all the hugepages. 
> This is a big difference when compared with x86. And it affects the 
> mapping of  the secondary process. I agree that the hints is 
> advisory. Just want to see if there are better solutions.


This is not true. I looked through the kernel code and the address
hint is treated almost the same on both platforms: 

PPC: https://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L143
Line 169/170

x86: https://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165
Line 189/190

The only thing that might differ is the virtual address layout
(e.g. due to a different page size), so the same base-virtaddr
value might not work on both x86 and POWER.
However, I tested with different address hints and you can easily
find addresses where the hint is indeed respected.
That is also why I sent in a patch to remove the HUGETLB flags from
the mmap.

Thanks,
Jonas

You can take a look at this. https://bugzilla.linux.ibm.com/show_bug.cgi?id=141628

It’s quite interesting.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] Huge mapping secondary process linux
  2017-11-09  3:08                   ` Chao Zhu
@ 2017-11-09  9:54                     ` Jonas Pfefferle1
  0 siblings, 0 replies; 14+ messages in thread
From: Jonas Pfefferle1 @ 2017-11-09  9:54 UTC (permalink / raw)
  To: Chao Zhu; +Cc: 'Burakov, Anatoly', bruce.richardson, dev

"Chao Zhu" <chaozhu@linux.vnet.ibm.com> wrote on 11/09/2017 04:08:36 AM:

> > The reason why I put the patch there is that when mapping hugepage
> > on POWER, the kernel will never respect the address hints when doing
> > mmap unless we expand the address space or unmap all the hugepages.
> > This is a big difference when compared with x86. And it affects the
> > mapping of  the secondary process. I agree that the hints is
> > advisory. Just want to see if there are better solutions.
>
>
> This is not true. I looked through the kernel code and the address
> hint is treated almost the same on both platforms:
>
> PPC: https://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L143
> Line 169/170
>
> x86: https://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165
> Line 189/190
>
> The only thing that might differ is the virtual address layout
> (e.g. due to a different page size), so the same base-virtaddr
> value might not work on both x86 and POWER.
> However, I tested with different address hints and you can easily
> find addresses where the hint is indeed respected.
> That is also why I sent in a patch to remove the HUGETLB flags from
> the mmap.
>
> Thanks,
> Jonas
>
> You can take a look at this. https://bugzilla.linux.ibm.com/show_bug.cgi?id=141628
>
> It’s quite interesting.

Interesting indeed. I misunderstood the problem: I thought the
get_virtual_area mmap address hint was not being respected, when
the real problem is the address hint used when mapping the
hugepages. Still, I hope we can find a better solution.
Aside from that, I still believe that warning when the address
hint is not respected is a good idea.
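
Such a warning could look roughly like the sketch below. It is an
assumption-laden illustration, not a patch against get_virtual_area:
reserve_with_hint and align_up are made-up names, the mapping is plain
anonymous memory, and the expected address is taken to be the hint
rounded up to the page size, as in the PAGE_ALIGN comparison suggested
earlier in the thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper, not DPDK's get_virtual_area(): reserve len bytes
 * near hint without MAP_FIXED and warn on stderr if the kernel placed
 * the mapping somewhere else. *respected is set to 1 or 0 on success.
 * Returns the mapped address, or MAP_FAILED if mmap() failed.
 */
static void *reserve_with_hint(void *hint, size_t len, int *respected)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
	void *expected = (void *)align_up((uintptr_t)hint, page);
	void *va = mmap(hint, len, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (va == MAP_FAILED)
		return MAP_FAILED;

	/* A NULL hint means "anywhere", so any address satisfies it. */
	*respected = (hint == NULL || va == expected);
	if (!*respected)
		fprintf(stderr, "warning: requested %p, kernel returned %p\n",
			expected, va);
	return va;
}
```

The caller still gets a usable mapping when the hint is ignored; the
warning only makes it visible that a secondary process is unlikely to
be able to map the same addresses.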


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-11-09  9:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-27 12:43 [dpdk-dev] Huge mapping secondary process linux Jonas Pfefferle1
2017-10-27 14:06 ` Burakov, Anatoly
2017-10-27 14:28   ` Jonas Pfefferle1
2017-10-27 14:44     ` Burakov, Anatoly
2017-10-27 14:58       ` Jonas Pfefferle1
2017-10-27 15:16         ` Jonas Pfefferle1
2017-10-27 16:00           ` Burakov, Anatoly
2017-10-27 19:22             ` Jonas Pfefferle1
2017-11-07  8:25               ` Chao Zhu
2017-11-07 10:15                 ` Jonas Pfefferle1
2017-11-09  3:08                   ` Chao Zhu
2017-11-09  9:54                     ` Jonas Pfefferle1
2017-10-27 15:48       ` Tan, Jianfeng
2017-10-27 16:06         ` Burakov, Anatoly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).